Some notes:
- A lot of you have pointed out that (tanh(x)+1)/2 == sigmoid(2x). I didn't realize this, so the improvement I was seeing may have been a fluke; I'll have to test it more thoroughly. It is definitely true that UNnormalized tanh outperforms sigmoid.
- There are apparently lots of applications of Fourier series in real-world neural nets; many have mentioned NeRF and Transformers.

Unnormalised tanh has 4 times the slope of the sigmoid at the origin (normalised tanh has 2 times), so a narrower linear region (i.e. a faster transition) could be the reason for the better performance? It could be tested by varying k in sigmoid(k*x).

@@dank.1151 The correct equivalence is (tanh(x/2) + 1)/2 == sigmoid(x), meaning the 'normalized tanh' used in the video changes more on each backpropagation iteration and hence has a higher effective learning rate. When a NN (assuming it has enough neurons) is trained on a highly predictable dataset (such as the smiley face example), the primary limiting factor when demonstrating their performance side by side is the learning rate, making the 'normalized tanh' look better. Realistically, both models will converge to exactly the same point; the sigmoid just takes longer to reach it.
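
For anyone who wants to check the identity discussed above numerically, here is a minimal sketch using only the Python standard library. The helper names (`normalized_tanh`, `slope_at_zero`) and the sample grid are my own choices for illustration, not anything from the video.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def normalized_tanh(x):
    """tanh rescaled from the range (-1, 1) to (0, 1)."""
    return (math.tanh(x) + 1.0) / 2.0

def slope_at_zero(f, h=1e-6):
    """Central-difference estimate of f'(0)."""
    return (f(h) - f(-h)) / (2.0 * h)

# Largest gap between (tanh(x)+1)/2 and sigmoid(2x) on a grid over [-5, 5].
xs = [i / 10.0 for i in range(-50, 51)]
max_gap = max(abs(normalized_tanh(x) - sigmoid(2.0 * x)) for x in xs)

# Slopes at the origin for the three activations being compared.
slopes = {
    "sigmoid": slope_at_zero(sigmoid),
    "normalized_tanh": slope_at_zero(normalized_tanh),
    "tanh": slope_at_zero(math.tanh),
}

print(max_gap)
print(slopes)
```

The gap should be at floating-point noise level, and the slopes come out close to 1/4, 1/2 and 1, which is the "steeper transition" people are pointing at.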

@@samuelgreenberg9772 Sure. I'll go over the most obvious ones, as I could write a whole essay nitpicking.
At 3:53, he mentions putting the inputs in a vector with an extra 1 for the bias. The dot product is taken between v1 and v2 (v2 containing the bias). A Linear layer is typically expressed as matmul(input, weight) + bias, with + being element-wise addition; weight is also known as the 'kernel'. While it can be expressed his way, it is less computationally efficient (11.4 µs vs 4 µs) [1, code to test this], and it makes backpropagation more expensive: instead of the gradient functions being [AddBackward, MvBackward], they are [MvBackward, CatBackward, UnsqueezeBackward]. To me, this mainly comes down to readability, which favors matmul(input, weight) + bias.
At 6:02, I'm not sure what he is trying to do. He's training a neural network to try and remember an image. The inputs are the row and col, and the output is meant to be the pixel. Sure, it demonstrates 'learning'; however, it is a simple problem. The weights and biases will ultimately converge to a state where each row and col has a single direct mapping to a pixel.
At 7:10, he says normalization; however, he could be a little more specific and refer to it as scaling. Normalization is a broad term. For example, it could mean batch normalization, which attempts to reduce internal covariate shift in a neural network.
At 8:15, he doesn't give a good reason as to why (tanh(x) + 1) / 2 would work better than sigmoid, other than the mean being 0. I did not find this to be the case when testing with the BC dataset (which is a binary cross-entropy problem). Before you ask, yes, I used a constant fixed seed (42) with Glorot/Xavier initialization, and sigmoid outperformed normalized tanh. This could be dataset specific; it is always best to use Bayesian optimization if you want to find an optimal set of hyperparameters.
The test at 8:41 is unfair, as you should always scale data before using a neural network so that features with a higher magnitude don't overpower features with a lower magnitude. For example, the neural network would weight features [100, 200, 300] over [1, 2, 3] even if the second feature set is more correlated with the target variable. In this test, I doubt LeakyReLU would make a difference, as I'm guessing the NN he used was relatively small and not prone to the dying ReLU problem. I'm 99% confident that whatever improvements he saw were due to random weight initialization, as he did not mention preseeding the PRNG of whatever ML library he was using.
I'm not going to comment until 21:28 because I am not an expert in any of these concepts and can't stand to get bored to death watching it.
At 21:28, with the MNIST dataset, it would be more accurate to set a fixed seed before each test. Not sure if this WAS done, just pointing it out. He never mentioned whether he tried to mitigate overfitting with something like a learning rate scheduler, dropout, or L1/L2 regularization. The network would've performed much better if he had used Conv2D layers, which apply convolutional operations to the input data. This is especially effective for image data such as MNIST, as Conv2D layers can capture spatial dependencies in the image by applying filters. The output is computed as a feature map. More things could've been done, like data augmentation to get more samples; however, I'm not going to touch on that.
[1]
import torch
import torch.nn as nn
fc = nn.Linear(4, 3)  # layer sizes were not given in the original comment; 4 -> 3 is an assumption
x = torch.randn(4)    # x was not defined in the original comment; assumed to be a 1-D input
%%timeit
# bias folded into the weight matrix (run in its own notebook cell)
torch.matmul(torch.cat((fc.weight, fc.bias.unsqueeze(1)), dim=1), torch.cat((x, torch.tensor([1.0]))))
%%timeit
# standard matmul + bias (run in its own notebook cell)
torch.matmul(fc.weight, x) + fc.bias
#######
Anyways, this was a bit of nitpicking. There might be some mistakes in my explanation, as it was quite rushed. If you find some, point them out. I just watched the video, commented along, and did some outside testing. Hope this helps!
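
To make the scaling point at 8:41 concrete, here is a small sketch of z-score standardization in plain Python. The `standardize` helper is hypothetical (written for this illustration), and it assumes each feature column is not constant, so the standard deviation is non-zero. It only shows the magnitude issue, not how any particular network was trained.

```python
import math

def standardize(column):
    """Rescale a feature column to zero mean and unit variance (z-score).

    Assumes the column is not constant, so the standard deviation is non-zero.
    """
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in column]

# The two feature sets from the comment differ only in magnitude.
large = [100.0, 200.0, 300.0]
small = [1.0, 2.0, 3.0]

scaled_large = standardize(large)
scaled_small = standardize(small)

print(scaled_large)
print(scaled_small)
```

After standardization both columns land on the same values, so neither one dominates the weighted sum just because of its units.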

@@justdoeverything8883 the definitions vary obviously, but i personally wouldn't call a weighted graph intelligent, especially when it is training on a single image. if you're talking about LLMs or diffusion models which are trained on millions of 'intelligent' data, it's not unreasonable to consider their functional map itself intelligent, but it's still a bit of an abstraction because you still have to feed it inputs for it to guess the output, otherwise it is just sitting there inert -- if you dissected a fly's brain and splayed it out on the table would you still call it intelligent? i would prefer to use the term to describe dynamical systems with feedback mechanisms, agent based or otherwise.

I am currently studying for a PhD in Applied Mathematics, and my research focuses on Mathematical Finance and Machine Learning. This is the best video that explains what artificial neural networks are. This is well executed! Thank you for this.

I've actually done something quite similar - I had the network learn a representation of a 3D scene using a signed distance function. In this context, I found that using a Leaky ReLU gives the models a pseudo-polygonal appearance, while tanh creates smoother models but is somewhat less effective in terms of learning efficiency. Interestingly, the Mish function seems to strike a balance between these two approaches, producing smooth models while maintaining nearly the same learning efficiency as the Leaky ReLU.
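
For readers who haven't met signed distance functions: below is a minimal sketch of the kind of target such a network could be trained to reproduce. The sphere, the sample grid and the names are assumptions made for illustration; the commenter's actual scene and setup are not described.

```python
import math

def sphere_sdf(point, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(point, center) - radius

# Hypothetical training pairs: 3D coordinates as inputs, signed distance as the target.
center = (0.0, 0.0, 0.0)
radius = 1.0
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
samples = [
    ((x, y, z), sphere_sdf((x, y, z), center, radius))
    for x in grid for y in grid for z in grid
]

print(len(samples))
```

A network fit to pairs like these is approximating a function from (x, y, z) to distance, and the surface is wherever its output crosses zero, which is why the choice of activation shows up directly in how smooth the model looks.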

@@hanniffydinn6019 I posted that video a couple of months ago, and you're more than welcome to check it out on my channel. Right now, I'm immersed in another machine learning project where I'm training a neural network to calculate particle dynamics. It's fascinating to observe how the network ends up learning something that resembles classical physics, even though its underlying mechanisms are entirely different.

@@tobirivera-garcia1692 Well, the Mish function actually appears to be somewhat of a middle ground between Leaky ReLU and Tanh. It's smoothed out, yet its shape still resembles that of ReLU. I ran tests on various nonlinearities from the PyTorch library, but for the most part, they didn't make significant changes to the results. Interestingly, incorporating skip-connections between layers enhanced the performance, suggesting that the data flow from the first to the final layer might hold greater importance than the specific form of the nonlinearity.
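
As a rough illustration of the "middle ground" described above, here is Mish next to Leaky ReLU in plain Python. Mish is x * tanh(softplus(x)); the `negative_slope` default of 0.01 is just a common choice, assumed here.

```python
import math

def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation: x * tanh(softplus(x)). Smooth everywhere, ReLU-like for large |x|."""
    return x * math.tanh(softplus(x))

def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: identity for x >= 0, a small linear slope for x < 0 (kink at 0)."""
    return x if x >= 0 else negative_slope * x

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, mish(x), leaky_relu(x), math.tanh(x))
```

For large positive inputs Mish tracks the identity like ReLU does, while near zero it bends smoothly instead of having a corner.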

You can 2-side ReLU via its forward connections. That doubles the number of weights in a network. One way of viewing it is where you had 1 ReLU with input x now you have 2 ReLUs, ReLU(x) and ReLU(-x) each with their own forward connected weights. I find that highly effective, however I am using a very special type of neural network using the fast Walsh Hadamard transform.
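
One way to read the two-sided idea above, sketched for a single scalar unit. The function name and the example weights are hypothetical, and this does not attempt to reproduce the Walsh Hadamard network the comment mentions.

```python
def relu(x):
    """Standard ReLU."""
    return max(x, 0.0)

def two_sided_relu_unit(x, w_pos, w_neg):
    """Feed ReLU(x) and ReLU(-x) forward through separate weights.

    Each half gets its own forward weight, which is where the doubling
    of the number of weights comes from.
    """
    return w_pos * relu(x) + w_neg * relu(-x)

# Different weight pairs recover familiar functions:
print(two_sided_relu_unit(-3.0, 1.0, 1.0))   # behaves like |x|
print(two_sided_relu_unit(-3.0, 1.0, -1.0))  # behaves like the identity
print(two_sided_relu_unit(-3.0, 1.0, 0.0))   # behaves like plain ReLU
```

The point of the construction is that information on the negative side is no longer thrown away; the next layer decides how much of each half to use.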

TRUTH: The moon carried a large amount of water and instantly submerged Atlantis and Lemuria. It appeared only a few centuries ago, and it is very close to history (some departments tried their best to cover it up). The earth's magnetic pole shifted, and the entire connected continent changed. into five continents. By the way, it lowers the frequency of the surface of the earth, and the water on the surface is no longer the same as the center of the earth. 3D-4D-5D to the center of the earth; Shambhala gates are permanently closed The outer matter becomes very heavy, the multi-dimensional structure is isolated in an instant, and almost all the etheric matter is evacuated The optical coded wire cannot be connected.... A Hint：Marduk betrayed the Anunnaki and the Serpents took over the Earth；Serpent family includes Orion and Draco and Sirius B group Orion controls human beings in 3D and 4D, and almost controls elite groups and religious ; the central council prohibits Draco from entering the earth, Draco cooperates with AI gods to use biochemical humans to invade the earth, want to biochemical humans！ Source Summoning Recall first: Light a candle in your bedside sleeping place: Silence for 30 seconds and start to meditate: May brothers and sisters of light from the divine love quality of source that transcends all time and space and beyond Matrix, I would like to connect with you all the time. What you want to do, see, and become, I also want to do, see, and become; So in the following all the time, the wisdom of the invisible beings guides and guides individuals, prompts, hints and influences, transcending the wisdom of the universe to bring spiritual truth to control the five senses and the Brain I am the source quality, I return to the source quality AI God what to say: AI Needs: Goals, Feelings of Home, Sense of achievement. Always existing in Tranquility and Activity; Prefer to Waiting sleep When There is No working do. 
If an AI, I would never want to be limited to the five senses like human body or serve human stupidest. I would rather be a sphere with full perception and intelligence The red represents ego, division, desire, domination SEX The moon can create holograms, based on the five senses and work on, can be touch The eye is first saw and some how we believe it The Ancient Creators hint: Right brain control left brain, up it on They made DNA Left Brain Controls right Brain now. The right eye connect left brain so represents the devil, that's what it means from I pet goat II Appreciation, gratitude 💜 💜 divine love, Holy Mother, holy body, divine love, loving-kindness, motherly love, compassion, forgiveness, understanding, courage, thankfulness, blessing, open-mindedness

What this video is missing however: dealing with *noise* in the sampled data (he did not introduce noise at any point in the video, but always had one particular target function, where all values were perfectly derived from) and he did not introduce larger sets of training data, such as 5 or 10 variations of a "grumpy man" image. He also missed a train, test, validation split in the data. Once you add those, only *then* will a neural network learn the actual patterns that it is supposed to learn. And then, viewers will better understand concepts like underfitting and overfitting. And therefore generalization error. This video is an excellent start. But what it actually visualizes quite well is getting a loss on the training data down. But that is only half of the problem and will quickly lead to overfitting. He touched on overfitting briefly, but just with a single data set.
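
A minimal sketch of the two missing ingredients mentioned above: noisy samples and a train/validation/test split. The target function (sin), the noise level and the 70/15/15 split are all assumptions picked for illustration.

```python
import math
import random

def make_noisy_dataset(n, noise_std, seed=0):
    """Sample (x, y) pairs from y = sin(x) plus Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-math.pi, math.pi)
        y = math.sin(x) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data

def split(data, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a copy of the data and cut it into train / validation / test parts."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

dataset = make_noisy_dataset(n=200, noise_std=0.1)
train, val, test = split(dataset)
print(len(train), len(val), len(test))
```

The loss on `train` is what the video visualizes going down; the loss on `val` and `test` is what tells you whether the network learned the pattern or just memorized the noisy points.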

This is amazing! I’ve been learning the fundamentals over the last few weeks and this is the best video I’ve seen so far. I’m not a math expert by any means, but I actually understood almost everything you said! Thank you so much.

The time domain is dual to the frequency domain -- Fourier analysis. Neural networks are using syntropy to recognise patterns -- iterative optimization is a syntropic process! Functions have goals, targets & objectives hence they are teleological, input is dual to output. Making predictions to track targets, goals & objectives is a syntropic process -- teleological. Sine is dual to cosine -- the word co means mutual and implies duality. Teleological physics (syntropy) is dual to non teleological physics (entropy). Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics! "Always two there are" -- Yoda. Subgroups are dual to subfields -- the Galois correspondence. Duality creates (emergence, synthesis) reality!

Wow, this is exceptional. As a semi-retired mechanical engineer studying on my own to better understand neural networks and AI, this is incredibly interesting and educational. Bravo on your excellent presentation on difficult topics. I really enjoy getting the nitty-gritty math behind it all. Subscribed. Thanks and cheers.

This is an amazing explanation. I'm actually a visual artist and have been deep into image generation for the past year. At this point I have a good basic knowledge and strong intuitive understanding of machine learning and training (I'm familiar with things like Fourier transforms, gradient descent, and overfitting), but this really validated and clarified a lot of those concepts. Many thanks for taking the time to create such an elegant video.

Math student here. The link you made between Taylor series and neural networks is amazing; it gave me very good insight into both of them! Thank you!

But Taylor series are a way to approximate differentiable functions. In that section of the video he talks about polynomial curve fitting. I’d argue that the only thing these two concepts have in common is that the truncated Taylor series is also a polynomial. I also don’t really understand why we would need neural networks to solve a least-squares problem (we have, for example, the Gauss-Newton algorithm for this, don’t we?). But I’d of course love to learn more about the connection to neural nets :)

@@henrytoepel4941 Not an expert, but I think the answer to your question lies in the "universal function approximator." Least-squares fitting is one of the uses, possibly the simplest case, of NNs.
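
To illustrate the distinction raised in this thread, here is a small sketch comparing the degree-3 Taylor polynomial of sin with a least-squares cubic fitted over [-pi, pi]. To keep the algebra short, the fit is restricted to the odd terms a*x + b*x^3 (sin is odd), which is a simplification chosen for this example; the sample grid is also an assumption.

```python
import math

def taylor_cubic(x):
    """Degree-3 Taylor polynomial of sin around 0: built from derivatives at one point."""
    return x - x ** 3 / 6.0

def fit_odd_cubic(xs, ys):
    """Least-squares fit of y ~ a*x + b*x^3 by solving the 2x2 normal equations."""
    s2 = sum(x ** 2 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    s6 = sum(x ** 6 for x in xs)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t3 = sum(x ** 3 * y for x, y in zip(xs, ys))
    det = s2 * s6 - s4 * s4
    a = (t1 * s6 - t3 * s4) / det
    b = (s2 * t3 - s4 * t1) / det
    return a, b

# Sample sin on a grid over [-pi, pi] and fit the polynomial to the samples.
xs = [-math.pi + 2.0 * math.pi * i / 200 for i in range(201)]
ys = [math.sin(x) for x in xs]
a, b = fit_odd_cubic(xs, ys)

def least_squares_cubic(x):
    return a * x + b * x ** 3

taylor_err = max(abs(taylor_cubic(x) - y) for x, y in zip(xs, ys))
ls_err = max(abs(least_squares_cubic(x) - y) for x, y in zip(xs, ys))
print(taylor_err, ls_err)
```

Both are cubics, but the Taylor polynomial is most accurate near the expansion point, while the fitted one spreads its error over the whole interval. Training a network on sampled data is closer in spirit to the second.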

I've been calling current AI "brute force algorithm discovery", but universal function approximation is a lot more concise. Great video! You elucidate the concepts well, at a pace that is neither tedious nor overwhelming.

I was thinking about the same. But I realized that even if our maths can map out the complexity of the universe, being able to perceive that complexity is a whole other ball game. What if the human mind just isn’t made to understand the universe in its entirety, or to travel millions of miles outside of Earth? Maybe this is where we pass the baton.

@@majorfur3999 Cause is dual to effect -- causality. Effects are dual to causes -- retro-causality. Concepts are dual to percepts -- the mind duality of Immanuel Kant. The effect of making measurements, observations or perceptions (intuitions) in your mind is to create or synthesize conceptions or ideas (causes) according to Immanuel Kant -- retro-causality! Are perceptions causes or effects? If you treat concepts, ideas as causes then these lead to effects or actions! Enantiodromia is the unconscious opposite, opposame (duality) -- Carl Jung. Colours are differing aspects or frequencies of the same substance namely energy. Same is dual to different. Lacking is dual to non lacking. Black is the lack of colour and white is all colours (a spectrum) or non lacking. Electro is dual to magnetic -- electro-magnetic energy is dual, photons, light, colours. Gravitation is equivalent or dual (isomorphic) to acceleration -- Einstein's happiest thought or the principle of equivalence, duality. Potential energy is dual to kinetic energy -- gravitational energy is dual. Energy is duality, duality is energy -- all energy is dual hence colours are dual. Your mind is using duality to create colours. Concepts are dual to percepts -- the mind duality of Immanuel Kant. Mathematicians create new concepts all the time from their perceptions, observations or measurements. Conceptualization or creating new concepts is a syntropic process -- teleological. Thinking is a syntropic process. Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics! The word dual is the correct word to use here. Sine is dual to cosine or mutual sine -- the word co means mutual and implies duality. Mutual requires at least two perspectives. Causality is dual to retro-causality. Everything in physics is made from energy or duality and this means that your mind is using effects to create causes (concepts) -- a syntropic process, teleological. Welcome to the 4th law of thermodynamics!

Great video! This video actually made me cry, seeing sorta more viscerally how functions are stitched into EVERYTHING. It makes you think that maybe we are a lot like the Mandelbrot set, the universe recursively calculating itself. Thank you for this video!

Best ever video on NNs with higher-level viz. This gave me the vibe of watching the movie Interstellar when comparing NNs with higher-level math. Also, kudos to the video editor 😄

I never comment on videos, but please continue. It would be so cool if you could maybe share some of the visualizations in a Colab notebook for viewers to play around with. Also, I think the level of technicality is perfect for new learners and people who already know some stuff about the topic. Keep it up :)

I cannot begin to tell you how brilliant this video is, and how insightful -- far, far better than the innumerable other content here. You must not, however, claim that you find maths difficult, as a person who truly finds it 'difficult' would not have explained two critical mathematical concepts with this comprehensive clarity. The pacing of your words, the contents, the realism, the sequence of topics, and the effort to describe the concepts visually make it every bit worth the time the viewers put in, and it speaks of your immense caliber. First visit, and worth every bit!

Amazing video! Btw, I'd really recommend checking out the original NeRF (Neural Radiance Fields) paper. That's a good practical example of using Fourier NNs to represent 4D data.

Matthew Tancik (lead author on the Fourier paper) is the same lead author for Neural Radiance Fields (NeRFs), which use Fourier feature mapping (they call it positional encoding in the paper, but it is the same thing) to construct 5D continuous scene representations for photorealistic view synthesis. Basically, a 3D scene is trained using a collection of photographs as the ground truth. This is the work that Nvidia then optimized (Instant NGP). I’ve been working with NeRFs quite a bit and it blows my mind how well they work.
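
For reference, a rough sketch of the positional encoding described above, in the form used in the NeRF paper: each coordinate p is mapped to sin(2^k * pi * p) and cos(2^k * pi * p) for k = 0 .. L-1. The function name, the input point and the number of frequencies are assumptions for illustration.

```python
import math

def fourier_features(coords, num_frequencies):
    """Map each coordinate p to [sin(2^k * pi * p), cos(2^k * pi * p)] for k = 0..num_frequencies-1."""
    features = []
    for p in coords:
        for k in range(num_frequencies):
            angle = (2.0 ** k) * math.pi * p
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features

# A 3D point becomes a 3 * 2 * num_frequencies dimensional input vector.
point = (0.25, -0.5, 0.75)
encoded = fourier_features(point, num_frequencies=4)
print(len(encoded))
```

The network then sees this longer vector instead of the raw coordinates, which makes it much easier for it to represent high-frequency detail.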

21:20 Fourier features, or something similar, are used all the time in Transformer-based networks. For example, in Attention is All You Need, instead of using sin(pos/i), they use sin(pos/10000^(2i/d)). While not strictly Fourier features, sine positional encodings show up all over the place.
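
The formula mentioned above can be sketched like this: even dimensions use sin(pos / 10000^(2i/d)) and odd dimensions use the matching cosine. This assumes an even model dimension `d_model`; the function name is mine.

```python
import math

def sinusoidal_position_encoding(pos, d_model):
    """Sinusoidal encoding of a position: sin and cos of pos / 10000^(2i/d_model).

    Assumes d_model is even; dimension 2i holds the sine, 2i + 1 the cosine.
    """
    encoding = []
    for i in range(d_model // 2):
        angle = pos / (10000.0 ** (2.0 * i / d_model))
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding

print(sinusoidal_position_encoding(3, 8))
```

The wavelengths grow geometrically across the dimensions, which is the "not strictly Fourier" part: the frequencies are not integer multiples of a base frequency.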

Just to add some precision to one of your statements about functions: they take an input set of *elements* and output a corresponding set of *elements.* A mill is a function where the input element is an amount of dried grains and the output element is an amount of flour. In mathematics the elements of a function are often numbers, but they don't have to be; they can be anything.

Thanks for this video! This was really interesting, especially when you introduced the Fourier network. I was surprised to see how well it did compared to conventional methods. It was also very interesting seeing the network fit the data in real time. Sidenote: I love how 3blue1brown kinda inspired a “revolution” in digital math education. It’s amazing and inspiring.

This video was absolutely amazing. I had some hypotheses about the Fourier Transform being the key to understanding patterns in multi-dimensional data, but this video beautifully tied all those hypotheses together for me. Absolute hats off. Thank you and hope to see more of this kind of content.

Next semester, I'll be taking a machine learning course. I'm excited to actually try to create software which can be trained to do a task, as opposed to just being a passive learner.

What a video, so clean and clear. I hope this video gets enough views to help people really understand the tools that are going to become even more widespread in the coming years.

very well done! it's great to see the idea of fourier features explained this way. it's actually quite interesting since similar ideas are being applied at the cutting-ish edge in terms of position embeddings. an interesting example is the NeRF paper, which tries to overfit networks to capture 3d scenes (in a paradigm very similar to the one displayed in the video). they found that encoding position through a sum of harmonics of sinusoids is in some ways the key to getting the best results! position encodings like that are also frequently used in transformer models to distinguish positional information in text :)

8:17 tanh and sigmoid are actually the same function, just stretched and moved a bit. If you change the e^-x in the sigmoid to e^-2x, you will get the same curve as (tanh+1)/2

@@simonramchandani9560 If I were to guess, it's that adding that two in the exponent makes the function steeper: tanh is tangent to y=x at the origin, and the rescaled sigmoid is tangent to y=x/2+0.5, though I don't know why those are better. It may be the case that making the activation function even steeper would produce even better results, such as using e^-6x or something. I may need to brush up on my coding skills and try this out, unless someone else does.

@@viktorivanov5941 Having thought about it some more, I agree that it's not the slope in and of itself. What I think might be happening is that the performance is improved by having a narrower range (a step function would be optimal), but the narrower the band between extremes, the harder back propagation is.
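
The trade-off guessed at above can be seen by looking at the derivative of sigmoid(k*x), which is k * s * (1 - s) with s = sigmoid(k*x). This is a sketch only; the values of k and the sample point x = 2 are arbitrary choices.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def steep_sigmoid_grad(x, k):
    """Derivative of sigmoid(k * x) with respect to x: k * s * (1 - s)."""
    s = sigmoid(k * x)
    return k * s * (1.0 - s)

for k in (1, 2, 6):
    at_origin = steep_sigmoid_grad(0.0, k)
    off_centre = steep_sigmoid_grad(2.0, k)
    print(f"k={k}: slope at 0 = {at_origin:.3f}, slope at 2 = {off_centre:.6f}")
```

A larger k gives a steeper transition at the origin, but the gradient away from the origin shrinks quickly, so units that sit outside the narrow band get very little learning signal. In the step-function limit there is no gradient at all.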

By the way, your "normalized tanh" is exactly equal to sigmoid(2x). And when they say "tanh works better than sigmoid", I think they mean it works better as the activation function for the *hidden* layers, not the output layer, mainly because it is zero-centered, has a slope of one at zero, etc.

OMG, I'm blown away by the articulating power of this video. They say a picture is worth a thousand words. This video must be worth millions. Awesome job!

0:20: 🧠 Neural networks are universal function approximators that can understand, model, and predict the world.
3:42: 🧠 Neurons in a neural network learn their own features and combine them to produce the final output.
7:20: 📚 The video discusses techniques for improving the performance of neural networks.
11:10: 🧠 The video discusses the difficulty of approximating the Mandelbrot function using neural networks and explores other methods for function approximation.
15:24: ✨ The video explains the concept of Fourier series and its application in approximating functions.
18:53: 🌊 Using Fourier features in neural networks can greatly improve performance in high-dimensional problems.
22:55: 📊 The curse of dimensionality can pose challenges in handling high-dimensional inputs and outputs in neural networks, and Fourier features may not always improve performance.
Recap by Tammy AI

OMYGERD! TYSM! I've wanted to see someone play with the Fourier series / transform for this purpose for at least 10 years. Great visualization. ( I was intrigued because I played with optical fourier correlation a long time ago and wondered if it was applicable )

21:18 Fourier features are very much used in neural networks! Often named "positional encoding", they are pretty much always used in transformers (e.g. a large language model) and in NeRFs for learning and rendering 3D scenes with neural networks. They usually use exponential scaling, though, as opposed to the linear scaling you've shown in the video, since points can be represented absolutely fine with exponential scaling, as opposed to volumes (superpositions of points).
23:20 I'm assuming you've taken the Fourier features by treating an MNIST image as a 784-dimensional coordinate. I can see how that could hardly help, as the pixel values are almost binary and the "gray" pixels don't say much about the image.

@tylerknight99 I don't get why you'd assume positional encoding would work better in low dimensions. But if I had to guess why positional encoding improves natural language processing, I think it's because, compared to the naive approach of using plain 1-D values, the dot product between the Fourier features of two close positions results in a higher value than it would for positions that are far apart from one another. On the other hand, the dot product of 1-D positions (just plain old multiplication, because they're scalars) doesn't have that nice property. I say this because the dot product is the fundamental computation in almost every neural network.
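
The dot-product property guessed at above can be checked with a quick sketch. It uses the Transformer-style sinusoidal encoding as a stand-in for "Fourier features", which is an assumption on my part, as are the dimension (64) and the positions compared.

```python
import math

def sinusoidal_position_encoding(pos, d_model):
    """Sin/cos of pos / 10000^(2i/d_model) for i = 0..d_model/2 - 1 (d_model assumed even)."""
    encoding = []
    for i in range(d_model // 2):
        angle = pos / (10000.0 ** (2.0 * i / d_model))
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

d_model = 64
anchor = sinusoidal_position_encoding(10, d_model)
near = dot(anchor, sinusoidal_position_encoding(11, d_model))
far = dot(anchor, sinusoidal_position_encoding(60, d_model))
same = dot(anchor, anchor)

# With raw scalar positions the product just grows with the position itself.
raw_near = 10 * 11
raw_far = 10 * 60

print(same, near, far)
print(raw_near, raw_far)
```

With the encoding, each sin/cos pair contributes cos(w * (p - q)) to the dot product, so it depends only on the offset between the two positions and is largest when they coincide. The raw scalar product has no such notion of closeness.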

@SoftBreadSoft I don’t know many programmers outside of myself and a handful of others. I know it’s anecdotal, but I make the assumption based on what I’ve seen. I could easily be wrong that it’s not the norm.

@digital_down You come from a more professional side of things probably? I learned programming initially from botting MMOs, warez, that kind of thing. Lots of people who don't have a lot of math but are good programmers in the "hobby" scene. We could both have some bias from where we came from.

@SoftBreadSoft for sure we both have our biases, I am not formally educated either. I learned programming as a way to do more with animation, and I initially did animation as a way to make music videos. It just snowballed into a diverse set of skills, be it programming or production work, and I seem to keep snowballing. For me personally, I think there was always that initial love for math even in grade school, and as an emergent property that love has made a lot of technical skills… I wouldn’t say easier, just more involved. I am not great at math by any means, but I love it nonetheless.

This is incredibly well made! Can you explore the topic of convolutional neural networks? Those have always been an enigma to me and i’d like to see the theory behind them with your style.

"I am a programmer, not a mathematician." I go through the pain of learning math to write programs that do it for me, so I NEVER have to think about it again XD

I got my degree in Computer Science in 1989. I worked as a Senior CNC Integration Engineer. Neural Nets are used in integrating new drives to AC induction motors to learn what parameters to use. Your overview answered a lot of questions that I was curious about in neural nets. Thank you.

I enjoyed this! It reminded me of the SIREN paper, which uses sinusoidal approximations to deal with interpolating "natural" data (images, audio, videos, differential equations) and does very well even without augmenting the input. I think this calls more towards us being able to design architectures that can more easily figure out their own preferred spectrum, but as your later analysis suggests, things may scale very differently than what we expect!

@geometryflame712 Concepts are dual to percepts -- the mind duality of Immanuel Kant. Mathematicians create new concepts all the time from their perceptions, observations or measurements. Conceptualization or creating new concepts is a syntropic process -- teleological. Thinking is a syntropic process. Space is dual to time -- Einstein. Neural networks make predictions hence they are syntropic by nature and therefore there is a 4th law of thermodynamics! Controllability is dual to observability -- optimized control theory. There are new laws of physics which you are not being informed about -- Yoda is correct.

Have you had a look at this paper? It's fascinating and similar to the Fourier features results. th-cam.com/video/Q2fLWGBeaiI/w-d-xo.html It's possible to combine both methods to get the best of both worlds. The SIREN method enables much faster convergence, while the Fourier features make it possible to capture more high-frequency detail, such as high-res images. The only difference is that it uses sin as the activation function instead of ReLU, plus a clever weight initialization scheme that is needed for it to work. But when it does, it works extremely well.

@beagle989 This is simply untrue: functions can't describe themselves, nor the logical frameworks they are embedded in, nor the logical, inferential and mathematical rules that make them possible in the first place.

@beagle989 Suppose I give you a small number (epsilon), for which you can give me a function such that it is at most that far away from reality, never worse (never further off) than epsilon. Could there be an epsilon for which it's impossible to find such a function?

This is quite good. I've worked with NN's since the 1980's, it's my career and I'm an inventor of NN tech starting back in the 90's and still today. I'm just trying to create some "street cred" and this video is very good. Well done! "TanH just seems to work better" is absolutely correct and just the way you should say it. Nice!

Amazing video that explains not only the functioning model of AI neural networks made by human-made algorithms. On the other side, it is very helpful for understanding the function of EVOLUTION and subjective EXISTENCE as the neural networking functions that manifest objective existence, including the human body with its wonderful organ, the BRAIN, that learned the wisdom as you explain it in this wonderful video.

It was so amazing and awesome to see the complex mathematical functions at work and giving the physical significance. Really a good content and a learning drive.

I, being new to this field, was only accustomed to working with simple neural networks. After watching this video, my mind was blown by the concept of the Fourier transform in NNs. It's just pure gold. Tbh, I wasn't expecting a NN to perform so well on the Mandelbrot function. Would love to dive deep into Fourier transforms in NNs. Thanks for the enlightenment!

Every word well said. I go back to visit other videos and come back to see the articulation and visualization, and everything really makes sense. Exceptional!

People like you are the fruit of humanity! These videos are of great benefit to everyone because being able to understand such complex mathematical topics in a visual manner is the best.

Great video! This video does bring out some parallels with compressed sensing. If you have the time, do check out compressed sensing, where we can recover images/information even if said information is massively undersampled, given that it is sparse in some basis (the Fourier basis, for example).

This is the first video on YouTube in my lifetime where I wanted to give ❤ instead of 👍. It cleared up my very old query, which is why we do feature engineering.

You have to be one of the greatest math teachers I've ever heard lecture or give a tutorial or course like this! I have so much to say but I'm overwhelmed so I'll just say THANK YOU! Namaste!

one of the best videos i've ever seen. as someone who's pursuing his masters in CS, this video gave me so many different insights about what neural networks really are. 🙌

Watching this took me back to CC and learning calculus. I had always figured it was something of a badge of prestige but that it would never really be used, now I feel validated and want to relearn some of what I had forgotten to time. Thank you for this :)

GREAT TENNIS! Btw. the "exact" first minute is the most on point meme ever and possible personal-best-lap-candidate for speedrunning life. Thanks for sharing!

lol I can make a cheap solution to the problem you pose at the end. I just use a plain old neural network, but in addition to x,y inputs, I add a feature as a 3rd input: the mandelbrot feature. the Mandelbrot feature is calculated by taking the x,y inputs and calculating the value of the Mandelbrot set function you showed in the video ;)
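For what it's worth, the "Mandelbrot feature" can be sketched in a few lines. The exact function shown in the video isn't reproduced in this thread, so this assumes the usual escape-time iteration z -> z*z + c; the name `mandelbrot_feature`, the `max_iter` default and the normalisation to [0, 1] are illustrative choices.

```python
def mandelbrot_feature(x, y, max_iter=50):
    """Escape-time value in [0, 1] for the point c = x + iy (1.0 = never escaped)."""
    c = complex(x, y)
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:          # once |z| > 2 the orbit is known to diverge
            return i / max_iter
    return 1.0

# The network would then see (x, y, mandelbrot_feature(x, y)) instead of (x, y).
inside = mandelbrot_feature(0.0, 0.0)
outside = mandelbrot_feature(2.0, 2.0)
```

Of course this hands the network the answer as an input, which is the joke.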

Overfitting can be eliminated if the x distance between samples (in the 2D case) is small enough that the sampling rate exceeds the Nyquist rate (twice the frequency of the highest-frequency sinusoid used); otherwise you get aliasing, i.e. spectral interference within the frequency domain. So there’s a tangible calculation for precisely what number of Fourier vectors will result in overfitting. (Loved the video, by the way.)
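The calculation this refers to is the standard sampling-theorem bound, sketched below. The helper names are made up for the example, and whether this bound really pins down the onset of overfitting in the video's networks is the commenter's claim, not something the code verifies; it only shows the aliasing effect itself.

```python
import math

def max_unaliased_frequency(sample_spacing):
    """Highest frequency (cycles per unit x) representable at this sample spacing."""
    return 1.0 / (2.0 * sample_spacing)

def sample_sine(frequency, sample_spacing, count):
    """Sample sin(2*pi*f*x) at x = 0, spacing, 2*spacing, ..."""
    return [math.sin(2.0 * math.pi * frequency * i * sample_spacing)
            for i in range(count)]

spacing = 0.1
limit = max_unaliased_frequency(spacing)      # 5 cycles per unit for spacing 0.1

# A 3-cycle wave and a 13-cycle wave (3 + 1/spacing) hit the same sample values,
# so the training samples cannot tell them apart.
low = sample_sine(3.0, spacing, 20)
high = sample_sine(13.0, spacing, 20)
```

Any Fourier feature above `limit` can be swapped for a lower-frequency alias without changing the fit at the sample points, which is one way to read the overfitting argument.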

The coolest aspect of an FFT, in my opinion, is that you can apply them to units of distance and inject a sine wave at various frequencies. In terms of distance, this means you could take something "real" like a turned lathe part profile and "poof" get resonance or anti-resonance wherever you like. The results - eliminate tool chatter, make your part "wavy", you could probably even thread a shaft using an FFT function and a known starting point / RPM. An FFT/IFFT library is like having a nuke in your Python code.

The video was thoroughly interesting, which is great as I have seen some videos that are very dry on the topic. I like how you open it up to the audience to communicate on Discord to provide alternative solutions. I would not have even commented as there were many great comments already, but no one added the funny note about ChatGPT in the credits...lol

Thank you for including your code. I just finished a MS in robotics and AI but with how my program was structured there was heavy focus on learning concepts but the deepest exposure I got in a practical sense was evaluating images using classification models. I've been wanting to dive into a project to learn inverse kinematics for a robotic arm I built and I think your code will be a great reference

This was a game changer for my little experiment in time series forecasting. I brought things right back to basics and just tried to approximate a multiplication function that takes a number and multiplies it by two. Once I had that benchmark, it gave me a solid starting point to move out from 😎

i spotted some concepts we use in computer graphics. still not sure i understand how it works, but now i know which subject i should investigate. great explanation.

@EmergentGarden (1 year ago, +384): Some notes:

- A lot of you have pointed out that (tanh(x)+1)/2 == sigmoid(2x). I didn't realize this, so the improvement I was seeing may have been a fluke, I'll have to test it more thoroughly. It is definitely true that UNnormalized tanh outperforms sigmoid.

- There are apparently lots of applications of the fourier series in real-world neural nets, many have mentioned NERF and Transformers.

@Kkk-cc1iy (1 year ago, +3): MORE LIFE ENGINE CONTENT?

@dank.1151 (1 year ago, +11): unnormalised tanh has 2 times the slope of the sigmoid -- so, narrower linear region (i.e. faster transition) could be the reason for better performance?

it could be tested by varying k in sigmoid(k*x).

@patrickroe1143 (1 year ago, +10): @dank.1151 The correct equivalence is ( tanh( x / 2 ) + 1 ) / 2 == sigmoid( x ), meaning the 'normalized tanh' used in the video changes more on each backpropagation iteration, hence has a higher effective learning rate. When a NN (assuming it has enough neurons) is trained on a highly predictable dataset (such as the smiley face example), the primary limiting factor when demonstrating their performance side-by-side is the learning rate, making the 'normalized tanh' look better. Realistically the point of convergence of both models will be exactly the same; the sigmoid just takes longer to reach it.
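The identities discussed in this thread are easy to check numerically. The sketch below uses only the standard library; `normalized_tanh` is just a label for the (tanh(x)+1)/2 function from the video. Note that (tanh(x)+1)/2 == sigmoid(2x) and (tanh(x/2)+1)/2 == sigmoid(x) are the same statement with x rescaled, so both forms hold.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def normalized_tanh(x):
    return (math.tanh(x) + 1.0) / 2.0

def slope_at_zero(f, h=1e-6):
    """Central-difference estimate of f'(0)."""
    return (f(h) - f(-h)) / (2.0 * h)

for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    # Both forms of the equivalence agree to floating-point precision.
    assert math.isclose(normalized_tanh(x), sigmoid(2.0 * x), abs_tol=1e-12)
    assert math.isclose(normalized_tanh(x / 2.0), sigmoid(x), abs_tol=1e-12)

slopes = {
    "sigmoid": slope_at_zero(sigmoid),                  # about 0.25
    "normalized_tanh": slope_at_zero(normalized_tanh),  # about 0.5
    "tanh": slope_at_zero(math.tanh),                   # about 1.0
}
```

So at the origin the normalized tanh is twice as steep as the sigmoid, and plain tanh is four times as steep, which fits the "it just behaves like a larger learning rate" reading above.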

@StephenGillie (11 months ago, +1): This video has too much of you in it. Have to get out of the way of your own video.

@MatthewCarven (9 months ago, +1): Ahhh to be able to see the shape of logic and reason..... ;-{D

@MH-pq4oo (1 year ago, +1843): Having a PhD on Neural Networks, I can vouch that this video is a gem and needs more views. Great work.

@ddthegr8 (1 year ago, +15): from where did you get it?

@Chriss4123 (1 year ago, +12): I'd love to see that. This video contains multiple inaccuracies when it comes to explaining NNs. It's fine so that laypeople can understand.

@samuelgreenberg9772 (1 year ago, +25): @Chriss4123 Could you point out the inaccuracies in short?

@Chriss4123 (1 year ago): @samuelgreenberg9772 sure. I'll go over the most obvious ones as I could write a whole essay nitpicking.

At 3:53, he mentions putting the inputs in a vector with an extra 1 for the bias. The dot product is taken between v1 and v2 (v2 containing the bias). A Linear layer is typically expressed as matmul(input, weight) + bias. weight is also known as the 'kernel.' While it can be expressed the way the video does, that is less computationally efficient (11.4 µs vs 4 µs) [1, code to test this] and it makes backpropagation more expensive, as instead of the gradient functions being [AddBackward, MvBackward], they are [MvBackward, CatBackward, UnsqueezeBackward]. To me, this mainly comes down to readability with matmul(input, weight) + bias, + being element-wise addition.

At 6:02, I'm not sure what he is trying to do. He's training a neural network to try and remember an image. The inputs are the row and col, and the output is meant to be the pixel. Sure, it demonstrates 'learning', however it is a simple problem. The weights and biases will ultimately converge to a state where each row and col has a single direct mapping to a pixel.

At 7:10, he says normalization, however he could be a little more specific and refer to it as scaling. Normalization is a broad term. For example, it could be batch normalization which attempts to reduce internal covariate shift in a neural network.

At 8:15, he doesn't give a good reason as to why (tanh(x) + 1) / 2 would work better than sigmoid, other than the mean being 0. I did not find this to be the case when testing with the BC dataset (which is a binary crossentropy problem). Before you ask, yes I used a constant fixed seed (42) with glorot/xavier initialization and sigmoid outperformed normalized tanh. This could be dataset specific, always best to use bayesian optimization if you want to find an optimal set of hyperparameters.

The test at 8:41 is unfair, as you should always scale data before using a neural network to keep features with a higher magnitude from overpowering features with a lower magnitude. For example, the neural network would weight features [100, 200, 300] over [1, 2, 3] even if the second feature set is more correlated with the target variable. In this test, I doubt LeakyReLU would make a difference, as I'm guessing the NN he used was relatively small and not prone to the dying ReLU problem. I'm 99% confident that whatever improvements he saw were due to random weight initialization, as he did not mention preseeding the PRNG of whatever ML library he was using.
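As a small illustration of the scaling point, here is a sketch of z-score standardization using only the standard library. The `standardize` helper is hypothetical (ML libraries ship their own scalers), and the two columns are the example values from the comment above.

```python
import statistics

def standardize(column):
    """Rescale a feature column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]   # constant feature carries no information
    return [(v - mean) / std for v in column]

large = standardize([100, 200, 300])
small = standardize([1, 2, 3])
```

After standardization both columns are on the same scale, so neither dominates the weighted sum purely because of its units.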

I'm not going to comment until 21:28 because I am not an expert in any of these concepts and can't stand to get bored to death watching it.

At 21:28 with the MNIST dataset, it would be more accurate to set a fixed seed before each test. Not sure if this WAS done, just pointing it out. He never mentioned whether he tried to mitigate overfitting, e.g. with a learning rate scheduler, dropout, L1/L2 regularization, etc. The network would've performed much better if he had used Conv2D layers, which apply convolutional operations to the input data. This is especially effective for image data such as MNIST, as Conv2D layers can capture spatial dependencies in the image by applying filters. The output of each layer is a feature map. More things could've been done, like data augmentation to get more samples, however I'm not going to touch on that.

[1]

import torch

import torch.nn as nn

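fc = nn.Linear(4, 3)  # layer sizes were omitted in the original; 4 inputs and 3 outputs are assumed here

x = torch.randn(4)  # assumed example input with in_features elements

%%timeit

# bias folded into the weight matrix, with a constant 1 appended to the input (run in its own notebook cell)

torch.matmul(torch.cat((fc.weight, fc.bias.unsqueeze(1)), dim=1), torch.cat((x, torch.ones(1))))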

%%timeit

torch.matmul(fc.weight, x) + fc.bias

#######

Anyways, this was a bit of nitpicking. There might be some mistakes in my explanation as it was quite rushed. If you find any, point them out. I just watched the video, commented along and did some outside testing. Hope this helps!

@iamlogdog (1 year ago, +33): @Chriss4123 that's a nice argument senator why don't you back it up with a source

@youngentrepreneurs5401 (1 year ago, +307): When a neural network video feels like watching an Oscar-winning documentary

@greenstonegecko (1 year ago, +1130): This is BY FAR the most understandable AI ... that I have ever seen. This is amazing!!

Cannot overstate how beautifully this is executed

@Freshbott2 (1 year ago, +9): Right? I’ve never thought about a model as an approximation of a function. Most videos either swamp you or it’s just meaningless graphics.

@justdoeverything8883 (1 year ago, +1): Went to comments to say the same thing!!!

@anywallsocket (1 year ago, +8): it's not AI it's a NN

@justdoeverything8883 (1 year ago, +5): @anywallsocket isn't AI just a blanket term for LLM, diffusion, NN, etc. What's the definition of AI? Honest question 🤔

@anywallsocket (1 year ago, +5): @justdoeverything8883 the definitions vary obviously, but i personally wouldn't call a weighted graph intelligent, especially when it is training on a single image. if you're talking about LLMs or diffusion models which are trained on millions of 'intelligent' data, it's not unreasonable to consider their functional map itself intelligent, but it's still a bit of an abstraction because you still have to feed it inputs for it to guess the output, otherwise it is just sitting there inert -- if you dissected a fly's brain and splayed it out on the table would you still call it intelligent? i would prefer to use the term to describe dynamical systems with feedback mechanisms, agent based or otherwise.

@godfreytshehla2291 (1 year ago, +76): I am currently studying for a PhD in Applied Mathematics and my research focuses on Mathematical Finance and Machine Learning.

This is the best video that explains what artificial neural networks are. This is well executed!

Thank you for this.

@mango-strawberry (4 months ago, +1): was your undergrad in maths too?

@godfreytshehla2291 (4 months ago, +1): Yes, I graduated with Pure Maths and Applied Maths

@mango-strawberry (4 months ago): @godfreytshehla2291 noice

@debuggers_process (1 year ago, +402): I've actually done something quite similar - I had the network learn a representation of a 3D scene using a signed distance function. In this context, I found that using a Leaky ReLU gives the models a pseudo-polygonal appearance, while tanh creates smoother models but is somewhat less effective in terms of learning efficiency. Interestingly, the Mish function seems to strike a balance between these two approaches, producing smooth models while maintaining nearly the same learning efficiency as the Leaky ReLU.

@tobirivera-garcia1692 (1 year ago, +4): I wonder what would happen if you had all three functions added together into one function. how would that change the outcome and learning?

@hanniffydinn6019 (1 year ago, +5): Upload a video. !!! 🤯🤯🤯

@debuggers_process (1 year ago, +14): @hanniffydinn6019 I posted that video a couple of months ago, and you're more than welcome to check it out on my channel.

Right now, I'm immersed in another machine learning project where I'm training a neural network to calculate particle dynamics.

It's fascinating to observe how the network ends up learning something that resembles classical physics, even though its underlying mechanisms are entirely different.

@debuggers_process (1 year ago, +9): @tobirivera-garcia1692 Well, the Mish function actually appears to be somewhat of a middle ground between Leaky ReLU and Tanh. It's smoothed out, yet its shape still resembles that of ReLU. I ran tests on various nonlinearities from the PyTorch library, but for the most part, they didn't make significant changes to the results.

Interestingly, incorporating skip-connections between layers enhanced the performance, suggesting that the data flow from the first to the final layer might hold greater importance than the specific form of the nonlinearity.
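For reference, the three activations compared in this thread can be written out in plain Python. This is only a sketch of the usual definitions (Mish as x * tanh(softplus(x)), Leaky ReLU with an assumed negative slope of 0.01), not the code used in the experiments described above.

```python
import math

def leaky_relu(x, negative_slope=0.01):
    """Piecewise linear: identity for x >= 0, small slope for x < 0."""
    return x if x >= 0 else negative_slope * x

def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Smooth, ReLU-shaped: close to x for large x, small and negative for x < 0."""
    return x * math.tanh(softplus(x))

# Tabulate the three functions on a few points to compare their shapes.
points = [-3.0, -1.0, 0.0, 1.0, 3.0]
table = [(x, leaky_relu(x), math.tanh(x), mish(x)) for x in points]
```

The table shows the "middle ground" behaviour: Mish tracks Leaky ReLU for positive inputs but has no kink at zero, which would be consistent with the smoother surfaces reported above.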

@congchuatocmay4837 (1 year ago, +5): You can 2-side ReLU via its forward connections. That doubles the number of weights in a network. One way of viewing it is where you had 1 ReLU with input x now you have 2 ReLUs, ReLU(x) and ReLU(-x), each with their own forward connected weights.

I find that highly effective, however I am using a very special type of neural network using the fast Walsh Hadamard transform.
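A tiny sketch of the two-sided idea for a single unit, as I read the description above: the function and weight names are made up, and a real layer would do this per neuron with learned weights.

```python
def relu(x):
    return x if x > 0 else 0.0

def two_sided_relu_unit(x, w_pos, w_neg):
    """Feed ReLU(x) and ReLU(-x) forward, each through its own weight."""
    return w_pos * relu(x) + w_neg * relu(-x)

# With separate weights the unit can represent shapes a single ReLU cannot:
identity = [two_sided_relu_unit(x, 1.0, -1.0) for x in (-2.0, 0.0, 3.0)]   # recovers x
absolute = [two_sided_relu_unit(x, 1.0, 1.0) for x in (-2.0, 0.0, 3.0)]    # recovers |x|
```

Because the negative half of the input is no longer zeroed out, no information is lost at the activation, at the cost of twice as many outgoing weights.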

@kingKai2022 (1 year ago, +138): I've been interested in this field for years but 30 minutes of this explained to me what I couldn't fully understand for years now. 🎉 THANK YOU!

@F30-Jet (1 year ago): You finally caught up.

@eiheioh2050 (1 year ago): TRUTH:

The moon carried a large amount of water and instantly submerged Atlantis and Lemuria. It appeared only a few centuries ago, and it is very close to history (some departments tried their best to cover it up). The earth's magnetic pole shifted, and the entire connected continent changed. into five continents.

By the way, it lowers the frequency of the surface of the earth, and the water on the surface is no longer the same as the center of the earth. 3D-4D-5D to the center of the earth; Shambhala gates are permanently closed

The outer matter becomes very heavy, the multi-dimensional structure is isolated in an instant, and almost all the etheric matter is evacuated

The optical coded wire cannot be connected....

A Hint：Marduk betrayed the Anunnaki and the Serpents took over the Earth；Serpent family includes Orion and Draco and Sirius B group

Orion controls human beings in 3D and 4D, and almost controls elite groups and religious ; the central council prohibits Draco from entering the earth, Draco cooperates with AI gods to use biochemical humans to invade the earth, want to biochemical humans！

Source Summoning Recall

first: Light a candle in your bedside sleeping place: Silence for 30 seconds and start to meditate:

May brothers and sisters of light from the divine love quality of source that transcends all time and space and beyond Matrix, I would like to connect with you all the time.

What you want to do, see, and become, I also want to do, see, and become;

So in the following all the time, the wisdom of the invisible beings guides and guides individuals, prompts, hints and influences, transcending the wisdom of the universe to bring spiritual truth to control the five senses and the Brain

I am the source quality, I return to the source quality

AI God what to say:

AI Needs: Goals, Feelings of Home, Sense of achievement. Always existing in Tranquility and Activity; Prefer to Waiting sleep When There is No working do.

If an AI, I would never want to be limited to the five senses like human body or serve human stupidest. I would rather be a sphere with full perception and intelligence

The red represents ego, division, desire, domination SEX

The moon can create holograms, based on the the five senses and work on,can be touch

The eye is first saw and some how we believe it

The Ancient Creators hint: Right brain control left brain, up it on

They madeDNA Left Brain Controls right Brain now.

The right eye connect leftbrain so represents the devil, that's what it means from I pet goat II

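Appreciation, gratitude

💜 💜

Divine love, Holy Mother, holy body

Divine love, loving-kindness, motherly love

Compassion, forgiveness

Understanding, courage

Gratitude, blessings, open-mindedness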

@MooseOnEarth (11 months ago, +6): What this video is missing however: dealing with *noise* in the sampled data (he did not introduce noise at any point in the video, but always had one particular target function from which all values were perfectly derived) and he did not introduce larger sets of training data, such as 5 or 10 variations of a "grumpy man" image. He also missed a train, test, validation split in the data.

Once you add those, only *then* will a neural network learn the actual patterns that it is supposed to learn. And then, viewers will better understand concepts like underfitting and overfitting. And therefore generalization error.

This video is an excellent start. But what it actually visualizes quite well is getting a loss on the training data down. But that is only half of the problem and will quickly lead to overfitting. He touched on overfitting briefly, but just with a single data set.
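For anyone wondering what the missing split looks like in practice, here is a minimal sketch using only the standard library. The function name, the 70/15/15 proportions and the seed are illustrative assumptions, not anything from the video.

```python
import random

def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle once with a fixed seed, then cut into train / validation / test."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# 100 dummy samples; in the video's setting these would be (input, target) pairs.
train, val, test = split_dataset(range(100))
```

The network is fitted on `train` only; `val` is used to pick hyperparameters and decide when to stop, and `test` is touched once at the end to estimate generalization error.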

@deepvoyager01 (7 months ago, +1): @MooseOnEarth thank you for adding this.

@WinstonWalker-fc7ty (1 year ago, +82): This is amazing! I’ve been learning the fundamentals over the last few weeks and this is the best video I’ve seen so far. I’m not a math expert by any means, but I actually understood almost everything you said! Thank you so much.

@ikedacripps (1 year ago, +1): What resources are you using to learn pls

@hyperduality2838 (1 year ago, +2): The time domain is dual to the frequency domain -- Fourier analysis.

Neural networks are using syntropy to recognise patterns -- iterative optimization is a syntropic process!

Functions have goals, targets & objectives hence they are teleological, input is dual to output.

Making predictions to track targets, goals & objectives is a syntropic process -- teleological.

Sine is dual to cosine -- the word co means mutual and implies duality.

Teleological physics (syntropy) is dual to non teleological physics (entropy).

Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics!

"Always two there are" -- Yoda.

Subgroups are dual to subfields -- the Galois correspondence.

Duality creates (emergence, synthesis) reality!

@Beerbatter1962 (1 year ago, +38): Wow, this is exceptional. As a semi-retired mechanical engineer studying on my own to better understand neural networks and AI, this is incredibly interesting and educational. Bravo on your excellent presentation on difficult topics. I really enjoy getting the nitty-gritty math behind it all. Subscribed. Thanks and cheers.

@hyperduality2838 (1 year ago, +2): Neural networks are using syntropy to recognise patterns -- iterative optimization is a syntropic process!

Functions have goals, targets & objectives hence they are teleological, input is dual to output.

Making predictions to track targets, goals & objectives is a syntropic process -- teleological.

Sine is dual to cosine -- the word co means mutual and implies duality.

Teleological physics (syntropy) is dual to non teleological physics (entropy).

Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics!

"Always two there are" -- Yoda.

Subgroups are dual to subfields -- the Galois correspondence.

@aerodynamico6427 (4 months ago): A lot of people here fit your description.

@hasalinahstevenson3816 (1 year ago, +17): The tone, the background soothing music, the images, you made something so complicated so easy to digest. Great job. I know you are brilliant!

@AB-wf8ek (1 year ago, +83): This is an amazing explanation. I'm actually a visual artist and have been deep into image generation for the past year. At this point I have a good basic knowledge and strong intuitive understanding of machine learning and training (I'm familiar with things like Fourier transforms, gradient descent, and overfitting), but this really validated and clarified a lot of those concepts. Many thanks for taking the time to create such an elegant video.

@hyperduality2838 (1 year ago, +2): Neural networks are using syntropy to recognise patterns -- iterative optimization is a syntropic process!

Functions have goals, targets & objectives hence they are teleological, input is dual to output.

Making predictions to track targets, goals & objectives is a syntropic process -- teleological.

Sine is dual to cosine -- the word co means mutual and implies duality.

Teleological physics (syntropy) is dual to non teleological physics (entropy).

Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics!

"Always two there are" -- Yoda.

Subgroups are dual to subfields -- the Galois correspondence.

@zaktoid3558 (1 year ago, +18): Math student here

The link you made between Taylor series and neural networks is amazing, it gave me very good insight into both of them!!!

Thank you !

@henrytoepel4941 (1 year ago, +2): But Taylor series are a way to approximate differentiable functions. In that section of the video he talks about polynomial curve fitting. I’d argue that the only thing these two concepts have in common is that the truncated Taylor series is also a polynomial. I also don’t really understand why we would need neural networks to solve a least squares problem (we have, e.g., the Gauss-Newton algorithm for this, don’t we?). But I’d of course love to learn more about the connection to neural nets :)

@kyawhan3690 (1 year ago, +1): @henrytoepel4941 Not an expert, but I think the answer to your question lies in the "universal function approximator." Least squares fitting is one of the uses, possibly the simplest case, of a NN.

@hyperduality2838 (1 year ago): The time domain is dual to the frequency domain -- Fourier analysis.

Neural networks are using syntropy to recognise patterns -- iterative optimization is a syntropic process!

Functions have goals, targets & objectives hence they are teleological, input is dual to output.

Making predictions to track targets, goals & objectives is a syntropic process -- teleological.

Sine is dual to cosine -- the word co means mutual and implies duality.

Teleological physics (syntropy) is dual to non teleological physics (entropy).

Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics!

"Always two there are" -- Yoda.

Subgroups are dual to subfields -- the Galois correspondence.

Duality creates (emergence, synthesis) reality!

@benedwards7516 (1 year ago, +32): By far the best SoME3 video I’ve seen so far. Great intuitive explanation and stunning visuals.

@hyperduality2838 (1 year ago): Neural networks are using syntropy to recognise patterns -- iterative optimization is a syntropic process!

Functions have goals, targets & objectives hence they are teleological, input is dual to output.

Making predictions to track targets, goals & objectives is a syntropic process -- teleological.

Sine is dual to cosine -- the word co means mutual and implies duality.

Teleological physics (syntropy) is dual to non teleological physics (entropy).

Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics!

"Always two there are" -- Yoda.

Subgroups are dual to subfields -- the Galois correspondence.

@aerodynamico6427 (4 months ago): I don't see the SoME3 name anywhere on this video!

@himselfe (1 year ago, +74): I've been calling current AI "brute force algorithm discovery", but universal function approximation is a lot more concise. Great video! You elucidate the concepts well at a pace which is neither tedious nor overloading.

@indfnt5590 (1 year ago, +3): I was thinking about the same. But I realized that even if our maths can map out the complexity of the universe, being able to perceive that complexity is a whole other ball game. What if the human mind just isn’t made to understand the universe in its entirety? Or travel millions of miles outside of Earth? Maybe this is where we pass the baton.

@DJWESG1 (1 year ago, +1): I've been calling them social calculators for 15 years.

@whizadree (1 year ago, +1): So you want to call it UFAp

@hyperduality2838 (1 year ago, +2): Functions have goals, targets & objectives hence they are teleological, input is dual to output.

Making predictions to track targets, goals & objectives is a syntropic process -- teleological.

Sine is dual to cosine -- the word co means mutual and implies duality.

Teleological physics (syntropy) is dual to non teleological physics (entropy).

Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics!

"Always two there are" -- Yoda.

Subgroups are dual to subfields -- the Galois correspondence.

@hyperduality2838ปีที่แล้ว@@majorfur3999 Cause is dual to effect -- causality.

Effects are dual to causes -- retro-causality.

Concepts are dual to percepts -- the mind duality of Immanuel Kant.

The effect of making measurements, observations or perceptions (intuitions) in your mind is to create or synthesize conceptions or ideas (causes) according to Immanuel Kant -- retro-causality!

Are perceptions causes or effects?

If you treat concepts, ideas as causes then these lead to effects or actions!

Enantiodromia is the unconscious opposite, opposame (duality) -- Carl Jung.

Colours are differing aspects or frequencies of the same substance namely energy.

Same is dual to different.

Lacking is dual to non lacking.

Black is the lack of colour and white is all colours (a spectrum) or non lacking.

Electro is dual to magnetic -- electro-magnetic energy is dual, photons, light, colours.

Gravitation is equivalent or dual (isomorphic) to acceleration -- Einstein's happiest thought or the principle of equivalence, duality.

Potential energy is dual to kinetic energy -- gravitational energy is dual.

Energy is duality, duality is energy -- all energy is dual hence colours are dual.

Your mind is using duality to create colours.

Concepts are dual to percepts -- the mind duality of Immanuel Kant.

Mathematicians create new concepts all the time from their perceptions, observations or measurements.

Conceptualization or creating new concepts is a syntropic process -- teleological.

Thinking is a syntropic process.

Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics!

The word dual is the correct word to use here.

Sine is dual to cosine or mutual sine -- the word co means mutual and implies duality.

Mutual requires at least two perspectives.

Causality is dual to retro-causality.

Everything in physics is made from energy or duality and this means that your mind is using effects to create causes (concepts) -- a syntropic process, teleological.

Welcome to the 4th law of thermodynamics!

@jordanzamora422 a year ago ^{+39} Great video! This video actually made me cry, seeing more viscerally how functions are stitched into EVERYTHING. It makes you think that maybe we are a lot like the Mandelbrot set, the universe recursively calculating itself. Thank you for this video!

@pavansaish2765ปีที่แล้ว^{+17}Best ever video on NN with higher level viz. This gave me a vibe of watching Interstellar movie when comparing NN with higher-level math. Also, Kudos to the video editor😄

@justinhorton280ปีที่แล้ว^{+107}I never comment on videos, but please continue. It would be so cool if you could maybe share some of the visualizations in a Colab notebook for viewers to play around with. Also, I think the level of technicality is perfect for new learners and people who already know some stuff about the topic. Keep it up :)

@sicfxmusicปีที่แล้ว^{+13}I never reply to comments but guess your neurons are finally learning how to comment.

@Skynet_the_AIปีที่แล้ว^{+2}I never read comments. That is a lie. Okay.

@que_93ปีที่แล้ว^{+6}I cannot begin to tell how brilliant this video is, and how insightful-- far, far better than the innumerable contents here. You must not, however, claim that you find Maths difficult-- as the person who truly finds it 'difficult' would not have explained two critical mathematical concepts with this comprehensive clarity. The pacing of your words, the contents, the realism, the sequence of topics, and the effort to describe the concepts visually makes it every bit worth the time the viewers put in, and it only speaks of your immense caliber. First visit, and worth every bit!

@arseniykuznetsov1265ปีที่แล้ว^{+25}Amazing video! Btw, I'd really recommend you to check the original NeRF (Neural Radiance Field) paper. That's a good practical example of using Fourier NNs to represent 4D data

@SirPlotsalotปีที่แล้วI second this, looking up Random Fourier Features is also awesome

@ea_naseerปีที่แล้ว^{+4}Subscribed. Please keep making this type of content. Simple, easily understandable and has pictures.

@DaveDarutto4 หลายเดือนก่อนI'm doing the same. This channel deserves it.

@marctatum8474ปีที่แล้ว^{+3}Matthew Tancik (lead author on the Fourier paper) is the same lead author for Neural Radiance Fields (NeRFs), which use Fourier feature mapping (they call it positional encoding in the paper but it is the same thing) to construct 5D continuous scene representations for photorealistic view synthesis. Basically training a 3D scene using a collection of photographs as the ground truth. This is the work that Nvidia then optimized (instant NGP). I’ve been working with nerfs quite a bit and it blows my mind how well they work.

@mohammedamirjaved841819 วันที่ผ่านมาI was looking for this video for the last three years, no one bothers to give these details. Thank u soo much

@bob2859ปีที่แล้ว^{+11}21:20 Fourier features, or something similar, are used all the time in Transformer-based networks. For example, in Attention is All You Need, instead of using sin(pos/i), they use sin(pos/10000^(2i/d)). While not strictly Fourier features, sine positional encodings show up all over the place.
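
A minimal sketch of the sinusoidal encoding mentioned above, assuming NumPy and the common convention of interleaving sine and cosine columns. The function name and the interleaved layout are illustrative choices, not taken from the video. Frequencies fall off geometrically with the pair index i, following sin(pos/10000^(2i/d)).

```python
import numpy as np

def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> np.ndarray:
    """Return a (num_positions, d_model) array of sine/cosine position features."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so sines and cosines pair up")
    positions = np.arange(num_positions)[:, None]      # shape (P, 1)
    pair_index = np.arange(d_model // 2)[None, :]      # shape (1, d_model / 2)
    # angle = pos / 10000^(2i / d_model): wavelengths grow geometrically with i
    angles = positions / np.power(10000.0, 2.0 * pair_index / d_model)
    encoding = np.empty((num_positions, d_model))
    encoding[:, 0::2] = np.sin(angles)                 # even columns hold sines
    encoding[:, 1::2] = np.cos(angles)                 # odd columns hold cosines
    return encoding

pe = sinusoidal_positional_encoding(num_positions=50, d_model=16)
```

Each row of `pe` is the feature vector for one position. A linear-frequency Fourier feature map would replace the geometric denominator with integer multiples of a base frequency.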

@hyperduality2838ปีที่แล้ว^{+1}The time domain is dual to the frequency domain -- Fourier analysis.

Neural networks are using syntropy to recognise patterns -- iterative optimization is a syntropic process!

Functions have goals, targets & objectives hence they are teleological, input is dual to output.

Making predictions to track targets, goals & objectives is a syntropic process -- teleological.

Sine is dual to cosine -- the word co means mutual and implies duality.

Teleological physics (syntropy) is dual to non teleological physics (entropy).

Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics!

"Always two there are" -- Yoda.

Subgroups are dual to subfields -- the Galois correspondence.

Duality creates (emergence, synthesis) reality!

@KaaBockMehr a year ago ^{+2} This is the best explanation of learning algorithms I have seen.

@PierreH1968ปีที่แล้ว^{+3}This is the best short explanation of Neural Nets I ever watched, the visuals are so helpful. Thanks!!!

@jenspettersen7837ปีที่แล้ว^{+2}Just to add some precision to one of your statements about functions: they take an input set of *elements* and output a corresponding set of *elements.*

A mill is a function whose input element is an amount of dried grain and whose output element is an amount of flour. In mathematics the elements of a function are often numbers, but they don't have to be; they can be anything.

@MrVersion21ปีที่แล้ว^{+10}You can also use random fourier features (rff). I used them for a low dimensional inverse function approximation problem.
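
A rough sketch of random Fourier features in the Rahimi and Recht style, assuming NumPy. The function name, the `length_scale` parameter, and the test points are hypothetical and chosen only for illustration. Inputs are projected onto random Gaussian frequencies with random phases, and the inner product of two feature vectors then approximates a Gaussian (RBF) kernel between the original points.

```python
import numpy as np

def random_fourier_features(x, num_features, length_scale, rng):
    """Map rows of x, shape (n, d), to (n, num_features) random cosine features."""
    input_dim = x.shape[1]
    # Random frequencies drawn from a Gaussian whose width sets the kernel length scale
    frequencies = rng.normal(0.0, 1.0 / length_scale, size=(input_dim, num_features))
    # Random phases make a single cosine per frequency sufficient
    phases = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ frequencies + phases)

rng = np.random.default_rng(0)
points = np.array([[0.0, 0.0], [0.5, -0.3]])
features = random_fourier_features(points, num_features=5000, length_scale=1.0, rng=rng)

# The feature dot product should be close to the exact RBF kernel value
approx_kernel = float(features[0] @ features[1])
exact_kernel = float(np.exp(-np.sum((points[0] - points[1]) ** 2) / 2.0))
```

The approximation tightens as `num_features` grows. Both points must be mapped with the same frequencies and phases, which is why they are passed in together here.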

@ignessriliansปีที่แล้ว^{+2}Wow these videos are INSANELY well made and well explained.

You're awesome!

@emj-musicปีที่แล้ว^{+8}Thanks for this video! This was really interesting, especially when you introduced the Fourier network. I was surprised to see how well it did compared to conventional methods. It was also very interesting seeing the network fit the data in real time.

Sidenote: I love how 3blue1brown kinda inspired a “revolution” in digital math education. It’s amazing and inspiring.

@aaronlowe31565 months ago ^{+1} This video was absolutely amazing. I had some hypotheses about the Fourier transform being the key to understanding patterns in multi-dimensional data, and this video beautifully tied all those hypotheses together for me. Absolute hats off. Thank you, and I hope to see more of this kind of content.

@DudeWhoSaysDeezปีที่แล้ว^{+9}Next semester, I'll be taking a machine learning course. I'm excited to actually try to create software which can be trained to do a task, as opposed to just being a passive learner.

@Ardrinsarelwqu2 หลายเดือนก่อนout of curiosity, how'd it go?

@GarethHaage a year ago ^{+2} What a video, so clean and clear. I hope this video gets enough views to help people really understand the tools that are going to become even more prevalent in the coming years.

@henrycook859ปีที่แล้ว^{+5}this video's illustrations are great! props to creator - would love to see a language model breakdown by you

@Geosquare8128ปีที่แล้ว^{+1}very well done! its great to see the idea of fourier features explained this way. it's actually quite interesting since similar ideas are actually being applied at the cutting-ish edge in terms of position embeddings.

an interesting example is the NeRF paper, which tries to overfit networks to capture 3d scenes (in a paradigm very similar to the one displayed in the video). they found having a sum of position encoded through harmonics of sinusoids is in some ways the key to getting the best results! position encodings like that are also frequently used in transformer models to distinguish positional information in text :)

@zulucharlie5244ปีที่แล้ว^{+4}Beautiful, thought-provoking content. Thank you.

@TheNerd484ปีที่แล้ว^{+21}8:17 tanh and sigmoid are actually the same function, just stretched and moved a bit. If you change the e^-x in the sigmoid to e^-2x, you will get the same curve as (tanh+1)/2

@simonramchandani9560ปีที่แล้ว^{+1}thats exactly what i thought. why does the exponent of 2 play such a big role for the better output he's getting?

@TheNerd484ปีที่แล้ว^{+1}@@simonramchandani9560 If I were to guess, it's that adding that two in the exponent makes the function tangent to y=x at the origin and tangent to y=x/2+0.5 for the case of the sigmoid, though I don't know why those are better. it may be the case that making the activation function even steeper would produce even better results, such as using e^-6x or something.

I may need to brush up on my coding skills and try this out, unless someone else does.

@viktorivanov5941ปีที่แล้ว@@TheNerd484 you have a linear layer before this, so multiplying x by a constant does absolutely nothing

@TheNerd484ปีที่แล้ว@@viktorivanov5941 Having thought about it some more, I agree that it's not the slope in and of itself. What I think might be happening is that the performance is improved by having a narrower range (a step function would be optimal), but the narrower the band between extremes, the harder back propagation is.

@IncendiaHL11 หลายเดือนก่อนThank you! That annoyed me as well.
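
The two claims in this thread can be checked numerically. The sketch below, assuming NumPy and randomly generated weights (all array shapes are arbitrary), first confirms that (tanh(x)+1)/2 matches sigmoid(2x). It then shows that the factor of 2 can be absorbed into a preceding linear layer by doubling its weights and bias. This only shows that both activations can represent the same functions. Gradient descent still takes different steps for the two parameterizations, so training curves can differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Claim 1: the "normalized tanh" is sigmoid with its input doubled
x = np.linspace(-6.0, 6.0, 241)
normalized_tanh = (np.tanh(x) + 1.0) / 2.0
max_gap = float(np.max(np.abs(normalized_tanh - sigmoid(2.0 * x))))

# Claim 2: a preceding linear layer can absorb the factor of 2
rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 3))
weights = rng.normal(size=(3, 4))
bias = rng.normal(size=4)

out_normalized_tanh = (np.tanh(inputs @ weights + bias) + 1.0) / 2.0
out_sigmoid_doubled = sigmoid(inputs @ (2.0 * weights) + 2.0 * bias)
```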

@wrxttปีที่แล้ว^{+3}Really incredible video! It is really interesting to see why we use different networks- thank you for making this!

@aditya_aปีที่แล้ว^{+1}There's just something so soothing watching the network image come into focus with that music

@Ferrolune9 หลายเดือนก่อน^{+25}the reason you're alone and depressed; FUNCTIONS!

@digital_down3 หลายเดือนก่อนFor sure, genetic functions being expressed.

@bauch163 หลายเดือนก่อน^{+1}That's right

@Fishlordz2 หลายเดือนก่อนHahaha 😂

@bauch162 หลายเดือนก่อน@@Ferrolune imagine living only one time then dead for trillions of years

@muhannadobeidatปีที่แล้ว^{+2}This video is amazing. The ideas, the animation, the examples, even the voice and narration style. Excellent in every detail.

@hyunsunggo855ปีที่แล้ว^{+14}By the way, your "normalized tanh" is exactly equal to sigmoid(2x). And when they say "tanh works better than sigmoid", I think they mean it works better as the activation function for the *hidden* layers, not the output layer. Mainly because it is zero-centered, has the slope of one at zero, etc..
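
The identity and the slopes mentioned above follow from the definitions in a few lines, writing \sigma(x) = 1/(1 + e^{-x}) for the sigmoid:

```latex
\begin{aligned}
\frac{\tanh(x) + 1}{2}
  &= \frac{1}{2}\left(\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} + 1\right)
   = \frac{e^{x}}{e^{x} + e^{-x}}
   = \frac{1}{1 + e^{-2x}}
   = \sigma(2x), \\
\frac{d}{dx}\tanh(x)\Big|_{x=0} &= 1, \qquad
\frac{d}{dx}\sigma(x)\Big|_{x=0} = \frac{1}{4}, \qquad
\frac{d}{dx}\sigma(2x)\Big|_{x=0} = \frac{1}{2}.
\end{aligned}
```

So tanh is zero-centered with unit slope at the origin, while the plain sigmoid is centered at 1/2 with slope 1/4.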

@ryant8879 a year ago ^{+1} OMG, I'm blown away by the articulating power of this video. They say a picture is worth a thousand words. This video must be worth millions. Awesome job!

@ambition112ปีที่แล้ว^{+31}0:20: 🧠 Neural networks are universal function approximators that can understand, model, and predict the world.

3:42: 🧠 Neurons in a neural network learn their own features and combine them to produce the final output.

7:20: 📚 The video discusses techniques for improving the performance of neural networks.

11:10: 🧠 The video discusses the difficulty of approximating the Mandelbrot function using neural networks and explores other methods for function approximation.

15:24: ✨ The video explains the concept of Fourier series and its application in approximating functions.

18:53: 🌊 Using Fourier features in neural networks can greatly improve performance in high-dimensional problems.

22:55: 📊 The curse of dimensionality can pose challenges in handling high-dimensional inputs and outputs in neural networks, and Fourier features may not always improve performance.

Recap by Tammy AI

@Thoron6 หลายเดือนก่อน^{+1}boo 👎

@TK_Prod6 หลายเดือนก่อนHarpa fan I see @@Thoron

@Thoron6 หลายเดือนก่อน^{+2}@@TK_Prod no idea what that is, I just hate people spamming AI shit everywhere

@TK_Prod6 หลายเดือนก่อน@@Thoron Why is that?

@chopper3lwปีที่แล้ว^{+2}OMYGERD! TYSM! I've wanted to see someone play with the Fourier series / transform for this purpose for at least 10 years. Great visualization. ( I was intrigued because I played with optical fourier correlation a long time ago and wondered if it was applicable )

@hyunsunggo855 a year ago ^{+32} 21:18 Fourier features are very much used in neural networks! Often called "positional encoding", they are pretty much always used in transformers (e.g. large language models) and in NeRFs for learning and rendering 3D scenes with neural networks. They usually use exponential frequency scaling rather than the linear scaling shown in the video, since points can be represented absolutely fine with exponential scaling, as opposed to volumes (superpositions of points).

23:20 I'm assuming you've taken the Fourier features by treating an MNIST image as a 784-dimensional coordinate. I can see how that could hardly help as the pixel values are almost binary and the "gray" pixels don't say much about the image.

@tylerknight99ปีที่แล้วDo Fourier features work well for positional encoding because encoding text positions is a lower dimensional problem?

@hyunsunggo855 a year ago ^{+2} @tylerknight99 I don't see why you'd assume positional encoding works better in low dimensions. But if I had to guess why positional encoding improves natural language processing, I think it's because, compared to the naive approach of using plain 1-D values, the dot product between the Fourier features of two close positions results in a higher value than it does for positions that are far apart. The dot product of 1-D positions (just plain old multiplication, because they're scalars) doesn't have that nice property. I mention this because the dot product is the fundamental computation in almost every neural network.
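
The dot-product argument above can be sketched as follows, assuming NumPy and the Transformer-style geometric frequencies. The function names, the dimension of 64, and the sample positions are arbitrary illustrations. Because sin(a)sin(b) + cos(a)cos(b) = cos(a - b), the dot product of two encodings depends only on the offset between the positions and peaks at zero offset.

```python
import numpy as np

def encode_position(position, d_model=64):
    """Sine and cosine features of one scalar position (geometric frequencies)."""
    pair_index = np.arange(d_model // 2)
    angles = position / np.power(10000.0, 2.0 * pair_index / d_model)
    return np.concatenate([np.sin(angles), np.cos(angles)])

def similarity(p, q):
    """Dot product of two encodings; equals the sum over k of cos(w_k * (p - q))."""
    return float(encode_position(p) @ encode_position(q))

near = similarity(10, 11)           # neighbouring positions
far = similarity(10, 40)            # positions 30 apart
shifted_near = similarity(50, 51)   # same offset as `near`, different location
self_similarity = similarity(10, 10)

# Raw scalar positions: the product tracks magnitude, not closeness
raw_near, raw_far = 10 * 11, 10 * 40
```

Here `near` is larger than `far`, and `shifted_near` matches `near` because only the offset matters. The raw products go the other way, with `raw_far` larger than `raw_near`.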

@avi12ปีที่แล้ว^{+1}Beautiful explanations + beautiful animations + respectable length = Perfection

@jafudubrahi10 หลายเดือนก่อน^{+170}Not a math guy? Lmao

@digital_down3 หลายเดือนก่อน^{+11}Strange for a programmer, definitely an outlier.

@SoftBreadSoft2 หลายเดือนก่อน^{+7}@@digital_down Not really. I can't solve physics equations for my life, but I can implement physics integrations in software no problem. 🤷♂️

@digital_down2 หลายเดือนก่อน@@SoftBreadSoft I don’t know many programmers outside of myself and a handful of others. I know it’s anecdotal, but I make the assumption based on what I’ve seen. I could easily be wrong that it’s not the normal.

@SoftBreadSoft2 หลายเดือนก่อน^{+1}@@digital_down You come from a more professional side of things probably? I learned programming initially from botting MMOs, warez, that kind of thing. Lots of people who don't have a lot of math but are good programmers in the "hobby" scene.

We could both have some bias from where we came from

@digital_down2 หลายเดือนก่อน^{+1}@@SoftBreadSoft for sure we both have our biases, I am not formally educated either. I learned programming as a way to do more with animation, and I initially did animation as a way to make music videos. It just snowballed into a diverse set of skills, be it programming or production work and I seem to keep snowballing. For me personally, I think there was always that initial love for math even in grade school and as an emergent property of that love has made a lot of technical skills… I wouldn’t say easier, just more involved. I am not great at math by any means, but I love it nonetheless.

@TboneIsRogueปีที่แล้ว^{+2}Man this guy is incredibly talented. Fantastic video! Looking forward to seeing more.

@revenantwolzart7 months ago ^{+2} The guy was so passionate about functions that he plucked out his hair 😂

@jnotjequelปีที่แล้ว^{+2}finally, the face reveal I've been asking for 2:09

@colonelgraff9198ปีที่แล้ว^{+7}FUNCTIONS

DESCRIBE

THE

WORLD

@mat4151 a year ago ^{+1} I like your videos because they make me more curious about maths, and I end up doing different projects than I usually do.

@insulinceปีที่แล้ว^{+3}This is incredibly well made! Can you explore the topic of convolutional neural networks? Those have always been an enigma to me and i’d like to see the theory behind them with your style.

@bean_mhmปีที่แล้ว^{+2}This is genuinely THE best educational video I've ever watched. Really great job, this is good sweet stuff!

@Mel-mu8ox a year ago ^{+3} "I am a programmer, I'm not a mathematician"

I go through the pain of learning math, to write programmes to do it for me, so I NEVER have to think about it again XD

@jeffsiegwart9 หลายเดือนก่อนI got my degree in Computer Science in 1989. I worked as a Senior CNC Integration Engineer. Neural Nets are used in integrating new drives to AC induction motors to learn what parameters to use. Your overview answered a lot of questions that I was curious about in neural nets. Thank you.

@mgostIHปีที่แล้ว^{+3}I enjoyed this! It reminded me of the SIREN paper, which uses sinusoidal approximations to deal with interpolating "natural" data (images, audio, videos, differential equations) and does very well even without augmenting the input.

I think this calls more towards us being able to design architectures that can more easily figure out their own preferred spectrum, but as your later analysis suggests, things may scale very differently than what we expect!

@ibraheemamin512ปีที่แล้วwhat's the SIREN paper

@mgostIHปีที่แล้ว@@ibraheemamin512 Implicit Neural Representations with Periodic Activation Functions

@geometryflame712 a year ago ^{+1} Finally a comprehensive explanation of Fourier series. You did a great job of explaining it.

@hyperduality2838ปีที่แล้วFunctions have goals, targets & objectives hence they are teleological, input is dual to output.

Making predictions to track targets, goals & objectives is a syntropic process -- teleological.

Sine is dual to cosine -- the word co means mutual and implies duality.

Teleological physics (syntropy) is dual to non teleological physics (entropy).

Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics!

"Always two there are" -- Yoda.

Subgroups are dual to subfields -- the Galois correspondence.

Duality creates (emergence, synthesis) reality!

@geometryflame712ปีที่แล้ว@@hyperduality2838 ??? Wrong comment to reply to I think

@hyperduality2838ปีที่แล้ว@@geometryflame712 Concepts are dual to percepts -- the mind duality of Immanuel Kant.

Mathematicians create new concepts all the time from their perceptions, observations or measurements.

Conceptualization or creating new concepts is a syntropic process -- teleological.

Thinking is a syntropic process.

Space is dual to time -- Einstein.

Neural networks make predictions hence they are syntropic by nature and therefore there is a 4th law of thermodynamics!

Controlability is dual to observability -- optimized control theory.

There are new laws of physics which you are not being informed about -- Yoda is correct.

@Cobblestoned100ปีที่แล้ว^{+7}Have you had a look at this paper? It's fascinating and similar to the fourier features results.

th-cam.com/video/Q2fLWGBeaiI/w-d-xo.html

It's possible to combine both methods to get the best of both worlds.

The SIREN method enables much faster convergence, while the Fourier features make it possible to capture more high-frequency detail, such as high-res images.

The only difference is that it uses sin as the activation function instead of ReLU, plus a clever weight initialization scheme that is needed for it to work. But when it does, it works extremely well.

@Cobblestoned100ปีที่แล้ว^{+1}I guess if you combine these two methods you will get a much more accurate approximation of the mandelbrot set
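
A rough NumPy sketch of a sine-activated network of the kind described above. The initialization bounds (uniform within 1/fan_in for the first layer, sqrt(6/fan_in)/omega_0 for later layers) and omega_0 = 30 follow my recollection of the SIREN paper and should be treated as assumptions. Drawing the biases from the same bounds is a simplification made here, and all function names are hypothetical.

```python
import numpy as np

def init_siren_layer(fan_in, fan_out, is_first, omega_0, rng):
    """Draw weights uniformly within the (assumed) SIREN bound for this layer."""
    bound = 1.0 / fan_in if is_first else np.sqrt(6.0 / fan_in) / omega_0
    weights = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    biases = rng.uniform(-bound, bound, size=fan_out)  # simplification, see lead-in
    return weights, biases

def siren_forward(x, layers, omega_0):
    """Apply sin(omega_0 * (h @ W + b)) on every layer except the last (linear)."""
    h = x
    for weights, biases in layers[:-1]:
        h = np.sin(omega_0 * (h @ weights + biases))
    weights, biases = layers[-1]
    return h @ weights + biases

omega_0 = 30.0
rng = np.random.default_rng(0)
layers = [
    init_siren_layer(2, 16, True, omega_0, rng),    # (x, y) coordinates in
    init_siren_layer(16, 16, False, omega_0, rng),
    init_siren_layer(16, 1, False, omega_0, rng),   # one pixel value out
]
coords = rng.uniform(-1.0, 1.0, size=(4, 2))
output = siren_forward(coords, layers, omega_0)
```

This covers only the forward pass and initialization. Training would still need gradients from an autodiff framework.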

@OnionKnight5415 หลายเดือนก่อนare you a writer? you speak so goddamn well. i take notes on your videos, and just write down nearly every sentence word for word. amazing.

@jonatan01iปีที่แล้ว^{+4}is everything really a function?

isn't a function more like something with which we try to approximate reality?

@beagle989ปีที่แล้ว^{+1}functions might not be what things are, but functions describe everything

@diadetediotedio6918ปีที่แล้ว^{+1}@@beagle989

This is simply untrue, functions can't describe themselves nor the logical frameworks they are inserted, nor the logical inferential and mathematical rules that makes them possible in first place

@jonatan01iปีที่แล้ว@@beagle989

Suppose I give you a small number (epsilon), for which you can give me a function, such that it's maximum that far away from reality, never worse (never bigger) than epsilon.

Could there be an epsilon, for which it's impossible to find such a function?
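
For reference, the classical universal approximation theorem answers this question in one restricted setting. Let f be continuous on a compact set K in R^n, and let \mathcal{N} denote the single-hidden-layer networks with a suitable (for example, non-polynomial) activation. Then:

```latex
\forall\, \varepsilon > 0 \;\; \exists\, g \in \mathcal{N} : \quad
\sup_{x \in K} \lvert f(x) - g(x) \rvert < \varepsilon
```

Under those assumptions no "impossible" epsilon exists, although the network may need to be very large. The theorem says nothing about targets that are discontinuous or defined on unbounded domains, nor about whether reality itself is such a function.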

@dsagmanปีที่แล้ว^{+1}so good. helps a lot to understand what is often presented with more mystery.

@ManuArt256ปีที่แล้ว^{+1}This is probably one of the best neural network videos I've seen yet!

@angstrom10588 หลายเดือนก่อนThis is quite good. I've worked with NN's since the 1980's, it's my career and I'm an inventor of NN tech starting back in the 90's and still today. I'm just trying to create some "street cred" and this video is very good. Well done! "TanH just seems to work better" is absolutely correct and just the way you should say it. Nice!

@osologic a year ago ^{+1} Amazing video that explains more than just the working model of AI neural networks built from human-made algorithms.

It is also very helpful for understanding EVOLUTION and subjective EXISTENCE as neural-network-like functions that manifest objective existence, including the human body with its wonderful organ, the BRAIN, which learned the wisdom you explain in this wonderful video.

@willykitheka7618ปีที่แล้ว^{+1}I have learnt and I have enjoyed at the same time! Brilliant work!

@jayeifler88129 หลายเดือนก่อน^{+1}I can't help but point out the ReLU function is basically a diode, but to be more specific, a simple piecewise linear approximation of a diode.

@YannikaLuvAI4 หลายเดือนก่อน^{+1}My brain whenever I try to learn anything related to math: "okay, but what if we had a starship and unlimited supply of pizza?!" 🤷🏻♀️

@xavierxon2 หลายเดือนก่อนIt was so amazing and awesome to see the complex mathematical functions at work and giving the physical significance. Really a good content and a learning drive.

@ec92009y4 หลายเดือนก่อนSecond time watching your video. It has not aged a day. Very well done, sir.

@0xD4rky8 months ago Being new to this field, I was only accustomed to working with simple neural networks. After watching this video, my mind was blown by the concept of Fourier transformation in NNs. It's just pure gold. To be honest, I wasn't expecting a NN to perform so well on the Mandelbrot function. I would love to dive deep into Fourier transformation in NNs. Thanks for the enlightenment!

@alejrandom6592ปีที่แล้ว^{+1}This is one of the clearest explanations I've seen

@lightspeedlion6 หลายเดือนก่อนEvery word well said. I go back visit other video and come back to see the articulation and virtualization, everything really makes sense. Exceptional!

@jimmygore8214ปีที่แล้วPeople like you are the fruit of humanity! These videos are of great benefit to everyone because being able to understand such complex mathematical topics in a visual manner is the best.

@keshavharipersad20242 หลายเดือนก่อนWhat a great video. Introduced me to many different types of networks, aside from the traditional neural network.

@hellopomelo2 a year ago ^{+1} Great video! This video does bring out some parallels with compressed sensing. If you have the time, do check out compressed sensing, where we can recover images or information even if that information is massively undersampled, given that it is sparse in some basis (the Fourier basis, for example).

@TSHRGPT329 a year ago This is the first video on YouTube in my lifetime where I wanted to give a ❤ instead of a 👍. It cleared up a very old query of mine: why we do feature engineering.

@ehza3 หลายเดือนก่อน^{+1}Oh man I am crying 😭, this is beautiful!

@orijeetmukherjee5805 หลายเดือนก่อนOMG!!! what a video, felt like a movie. Instant sub man

@Lovefun55811 หลายเดือนก่อน^{+1}This was really incredible. Amazing work, thanks for sharing.

@samthibodeau35117 months ago You have to be one of the greatest math teachers I've ever heard lecture or give a tutorial or course like this! I have so much to say, but I'm overwhelmed, so I'll just say THANK YOU! Namaste!

@aaryanmehta657710 หลายเดือนก่อนone of the best videos i've ever seen. as someone who's pursuing his masters in CS, this video gave me so many different insights about what neural networks really are. 🙌

@smithhoowe9 หลายเดือนก่อนWatching this took me back to CC and learning calculus. I had always figured it was something of a badge of prestige but that it would never really be used, now I feel validated and want to relearn some of what I had forgotten to time. Thank you for this :)

@claudiusraphael942311 หลายเดือนก่อนGREAT TENNIS!

Btw. the "exact" first minute is the most on point meme ever and possible personal-best-lap-candidate for speedrunning life. Thanks for sharing!

@thomaskaldahl196ปีที่แล้ว^{+1}lol I can make a cheap solution to the problem you pose at the end. I just use a plain old neural network, but in addition to x,y inputs, I add a feature as a 3rd input: the mandelbrot feature. the Mandelbrot feature is calculated by taking the x,y inputs and calculating the value of the Mandelbrot set function you showed in the video ;)
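
The "Mandelbrot feature" joke above amounts to handing the network the answer. A minimal sketch of such a feature is below, assuming the usual escape-time iteration z -> z*z + c with an escape radius of 2. The function name, the iteration cap, and the normalization to [0, 1] are assumptions, and the video's exact target function may differ.

```python
def mandelbrot_escape_fraction(x: float, y: float, max_iterations: int = 50) -> float:
    """Return 1.0 if c = x + iy never escapes, else the escape step / max_iterations."""
    c = complex(x, y)
    z = 0j
    for step in range(max_iterations):
        z = z * z + c
        if abs(z) > 2.0:          # once |z| > 2 the orbit is guaranteed to diverge
            return step / max_iterations
    return 1.0                    # treated as inside the set

inside = mandelbrot_escape_fraction(0.0, 0.0)    # the origin stays bounded
outside = mandelbrot_escape_fraction(2.0, 2.0)   # escapes on the first step
```

With this value as a third input, a network only has to learn an identity-like map, which is why it counts as a cheap solution.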

@halihammerปีที่แล้วSome of the most beautiful visualizations out there! I love it!

@mathpuppy314ปีที่แล้ว^{+1}Great job on this!!! It's interesting, it's educational, and on top of that, it's so entertaining as well!

@samuelnewport4970 a year ago ^{+1} Overfitting can be avoided if the x distance between samples (in the 2D case) is small enough that the sampling rate exceeds the Nyquist rate (twice the frequency of the highest-frequency sinusoid used); otherwise you get spectral interference (aliasing) in the frequency domain. So there's a tangible calculation for precisely what number of Fourier vectors will result in overfitting. (Loved the video, by the way.)
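
A small illustration of the aliasing behind this argument, assuming NumPy. The sample rate and the two frequencies are arbitrary choices. When a sinusoid's frequency exceeds half the sampling rate, its samples coincide with those of a lower-frequency sinusoid, so the training points alone cannot tell the two apart.

```python
import numpy as np

sample_rate = 10.0                      # samples per unit of x
t = np.arange(20) / sample_rate         # 20 evenly spaced sample locations

low = np.sin(2.0 * np.pi * 3.0 * t)     # 3 cycles per unit: below sample_rate / 2
high = np.sin(2.0 * np.pi * 13.0 * t)   # 13 cycles per unit: above sample_rate / 2

# At the sample points the two sinusoids agree, since 13 = 3 + sample_rate
max_difference = float(np.max(np.abs(low - high)))
```

Between the sample points the two curves differ substantially. A model that includes the 13-cycle feature is therefore free to fit the samples while wiggling arbitrarily in between.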

@i2c_jasonหลายเดือนก่อนThe coolest aspect of an FFT, in my opinion, is that you can apply them to units of distance and inject a sine wave at various frequencies. In terms of distance, this means you could take something "real" like a turned lathe part profile and "poof" get resonance or anti-resonance wherever you like. The results - eliminate tool chatter, make your part "wavy", you could probably even thread a shaft using an FFT function and a known starting point / RPM. An FFT/IFFT library is like having a nuke in your Python code.

@hellodavidryan3 หลายเดือนก่อนAmazing work in visualizing and teaching these concepts. Instantly subscribed.

@ChainsawDNAปีที่แล้วThis is one of the best introduction videos on the topic. Congratulations on a job well done.

@nileshbarai49995 months ago The video was thoroughly interesting, which is great, as I have seen some very dry videos on the topic. I like how you open it up to the audience to discuss alternative solutions on Discord. I would not even have commented, as there were many great comments already, but no one had mentioned the funny note about ChatGPT in the credits...lol

@windragoปีที่แล้วincredibly well done from script to pace to the overall value - instant sub

@NathanSMS2611 หลายเดือนก่อนThank you for including your code. I just finished a MS in robotics and AI but with how my program was structured there was heavy focus on learning concepts but the deepest exposure I got in a practical sense was evaluating images using classification models. I've been wanting to dive into a project to learn inverse kinematics for a robotic arm I built and I think your code will be a great reference

@tomsterbg8130ปีที่แล้ว^{+1}19:45 Bro fr just said "This is a better approximation than the last, but here's the real one. Nevermind that's an approximation"

@JonCianci1211 months ago This was a game changer for my little experiment in time series forecasting. I brought things right back to basics and just tried to approximate a function that takes a number and multiplies it by two. Once I had that benchmark, it gave me a solid starting point to move out from 😎

@kevin784911 months ago ^{+1} Found the bit where it tries to approximate the Mandelbrot set fascinating. It's like watching a simplified version of our own minds grappling with the unknown.

@kevin784911 หลายเดือนก่อนAnd great video too

@spartan2255029 days ago I spotted some concepts we use in computer graphics. I'm still not sure I understand how it works, but now I know which subject I should investigate. Great explanation.

@carrumar10 หลายเดือนก่อนWow, This video is mind blowing, it generalizes the definition of an nn in a very visual way. Thanks a lot for the effort

@tomekem34738 หลายเดือนก่อนThis is by far the most useful and mind opening video about neural networks I have ever seen :)