But what is a neural network REALLY?

  • Published Jun 2, 2024
  • My submission for 2022 #SoME2. In this video I try to explain what a neural network is in the simplest way possible. That means no linear algebra, no calculus, and definitely no statistics. The aim is to be accessible to absolutely anyone.
    00:00 Intro
    00:47 Gauss & Parametric Regression
    02:59 Fitting a Straight Line
    06:39 Defining a 1-layer Neural Network
    09:29 Defining a 2-layer Neural Network
    Part of the motivation for making this video is to try to dispel some of the misunderstandings around #deeplearning and to highlight 1) just how simple the neural network algorithm actually is and 2) just how NOT like a human brain it is.
    I also haven't seen Gauss's original discovery of parametric regression presented anywhere before, and I think it's a fun story to highlight just how far (and how little) data science has come in 200 years.
    ***************************
    In full disclosure, planets do not orbit in straight lines, and Gauss did not fit a straight line to Ceres' positions, but rather an ellipse (in 3d).

Comments • 224

  • @dsagman • 1 year ago +206

    “Do neural networks work because they reason like a human? No. They work because they fit the data.” You should have added “boom, mic drop.” Excellent video!

    • @LuisPereira-bn8jq • 1 year ago +33

      Can't say I agree. I really liked the video as a whole, but that "drop" was the worst part of the video to me, since it's a bit of a strawman, for at least two reasons:
      - knowing what a complex system does "at a foundational level" is very far from allowing you to understand the system. After all, Biology is "just" applied Chemistry which in turn is "just" applied Physics, but good luck explaining any complex biological system from physical principles alone.
      - much of what humans do doesn't use "reason" at all. A few years back I decided to start learning Japanese. And I recall that for the first few months of listening to random native Japanese speakers I'd have trouble even correctly identifying the syllables of their words. But after some time and more exposure to the sounds, grammar, and speech patterns, that gradually improved. Yet that improvement had little to do with me *reasoning* about the language, and was largely an unconscious process of my brain getting better at pattern recognition in the language.
      At least when it comes to "pattern recognition" I see no compelling reason to declare that humans (and animals, for that matter) are doing anything fundamentally different from neural networks.

    • @algorithmicsimplicity • 1 year ago +43

      My comments about neural networks reasoning were in response to some of the recent discussions about large language models being conscious. My impression is that these discussions give people a wildly inaccurate view of what neural networks actually do. I just wanted to make it clear that all neural networks do is curve fitting.
      Sure, you can say "neural networks are functions that map inputs to outputs" and "humans are functions that map inputs to outputs", therefore they are fundamentally doing the same thing. But there are important differences between humans and neural networks. For one thing, in the human's case the function is not learned by curve fitting. It is learned by Bayesian inference. Humans are born with an incredible amount of prior knowledge about the world, including what types of sounds human language can contain. This is why you were able to learn to recognize Japanese sounds in a few months, where it would take a neural network the equivalent of thousands of years' worth of examples.
      If you want to say that neural networks are doing the same thing as humans that's fine, but you should equally be comfortable saying that random forests are doing the same thing as humans.

    • @danielguy3581 • 1 year ago +16

      @@algorithmicsimplicity Whatever mechanism underlies human cognition, if it begets the same results as a neural network, then it can be said to also "merely" perform curve fitting. Whether that can also be described in terms of Bayesian inference would not invalidate that. Similarly, it is not helpful stating there's nothing to understand or use as a model in neurobiology since it is just atoms minimizing energy states.

    • @charletsebeth • 1 year ago +1

      Why ruin a good story with the truth?

    • @revimfadli4666 • 1 year ago +2

      @@LuisPereira-bn8jq aren't you making a strawman yourself?
      Also wouldn't your language example still count as his "learning abstract hierarchies and concepts"?

  • @BurkeMcCabe • 1 year ago +176

    BRUH. This video gave me that amazing feeling when something clicks in your brain and everything all of a sudden makes sense! Thank you I have never seen neural networks explained in this way before.

  • @sissyrecovery • 10 months ago +12

    DUUUUUUDE. I watched all the people you'd expect, 3Blue1Brown, StatQuest etc. I was so lost. Then I was ranting to a friend about how exasperated I was, and he sent me this video. BAM. Everything clicked. You're the man. Also, love how you ended it with a banger.

  • @Weberbros1 • 1 year ago +136

    I was expecting a duplicate of many other neural network videos, but this was a perspective that I have not seen before! Awesome video!

  • @newchaoz • 1 year ago +18

    This is THE best intuition behind neural networks I have ever seen. Thanks for the great video!

  • @AdobadoFantastico • 1 year ago +53

    Getting some bit of math history context makes these extra enjoyable. Great video, explanation, and visualization.

  • @orik737 • 1 year ago +11

    Oh my lord. I've been struggling with neural networks for a while and I've always felt like I have a decent grasp on them, but this video finally brought everything together. Beautiful introduction.

  • @DanaTheLateBloomingFruitLoop • 1 year ago +29

    Simply and elegantly explained. The bit at the end was superb.

    • @zenithparsec • 1 year ago

      Except this just described one activation function, and did not show that it generalizes to all neural networks. Being so accessible means it couldn't explain ReLU in context.
      Don't get me wrong, it's a good explanation of how some variants of the ReLU activation function work, but it doesn't explain what a neural network really is, nor prove that your brain doesn't work by fitting data in a similar way.
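
The piecewise-linear picture being discussed here is easy to write down. Below is a minimal sketch (not code from the video; the weights are made up) of a 1-layer network built from ReLU units, showing that the output is just a weighted sum of hinge functions and therefore a piecewise-linear curve:

```python
import numpy as np

def relu(x):
    # ReLU activation: zero for negative inputs, identity for positive inputs.
    return np.maximum(0.0, x)

def one_layer_net(x, a, b, w, c):
    # y = c + sum_i w_i * ReLU(a_i * x + b_i): piecewise linear,
    # with up to one bend contributed by each hidden neuron.
    return c + sum(w_i * relu(a_i * x + b_i) for a_i, b_i, w_i in zip(a, b, w))

# Three hidden neurons with made-up parameters -> up to three bends in the curve.
a, b, w, c = [1.0, 1.0, -1.0], [0.0, -1.0, 2.0], [0.5, 1.5, 2.0], 0.2
print([round(one_layer_net(x, a, b, w, c), 2) for x in np.linspace(-3, 3, 7)])
```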

  • @AyrtonGomesz • 2 days ago +1

    Great work. This video just updated my definition of awesomeness.

  • @dfparker2002 • 4 months ago +1

    Best explanation of parametric calcs ever! Bias & weights have new meaning.

  • @Stone87148 • 1 year ago +14

    Building an intuitive understanding of the math behind neural networks is so important.
    Understanding the application of NNs gets the job done; understanding the math behind NNs makes the job fun. This video helps with the latter! Nice video!

  • @igorbesel4910 • 1 year ago +5

    Made my brain go boom. Seriously, thanks for sharing this perspective!

  • @paulhamacher773 • 4 days ago +1

    Brilliant explanation! Very glad I stumbled on your channel!

  • @dineshkumarramasamy9849 • months ago +1

    I always love to get the history lesson first. Excellent.

  • @videos4mydad • 5 months ago +1

    This is the best video I have ever seen on the internet describing what a neural network actually is.
    The best and most powerful explanations are those that give you the intuitive meaning behind the math, and this video does it perfectly.
    When a video describes a neural network by jumping into matrices and talking about subscripts i and j, it's just talking about the mechanics and does absolutely nothing to make you understand what you're reading.
    Unfortunately, this is how most textbooks approach the subject, and it's also how many content creators approach it as well.
    This type of video only comes from someone who understands things so deeply that they're able to explain them in a way that involves almost zero math.
    I consider this video one of the true treasures of YouTube involving artificial intelligence education.

  • @gregor-alic • 1 year ago +26

    Great video!
    I think this video finally shows what I was waiting for, namely what is the purpose of multiple neurons / layers in a neural network intuitively.
    This is the first time I have actually seen it explained clearly, good job!

  • @PolyRocketMatt • 1 year ago +5

    This might actually be the clearest perspective on neural networks I have seen yet!

  • @illeto • 15 days ago +1

    Fantastic video! I have been working with econometrics, data science, neural networks, and various kinds of ML for 20 years but never thought of ReLU neural networks as just a series of linear regressions until now!

  • @Deletaste • 1 year ago +5

    And with this single video, you earned my subscription.

  • @stefanzander5956 • 1 year ago +7

    One of the best introductory explanations about the foundational principles of neural networks. Well done and keep up the good work!

  • @sharkbaitquinnbarbossa3162 • 1 year ago +11

    This is a really great video!! Love the approach with parametric regression.

  • @aloufin • 7 days ago +1

    Amazing viewpoint for an explanation. Would've loved an additional segment using this viewpoint to do MNIST image recognition.

    • @algorithmicsimplicity • 7 days ago +1

      I explain how this viewpoint applies in the case of image classification in my video on CNNs: th-cam.com/video/8iIdWHjleIs/w-d-xo.html

  • @PowerhouseCell • 1 year ago +13

    This is such a cool way of thinking about it! You did an amazing job discussing a popular topic in a refreshing way. I can't believe I just found your channel - as a video creator myself, I understand how much time this must have taken. Liked and subscribed 💛

  • @williamwilkinson2748 • 1 year ago +3

    The best video I have seen in giving one an understanding of neural nets. Thank you. Excellent, looking for more from you.

  • @metrix7513 • 1 year ago +5

    Like someone else said, I expected the video to be similar to all the others, but this one gave me so much more, very nice.

  • @KIRA_VX • 3 months ago

    IMO one of the best explanations when it comes to the idea/fundamental concept of the NN. Please make more 🙏

    • @algorithmicsimplicity • 3 months ago +1

      Thank you so much! Don't worry, more videos are on the way!

  • @karkunow • 9 months ago +3

    Thank you! That is a really brilliant video!
    I have been using regressions often, but never knew that a neural network is kinda the same idea.
    Very enlightening!

  • @srbox2 • 5 months ago

    This is flat out the best video on neural networks on the internet, provided you are not a complete newbie. Never have I had such an "ahaaa" moment. Clear, concise, easy to follow, going from 0 to hero effortlessly. Bravo.

  • @ArghyadebBandyopadhyay • 1 year ago +2

    THIS was the missing piece of the puzzle I was looking for. This video helped me a lot. Thanks.

  • @symbolsforpangaea6951 • 1 year ago +2

    Amazing explanations!! Thank you!!

  • @napomokoetle • 7 months ago

    This is the clearest video I've ever seen on YouTube on what a neural network is. Thank you so much... you are a star. Could I perhaps ask or encourage you to create, for many of us keen on learning neural networks on our own, a video practically illustrating the fundamental difference between supervised, unsupervised, and reinforcement learning?

  • @jcorey333 • 3 months ago +1

    Your channel is really amazing! Thanks for making videos.

  • @jorgesolorio620 • 1 year ago +1

    Where has this video been all my life! Amazing, simply amazing! We need more, please.

  • @geekinasuit8333 • 7 months ago

    I was wondering myself exactly what a simulated NN is actually doing (not what it is, but what it is doing), and this explanation is the best by far, if not THE answer. One adjustment I will suggest is to explain at the end that a simulated NN is not required at all, and that alternative systems can also perform the same function, which begs the question: what exactly are the fundamental requirements needed for line fitting to occur? Yes, I like to generalize and get to the fundamentals.

  • @geekinasuit8333 • 7 months ago

    Another explanation that's needed is to explain the concept of gradient descent (GD), that's the generalized method used to figure out the best fit. Lots of systems use GD, including natural evolution; it's basically trial and error with adjustments, although there are various ways to make it work more efficiently, which can become quite complicated. You can even use GD to figure out better forms of the GD algorithm, that is, it can be used recursively on itself.
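
For readers who want the idea made concrete, here is a minimal illustrative sketch (made-up data, not from the video) of gradient descent fitting a straight line y = m*x + b by repeatedly nudging each parameter downhill on the mean squared error:

```python
# Minimal gradient descent for fitting y = m*x + b to data (illustrative sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.1, 7.0]

m, b, lr = 0.0, 0.0, 0.01
for step in range(2000):
    # Gradients of the mean squared error with respect to m and b.
    grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    m -= lr * grad_m   # move each parameter a small step downhill
    b -= lr * grad_b
print(round(m, 2), round(b, 2))  # roughly 2.0 and 1.0 for this data
```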

  • @ChrisJV1883 • 9 months ago

    I've loved all three of your videos, looking forward to more!

  • @DmitryRomanov • 1 year ago +1

    Thank you!
    Really beautiful point about layers and the exponential growth of the number of segments one can make!

  • @ultraFilmwatch • 9 months ago +1

    Thank you thousands of times, you excellent teacher. Finally, I saw a high-quality and clear explanation of neural networks.

  • @some1rational • 1 year ago

    Great video, this is an explanation I have not heard before. Also I don't know if that abrupt ending was purposefully sarcastic, but I thoroughly enjoyed it lol

  • @Muuip • 8 months ago

    Another great concise visual explanation!
    Thank you!👍

  • @MrLegarcia • 1 year ago +2

    This straightforward explaining method can save thousands of kids from dropping out of school "due to math".

  • @ButcherTTV • 12 days ago +1

    good video! very easy to follow.

  • @doublynegative9015 • 1 year ago +12

    Just watched Sebastian Lague's video on neural networks the other day, and whilst great as always, it was _such_ a standard method of explaining them; mostly I just see this explained in the same way each time. This was such a nice change, and really provided me with a different way to look at this. Seeing 'no lin-alg, no calc, no stats' really concerned me, but you did a great job just by trying to explain different parts. Such a great explanation - would recommend to others.

  • @martinsanchez-hw4fi • 1 year ago +3

    Good one! Nice video. In the regression line of Gauss one is not taking the perpendicular distances, though. But very cool video!

  • @garagedoorvideos • 1 year ago +2

    8:47 --> 9:21 is like watching my brain while I predict some trades. 🤣🤣🤣 "The reason why neural networks work... is that they fit the data." Sweet stuff.

  • @andanssas • 1 year ago +1

    Great concise explanation, and it does work: it fits at least my brain's data like a glove! Not that I have a head shaped like a hand (or do I?), but you did light up some bulbs in there after watching those line animations fitting better and better.
    However, what happens when the neural network fits too well?
    If you can briefly mention the overfitting problem in one of your next episodes, I'd greatly appreciate it. Looking forward to the CNN and transformer ones! 🦾🤖

  • @DaryonGaming • 1 year ago

    I'm positive I only got this recommended because of Veritasium's FFT video, but thank you, YouTube algorithm, nonetheless. What a brilliant explanation!

  • @aravindr7422 • 9 months ago

    Wow, very good. Keep posting great content like this. You have great potential for explaining complex topics in simpler terms. And there are people who just post content for the sake of posting and minting money; we need more people like you.

  • @orangemash • 1 year ago

    Excellent! First time I've seen it explained like this.

  • @Justarandomguyonyoutube12345 • 8 months ago +1

    I wish I could like the video more than once..
    Great job buddy

  • @bisaster5471 • 1 year ago

    480p in 2022 surely takes me back in time. I love it!!

  • @tunafllsh • 1 year ago +2

    Wow, this is a really interesting view of neural networks and of what role layers play in them.

  • @gbeziuk • 1 year ago

    Great video. Special thanks for the historical background.

  • @lewismassie • 1 year ago +1

    Oh wow. This was so much more than I was expecting. And then it all clicked right in at about 9:45

  • @yonnn7523 • 7 months ago

    Wow, ReLU is an unexpected starting point to explain NNs, but it nicely demonstrates the flexibility of summing up weighted non-linear functions. Such a refreshing way!

  • @lollmao249 • 2 months ago +1

    This is EXCELLENT and the best video explaining intuitively what a neural network does. You are seriously brilliant.

  • @metanick1837 • 9 months ago

    Nicely explained!

  • @SiimKoger • 1 year ago

    Might be the best and most rational neural network video on YouTube that I've seen 🤘🤘

  • @sciencely8601 • 9 days ago +1

    god bless you for this work

  • @karlbooklover • 1 year ago

    Most intuitive explanation I've seen.

  • @is_this_youtube • 1 year ago +1

    This is such a good explanation

  • @colebrzezinski4059 • 7 months ago

    This is a really good explanation

  • @xt3708 • 9 months ago +1

    This makes total sense, thank you. Regarding the last observation of the video, how does that reconcile with statements from the OpenAI team about emergent properties of GPT-4 that they didn't expect or don't comprehend? I might be mixing apples and oranges, but if it's just curve fitting, then why has something substantially changed?

  • @koderksix • 7 months ago

    I like this video so much.
    It really shows that ANNs are really just, at the end of the day, glorified multivariate regression models.

  • @ward_heimdal • 9 months ago

    Hands down the most enlightening ANN series on the net from my perspective, afaik. I'd be happy to pay 5 USD for the next video in the series.

  • @master11111 • 1 year ago +1

    That's a great explanation

  • @Gravitation. • 1 year ago +5

    Beautiful! Could you do this type of video on other machine learning models, such as convolutions?

  • @scarletsence • 1 year ago

    Actually, adding a bit of math to this video wouldn't hurt, as long as you pair it with visual representations of the graphs and formulas. But anyway, one of the most accessible explanations I have ever seen.

  • @4.0.4 • 8 months ago

    This is the second video of yours that I've watched that gives me a eureka moment. Fantastic content. One thing I don't get is: people used to use the sigmoid function before ReLU, right? Was it just because natural neurons work like that and artificial ones were inspired by them?

    • @algorithmicsimplicity • 8 months ago

      Yes sigmoid was the most common activation function up until around 2010. The very earliest neural networks back in the 1950s all used sigmoid, supposedly to better model real neurons, and nobody questioned this choice for a long time. Interestingly, the very first convolutional neural network paper in 1980 used ReLU, and even though it was already clear that ReLU performed better than sigmoid back then, it still took another 30 years for ReLU to catch on and become the most popular choice.

  • @saysoy1 • 1 year ago +2

    I loved the video; would you please make another one explaining backpropagation?

    • @algorithmicsimplicity • 1 year ago +5

      Hopefully I will get around to making a back propagation video sometime, but my immediate plans are to make videos for CNNs and transformers.

    • @saysoy1 • 1 year ago +1

      @@algorithmicsimplicity just don't stop man!

  • @borisbadinoff1291 • 6 months ago

    Brilliant! Lots to unpack from the concluding sentence: neural networks work because they fit the data. Sounds like an even deeper issue than misalignment due to proxy-based training.

  • @johnchessant3012 • 1 year ago +1

    Great video!

  • @Scrawlerism • 1 year ago

    Damn you need and deserve more subscribers!

  • @redpanda8961 • 1 year ago +1

    great video!

  • @hibamajdy9769 • 7 months ago

    Nice interpretation 😊. Please can you make a video explaining how neural networks are used in, for example, digit recognition?

  • @LegenDUS2 • 1 year ago

    Really nice video!

  • @willturner1105 • 1 year ago +1

    Love this!

  • @qrubmeeaz • 8 months ago

    Careful there! You should explicitly mention that you are taking the absolute values of the errors. (Usually we use squares). Without the squares (or abs), the positive and negative errors will kill each other off, and the simple regression does not have a unique solution. Without the squares (or abs), you can start with any intercept, and find a slope that will give you ZERO total error!!
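
That cancellation is easy to verify numerically. Here is a small illustrative sketch (made-up data, not from the video) showing that the sum of signed errors is zero for more than one line, while the absolute error picks out the better fit:

```python
# Signed errors can sum to zero for many different lines, so "total error"
# without abs() or squaring does not single out a unique best fit.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]   # the true relationship is y = x

def signed_error(m, b):
    return sum((m * x + b) - y for x, y in zip(xs, ys))

def abs_error(m, b):
    return sum(abs((m * x + b) - y) for x, y in zip(xs, ys))

print(signed_error(1.0, 0.0))   # 0.0 for the correct line...
print(signed_error(0.0, 1.0))   # ...but also 0.0 for the flat line y = 1
print(abs_error(1.0, 0.0), abs_error(0.0, 1.0))  # 0.0 vs 2.0: abs() separates them
```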

  • @StephenGillie • 1 year ago

    Having worked with a simple single-layer 2-synapse neuron in a spreadsheet, I find this video vastly overexplains the topic at a high level, while not going into enough detail. It does, however, go over the linear regression needed for the synapse weight updates. Also it treats the massive regression testing as a benefit instead of a cost.
    One synapse per neuron in the layer above, or per input if the top layer.
    One neuron per output if the bottom layer.
    Middle layers define resolution, from this video at a rate of (neurons per layer)^(layers).
    Fun fact: Neural MAC (multiply-accumulate) chips can perform whole racks worth of computation. The efficiency gain here isn't so much in speed as it is reduction of power and space, by rearranging the compute units and using analog accumulation. In this way the MAC units more closely resemble our own neurons too.

  • @bassemmansour3163 • 1 year ago

    👍 Super demonstration! How did you generate the graphics? Thanks!

  • @oleksandrkatrusha9882 • 9 months ago

    Amazing!

  • @justaname999 • months ago +1

    This is a really cool explanation I haven't seen before.
    But I have two questions:
    Where does overfitting fit in here? Would more neurons mean a higher risk of overfitting? Do layers help, or are they unrelated?
    And where would co-activation of multiple neurons fit in this explanation? E.g., the combination of information from multiple sensory sources?

    • @algorithmicsimplicity • months ago

      My video on CNNs talks about overfitting and how neural networks avoid it (th-cam.com/video/8iIdWHjleIs/w-d-xo.html ) . It turns out that actually the more neurons and layers there are, the LESS neural nets overfit, but the reason is pretty unintuitive.
      From the neural nets perspective, there is no such thing as multiple sensory sources. Even if your input to the NN combines images and text, the neural net still just sees a vector as input, and it is still doing curve fitting just in a higher dimensional space (dimensions from image + dimensions from text).

    • @justaname999 • months ago

      @@algorithmicsimplicity Thank you! I had read that more neurons lead to less overfitting and thought it was counterintuitive, but I guess that must have carried over from the regular modeling approach where variables remain (or should remain) interpretable.
      I'll have a look at the other videos! Thanks.
      I guess my confusion stems from what you address at the end. We can fairly simply imitate some things via principles like Hebbian learning, but the fact that in actual brains it involves different interconnected systems makes me stumble. (And it shouldn't, because obviously these models are not actually like real brains.)

  • @dasanoneia4730 • 8 months ago

    Thanks needed this

  • @vasilin97 • 8 months ago +1

    Great video! I am left with a question though. If the number of straight line segments in an NN with n neurons in each of the L layers is n^L, then why would we ever use n > 2? If we are constrained by the total number of neurons n*L, then n = 2 maximizes n^L. I have two guesses why use n > 2:
    1. (Hardware) Linear algebra is fast, especially on a GPU. We want to use vectors of larger sizes to make use of the parallelism.
    2. (Math) Maybe the number of gradient descent steps needed to fit a deeper neural network is larger than to fit a shallower NN with wider layers?
    If you plan to make any more videos about this, this question would be great to address. If not, maybe you can reply here with your thoughts? Thank you!

    • @algorithmicsimplicity • 8 months ago +2

      Really good question. There are 2 reasons why neural networks tend to use very large n (usually several thousand) even though this means less representation capacity. The first is, as you guessed, it makes better use of GPU accelerators. You can't parallelize computation across layers, but you can parallelize computation across neurons within the same layer.
      The second, and more important reason, is that in practice we don't care that much about representation power. Realistically, as soon as you have 10 layers with a few hundred neurons in each, you already have enough representation power to fit any function in the universe.
      What we actually care about is generalization performance. Just because your network has the capacity to represent the target function, doesn't mean that it will learn the correct target function from the training data. It is much more likely that the network will just overfit to the training dataset.
      It turns out to be the case that the wider a neural network is, the better it generalizes. It is still an open area of research why this is the case, but there are a few hypotheses floating around. My other video on convolutional neural networks actually goes into one of the hypotheses a bit, in it I explain that the more neurons you have the more likely it is that they have good initializations, but it was a bit hand-wavy. I was planning to do a more in-depth video on this topic at some point.

    • @vasilin97 • 8 months ago

      @@algorithmicsimplicity thank you for such a thoughtful reply! I'll watch your CNN video and all other videos you'll produce on this topic!
      I thought that the whole overfitting business is kind of obsolete nowadays, with LLMs having more neurons than the number of training data samples. This is only a rough understanding I've gained from some random articles, and would love to learn more about it. Do you have any suggestions for what to read or watch in this direction? As you noted in the video, there is lots of low-quality content about NNs out there, which makes it hard to find answers to even rather straightforward questions like whether overfitting is "still a thing" in large models.

    • @algorithmicsimplicity • 8 months ago +1

      @@vasilin97 The reason why we use such large neural networks is precisely because larger neural networks overfit less than smaller neural networks. This is a pretty counter-intuitive result, and is contrary to what traditional statistical learning theory predicts, but it is empirically observed over and over again. This phenomenon is known as "double descent", you should be able to find some good resources on this topic searching for that term, for example www.lesswrong.com/posts/FRv7ryoqtvSuqBxuT/understanding-deep-double-descent , medium.com/mlearning-ai/double-descent-8f92dfdc442f . The Wikipedia page on double descent is pretty good too.

  • @stefanrigger7675 • 1 year ago

    Top-notch video! One thing you might have mentioned is that you only deal with the one-dimensional case here.

  • @MARTIN-101 • 1 year ago

    phenomenal

  • @HitAndMissLab • 9 months ago

    You are a math God! ... Subscribed.

  • @abdulhakim4639 • 1 year ago

    Whoa, easy to understand for me.

  • @spadress • 7 months ago

    Very good video

  • @dann_y5319 • months ago +1

    Omg great video

  • @TheCebulon • 1 year ago

    Do you have a video of how to apply this to train a neural network?
    Would be awesome.

  • @Null_Simplex • 3 months ago

    Thank you. This is far more intuitive than the usual interpretation with a nodes-and-edges graph, with the inputs bouncing back and forth between layers of the graph until it finally gets an output.
    What are the advantages and disadvantages of this method of approximating a function compared to polynomial interpolation?

    • @algorithmicsimplicity • 3 months ago +1

      For 1-dimension inputs and outputs, there isn't much difference between them. For higher dimensional inputs polynomials become infeasible, since a polynomial would need coefficients for all of the interaction terms between the input variables (of which there are exponentially many). For this reason, neural nets are preferred when input is high dimensional as they simply apply a linear function to the input variables, and then an activation function to the result of that.
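
To get a feel for that scaling, here is a rough illustrative count (the degree of 3 and the layer width of 100 are arbitrary choices, not from the video) comparing the number of terms a full polynomial needs against the parameter count of a single ReLU layer:

```python
from math import comb

def num_polynomial_terms(n_inputs, degree):
    # Monomials of total degree <= degree in n_inputs variables.
    return comb(n_inputs + degree, degree)

def num_relu_layer_params(n_inputs, n_neurons):
    # One weight per input per neuron, plus one bias per neuron.
    return n_inputs * n_neurons + n_neurons

for n in (10, 100, 1000):
    print(n, num_polynomial_terms(n, 3), num_relu_layer_params(n, 100))
# The polynomial term count blows up with input dimension; the ReLU layer grows linearly.
```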

  • @kwinvdv • 1 year ago

    Neural network "training" is just model fitting. It is just that the proposed structure of it is just quite versatile.

  • @willowarkan2263 • 1 year ago +2

    Might have been useful to explain why a nonlinear function like the ReLU is used as the transfer function, or that it's not the only common transfer function, which is kind of implied since you use the identity in the beginning, though that isn't nonlinear.
    Also, the link to the human brain is in the structure of neurons: the conceptual foundation of a weighted sum transformed into a single value by a nonlinear process, dendritic signals combined into a single signal down the axon, at the end of which lie new connections to further neurons. Furthermore, in the case of computer vision the structure of the visual cortex served as an inspiration for the neural networks in that field. If memory serves, the neocognitron was one such network, not a learning network as its parameters were tweaked by humans till it did as desired, but foundational to convolutional neural networks.
    Otherwise interesting enough, stressing the behind-the-scenes nature of neural networks, though maybe mentioning how classification relates to those regressions would have been cool too.
    Btw, what learning scheme were you using? As far as I could tell it was some small random jump, first added and then subtracted if the result got worse? I assume if the subtraction is even worse it rolls back the entire thing and rolls a new jump? I ask as it neither sounded nor looked like backpropagation was used.
    The thumbnail still bugs me; the graph representation of the network isn't wrong, it just shows something different: the nature of the interconnections between neurons in the network, which is hard to see in the graph representation of the resulting regression. It's like saying the map of the route is wrong because it's not a photo of the destination.

    • @algorithmicsimplicity • 1 year ago +6

      All commonly used activations are either ReLU or smooth approximations to ReLU, so I disagree that it is useful to talk about alternatives in an introductory video.
      It is not helpful to think of modern neural networks as having anything to do with real brains. Yes early neural networks were inspired by neuroscience experiments from the 1950s. But our modern understanding is that the way the brain actually works is vastly more complex than those 1950s experiments let on. Not to mention modern neural networks architectures are even less inspired by those experiments (e.g. transformers).
      The learning scheme I used is iterate through parameters, increase parameter value by 0.1, if new loss is worse than old loss then decrease parameter value by 0.2. That's it. I just repeated that operation. No backpropagation was used, as the purpose of backprop is only to speed up the computation (which was unnecessary for these toy examples).
      The graph representation of the network is absolutely wrong, as it misleads people into thinking the interconnections between neurons are relevant at all, when they have nothing to do with how or why neural nets work.
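
In code, the parameter update scheme described above looks roughly like the following sketch (reconstructed from the description in this reply, not the author's actual script; the toy data and fit function are made up):

```python
# Coordinate-wise trial-and-error fitting, as described above:
# bump each parameter up by 0.1; if the loss got worse, move it down by 0.2
# (i.e. 0.1 below its original value). Repeat many times.
def fit(params, loss):
    for _ in range(1000):
        for i in range(len(params)):
            old_loss = loss(params)
            params[i] += 0.1
            if loss(params) > old_loss:
                params[i] -= 0.2
    return params

# Toy usage: fit y = m*x + b to three points with absolute error.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
abs_loss = lambda p: sum(abs(p[0] * x + p[1] - y) for x, y in data)
print(fit([0.0, 0.0], abs_loss))  # settles near m = 2, b = 1, jittering around the best fit
```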

    • @willowarkan2263 • 1 year ago

      @@algorithmicsimplicity I would argue that modern neural networks developed from the more primitive neural networks based on the then-current understanding of the brain. Also, I didn't write that neural networks work like the brain, just that their basic building blocks were inspired by the basic building blocks of the brain, and that some structural aspects, as understood at the time, inspired the structure of NNs.
      So you are claiming that neither sigmoid, hyperbolic, nor any of the transfer functions common to RBFNNs are used? Or are so rarely used as to be negligible?
      Yes, ReLU and related functions are currently popular as the transfer function of hidden layers in deep learning and CNNs. Although some of the related functions start to depart quite a bit from the original, mostly keeping linearity in the positive domain, approximations notwithstanding. Looking through them, it feels like calling sigmoid and tanh functions related to the step function, which they kind of are, similarly solving issues with differentiability as the approximations to ReLU do.
      So you kind of used a grid search, on a 0.1-sized grid, to discretize the parameter space.
      Pretty sure an NN without connections isn't a network. Especially for backpropagation they are inherent in its functioning; after all, it needs to propagate over those connections.
      I don't see what you mean by it misleads people or how the interconnectedness is meaningless. The fact that a single layer wasn't enough to solve a non-linearly separable problem is what kept the field inactive for at least a decade.

    • @algorithmicsimplicity • 1 year ago +6

      "So you are claiming that neither sigmoid, hyperbolic, nor any of the transfer functions common to RBFNN are used? Or are so rarely used as to be negligible?"

    • @kalisticmodiani2613 • 8 months ago

      @@algorithmicsimplicity if we didn't need performance then we wouldn't have made so much progress in the last few years. Stars aligned with performant hardware and efficient algorithms.

  • @TheEmrobe • 1 year ago

    Brilliant.

  • @talsheaffer9988 • 27 days ago +1

    Thanks for the vid! At about 10:30 you say a NN with n neurons in each of L layers expresses ~ n^L linear segments. Could this be a mistake? I think it's more like n^2 * L

    • @algorithmicsimplicity • 27 days ago

      The number of different linear segments is definitely at least exponential in the number of layers, e.g. proceedings.neurips.cc/paper_files/paper/2014/file/109d2dd3608f669ca17920c511c2a41e-Paper.pdf
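
One way to get a feel for these counts is to measure them directly: evaluate a small ReLU network on a dense 1-D grid and count how many times the slope changes. Below is a rough illustrative sketch (arbitrary random weights and sizes; the count for any particular network depends on its weights, and trained networks can differ a lot from random ones):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_1d(x, layers):
    # x: 1-D array of scalar inputs; layers: list of (W, b) weight/bias pairs.
    h = x.reshape(-1, 1)
    for W, b in layers[:-1]:
        h = relu(h @ W + b)          # hidden layers with ReLU
    W, b = layers[-1]
    return (h @ W + b).ravel()       # linear output layer

rng = np.random.default_rng(0)
width, depth = 8, 3                  # arbitrary example sizes
sizes = [1] + [width] * depth + [1]
layers = [(rng.normal(size=(m, n)), rng.normal(size=n)) for m, n in zip(sizes, sizes[1:])]

xs = np.linspace(-5, 5, 200001)
slopes = np.diff(mlp_1d(xs, layers)) / np.diff(xs)
bends = np.abs(np.diff(slopes)) > 1e-6
# A single bend can straddle two grid intervals, so count runs of detections.
starts = bends & ~np.concatenate(([False], bends[:-1]))
print(1 + starts.sum())  # approximate number of linear pieces over [-5, 5]
```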

  • @vitorschroederdosanjos6539 • 1 year ago

    The reason why they work is that they fit the data... brilliant

  • @computerconcepts3352 • 1 year ago

    extremely underrated

  • @promethful • 1 year ago

    Is this piecewise linear approximation of a network a feature of using the ReLU activation function? What if we use a sigmoid activation function instead?

  • @theplotproject5911 • 1 year ago

    this is gonna blow up