Building makemore Part 3: Activations & Gradients, BatchNorm

  • Published on Jun 6, 2024
  • We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, backward pass gradients, and some of the pitfalls when they are improperly scaled. We also look at the typical diagnostic tools and visualizations you'd want to use to understand the health of your deep network. We learn why training deep neural nets can be fragile and introduce the first modern innovation that made doing so much easier: Batch Normalization. Residual connections and the Adam optimizer remain notable todos for a later video.
    Links:
    - makemore on github: github.com/karpathy/makemore
    - jupyter notebook I built in this video: github.com/karpathy/nn-zero-t...
    - Colab notebook: colab.research.google.com/dri...
    - my website: karpathy.ai
    - my twitter: / karpathy
    - Discord channel: / discord
    Useful links:
    - "Kaiming init" paper: arxiv.org/abs/1502.01852
    - BatchNorm paper: arxiv.org/abs/1502.03167
    - Bengio et al. 2003 MLP language model paper (pdf): www.jmlr.org/papers/volume3/b...
    - Good paper illustrating some of the problems with batchnorm in practice: arxiv.org/abs/2105.07576
    Exercises:
    - E01: I did not get around to seeing what happens when you initialize all weights and biases to zero. Try this and train the neural net. You might think either that 1) the network trains just fine or 2) the network doesn't train at all, but actually it is 3) the network trains but only partially, and achieves a pretty bad final performance. Inspect the gradients and activations to figure out what is happening and why the network is only partially training, and what part is being trained exactly.
    - E02: BatchNorm, unlike other normalization layers like LayerNorm/GroupNorm etc., has the big advantage that after training, the batchnorm gamma/beta can be "folded into" the weights of the preceding Linear layers, effectively erasing the need to forward it at test time. Set up a small 3-layer MLP with batchnorms, train the network, then "fold" the batchnorm gamma/beta into the preceding Linear layer's W,b by creating a new W2, b2 and erasing the batch norm (a rough sketch of the folding step follows below). Verify that this gives the same forward pass during inference. i.e. we see that the batchnorm is there just for stabilizing the training, and can be thrown out after training is done! pretty cool.
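    A minimal sketch of the E02 folding step (assuming PyTorch and a single Linear followed by BatchNorm1d in eval mode, so the running statistics are used; the layer sizes are arbitrary and the layers are untrained here, but the algebra is the same):

      import torch
      import torch.nn as nn

      lin = nn.Linear(10, 20)
      bn = nn.BatchNorm1d(20)
      lin.eval()
      bn.eval()  # inference: bn uses running_mean / running_var

      x = torch.randn(32, 10)
      with torch.no_grad():
          y_ref = bn(lin(x))

          # bn(z) = gamma * (z - mean) / sqrt(var + eps) + beta, so fold that into W, b:
          scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / std
          W2 = lin.weight * scale[:, None]                         # rescale each output row
          b2 = (lin.bias - bn.running_mean) * scale + bn.bias

          folded = nn.Linear(10, 20)
          folded.weight.copy_(W2)
          folded.bias.copy_(b2)
          y_folded = folded(x)

      print(torch.allclose(y_ref, y_folded, atol=1e-6))  # expect True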
    Chapters:
    00:00:00 intro
    00:01:22 starter code
    00:04:19 fixing the initial loss
    00:12:59 fixing the saturated tanh
    00:27:53 calculating the init scale: “Kaiming init”
    00:40:40 batch normalization
    01:03:07 batch normalization: summary
    01:04:50 real example: resnet50 walkthrough
    01:14:10 summary of the lecture
    01:18:35 just kidding: part2: PyTorch-ifying the code
    01:26:51 viz #1: forward pass activations statistics
    01:30:54 viz #2: backward pass gradient statistics
    01:32:07 the fully linear case of no non-linearities
    01:36:15 viz #3: parameter activation and gradient statistics
    01:39:55 viz #4: update:data ratio over time
    01:46:04 bringing back batchnorm, looking at the visualizations
    01:51:34 summary of the lecture for real this time
  • Science & Technology

Comments • 314

  • @mileseverett
    @mileseverett 1 year ago +318

    Andrej, as a third-year PhD student, I've gained so much more understanding of the systems I take for granted from this video series. You're doing incredible work here!

  • @leopetrini
    @leopetrini 1 year ago +255

    1:30:10 The 5/3 gain in the tanh comes from the average value of tanh^2(x) where x is distributed as a Gaussian, i.e.
    integrate (tanh x)^2*exp(-x^2/2)/sqrt(2*pi) from -inf to inf ~= 0.39
    The square root of this value is how much the tanh squeezes the variance of the incoming variable: 0.39 ** .5 ~= 0.63 ~= 3/5 (hence 5/3 is just an approximation of the exact gain 1/0.63 ~= 1.59).
    We then multiply by the gain to keep the output variance 1.
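    A quick numeric sanity check of the above (a rough sketch assuming PyTorch; NumPy would work just as well):

      import torch

      torch.manual_seed(0)
      z = torch.randn(10_000_000)                   # x ~ standard Gaussian
      e_tanh2 = torch.tanh(z).pow(2).mean().item()  # E[tanh(x)^2], comes out ~0.39
      gain = 1.0 / e_tanh2 ** 0.5                   # ~1.59, which 5/3 ~ 1.667 approximates
      print(f"E[tanh^2] = {e_tanh2:.4f}  gain = {gain:.3f}  5/3 = {5/3:.3f}")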

    • @peace-wink
      @peace-wink 1 year ago +2

      Thank you : )

    • @MonkeyKong21
      @MonkeyKong21 1 year ago

      I hope they're using the actual value and just writing 5/3 in the docs as slang

    • @Abhishekkumar-qj6hb
      @Abhishekkumar-qj6hb 11 months ago

      Hi leopetrini, could you check my comment above and answer if possible?

    • @Zoronoa01
      @Zoronoa01 9 months ago

      Thank you for the insight!

    • @lennixplayzpokemon1239
      @lennixplayzpokemon1239 9 months ago

      @leopetrini Can you explain how you calculated the integral?

  • @Erosis
    @Erosis 1 year ago +146

    This has to be the best hands-on coding tutorial for these small yet super-important deep learning fundamentals online. Absolutely great job!

  • @ragibshahriyear3682
    @ragibshahriyear3682 1 month ago +3

    Thank you Mr. Karpathy. I am in love with your teaching. Given how accomplished and experienced you are, you are still teaching with such grace. I dream about sitting in one of your classes and learning from you, and I hope to meet you one day. May you be blessed with good health. Lots of love and respect.

  • @thodorispaparrigopoulos8542
    @thodorispaparrigopoulos8542 5 months ago +12

    I cannot fathom that this video only has 4k likes... He is literally explaining things that no one else goes through, because they simply don't know them, but they are crucial!

    • @adamskrodzki6152
      @adamskrodzki6152 4 months ago +3

      Thanks for reminding me to give a like ;)

  • @vpamula1
    @vpamula1 2 months ago +4

    A lot of material packed in this video; I expect that understanding the mechanics of these networks will take many years, even with some prior experience.

  • @hungrydeal6154
    @hungrydeal6154 1 year ago +19

    To put BatchNorm into perspective, I am going through Geoffrey Hinton's 2012 lecture notes on the bag of tricks for mini-batch descent, from around when AlexNet was first published. Hinton was saying there was no one best way for learning method/gradient descent with mini-batches. Well, here it is: BatchNorm. Hinton: "Just think of how much better neural nets will work, once we've got this sorted out". We are living in that future :)

    • @amgad_hasan
      @amgad_hasan 1 year ago

      What did he mean by "Hinton was saying there was no one best way for learning method/gradient descent with mini-batches"?
      Did he mean initializing them?

  • @parent5x
    @parent5x 1 year ago +57

    Andrej you have a wonderful gift for educating others. I’m a self learner of NNs and it’s a painful process but you seriously help ease that suffering… much appreciated! Ty

    • @Umar-Ateeq
      @Umar-Ateeq 3 months ago

      Great, I am also going through this same painful process. Can you suggest something that can help ease this pain?

    • @hetanshpatel8521
      @hetanshpatel8521 2 months ago

      If you want to learn the theory, try Soheil Feizi. He is a professor at UMD. Amazing teacher, and the course content is just top notch.

  • @nkhuang1390
    @nkhuang1390 1 year ago +38

    Every time another Andrej Karpathy video drops, it's like Christmas for me. This video series has helped me to develop genuine intuition about how neural networks work. I hope you continue to put these out; it's making a massive impact on making these "black box" technologies accessible to anyone and everyone!

  • @philipwoods6720
    @philipwoods6720 1 year ago +1

    These videos have been incredible. Thank you so much for taking the time to make them, and I look forward to all the future ones!!!

  • @hlinc2
    @hlinc2 1 year ago +13

    This series is definitely the clearest presentation of ML concepts I have seen, but the teaching generalizes so well. I’ll be using this step-by-step intuition-building approach for all complicated things from now on. Nice that the approach also gives a confidence that I can understand stuff with enough time. Truly appreciate your doing this.

  • @siddharth-gandhi
    @siddharth-gandhi 1 year ago

    It's so nice to have you back on YouTube! Thanks for teaching me Rubik's Cube back in the day and thanks for teaching us deep learning now!

  • @eitanporat9892
    @eitanporat9892 1 year ago

    Thank you for delving deep into the nitty-gritty details. I appreciate the exercises!

  • @chanep1
    @chanep1 11 months ago +5

    I like that not even the smallest detail is pulled out of thin air, everything is completely explained

  • @yonastesh7830
    @yonastesh7830 1 year ago

    Andrej, thank you so much for your tutorials here. You've no idea how much your videos helped me. Please keep doing more videos.

  • @MLwithAmir
    @MLwithAmir 1 year ago

    These videos are so useful, Andrej thank you so much. The parts when you wrap up the lecture, and then change your mind to add more content are my fav. 😄

  • @project-hq
    @project-hq 1 year ago +5

    This whole series is absolutely amazing. Thank you very much Andrej! Being able to code along with you, improving a system as my own knowledge improves is fantastic

  • @1997benjaminvh
    @1997benjaminvh 1 year ago +2

    This is awesome!!
    Thank you so much for taking the time to do this Andrej.
    Please keep this going, I am learning so much from you.

  • @rafaelsouza4575
    @rafaelsouza4575 1 year ago

    Oh man, this is top-notch content! I'm not sure there is any other available content on these topics with so much clarity about the inner gears, with reproducible examples. Thank you so much! You're a DL hero.

  • @CuriousAnonDev
    @CuriousAnonDev 1 year ago +7

    I really enjoy learning and listening to people like Andrej who love what they do and aren't doing it just for money. Shared the channel with my friends ☺️
    Keep up the great work Andrej!

  • @1knmd
    @1knmd 1 year ago +5

    The quality of these lectures is off the charts. This channel is a gold mine! Andrej, thank you, thank you very much for these lectures.

  • @mrmiyagi26
    @mrmiyagi26 1 year ago +1

    Thank you for the deep dive into batch normalization and diagnostic approaches! Really useful to see it explained from the paper with the code.

  • @8eck
    @8eck 1 year ago

    Thank you Andrej, I have finally found time to go through your lectures. I have learned and understood a lot more than before, thank you.

  • @IchibanKanobee
    @IchibanKanobee 6 months ago +4

    This video series is exceptional. The clarity and practicality of the tutorials are unmatched by any other course. Thank you for the invaluable help to all practitioners of deep learning!

  • @enchanted_swiftie
    @enchanted_swiftie 1 year ago +2

    So many small things to scrutinize, and how easily he has pointed them out one by one, step by step, from problem to solution, is just amazing. Love your work Andrej. You are amazing.

  • @sanjaybhatikar
    @sanjaybhatikar 4 months ago +1

    I keep coming back to these videos again and again. Andrej is legend!

  • @khuongtranhoang9197
    @khuongtranhoang9197 1 year ago +12

    Turns out this should be the way to teach machine learning: a combination of theory reference and actual coding. Thank you Andrej!

    • @american-professor
      @american-professor 5 months ago +1

      Exactly. My DL course back in 2015 had a ton of obscure math and no coding. I had no idea how to train NNs after that course. I'm rediscovering and learning a ton of stuff from this video alone, way more than from my course.

  • @ptrckqnln
    @ptrckqnln 1 year ago +5

    This is filling in a lot of gaps for me, thank you! I especially appreciate your insights about reading a network's behavior during training; they gave me a few epiphanies.

  • @the-hanhpham8950
    @the-hanhpham8950 1 year ago

    Thank you so much for the time and effort put into the videos of this series. Appreciate it very much.

  • @rmajdodin
    @rmajdodin 1 year ago +8

    1:33:30 The reason the gradients of the higher layers have a bigger deviation (in the absence of the tanh layers) is that you can write the whole NN as a sum of products, and it is easy to see that each weight of layer 0 appears in 1 term, of layer 1 in 30 terms, of layer 2 in 3000 terms and so on. Therefore a small change of a weight in higher layers changes the output more.
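    In case anyone wants to reproduce those statistics, here is a rough sketch (assuming PyTorch; the depth, widths, and loss are arbitrary) that prints the per-layer gradient standard deviations for a stack of Linear layers with no non-linearities in between:

      import torch
      import torch.nn as nn

      torch.manual_seed(0)
      layers = [nn.Linear(100, 100, bias=False) for _ in range(5)]

      x = torch.randn(32, 100)
      out = x
      for layer in layers:
          out = layer(out)  # purely linear stack: no tanh in between

      loss = out.pow(2).mean()
      loss.backward()

      for i, layer in enumerate(layers):
          print(f"layer {i}: grad std = {layer.weight.grad.std().item():.2e}")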

  • @lkothari
    @lkothari 1 year ago +6

    You're a great teacher Andrej. This has been by far the most interesting ML course/training I have come across. Keep up the good work!

  • @minhajulhoque2113
    @minhajulhoque2113 1 year ago +2

    The batch normalization explanation was amazing! Thank you for your hard work and concise and clear explanations.

  • @divelix2666
    @divelix2666 1 year ago

    Can't even explain how impactful this video is for my understanding of NNs... Thank you so much!

  • @kimiochang
    @kimiochang 1 year ago +1

    I completed this one today, and I just want to show Andrej my gratitude. Looking forward to the next one. Thank you very much, Andrej. Thank you!

  • @fhools
    @fhools 1 year ago

    These are some of the best lectures I've ever seen. I love the explanation in the first part about tanh saturation. He's really trying to get the viewer to develop intuition.

  • @moalimus
    @moalimus 1 year ago +4

    Thanks very much for this, please keep them coming, you are changing lives.

  • @PureArtMV
    @PureArtMV 6 months ago +2

    What a gem. Thanks for the lectures, Andrej

  • @cktse_jp
    @cktse_jp 6 months ago

    Your choice of visualizations as diagnostic tool is super insightful. Thanks so much for sharing your experience.

  • @scottsun345
    @scottsun345 1 year ago +2

    Thank you, Andrej, amazing content! As a beginner in deep learning and even in programming, I find most materials out there are either pure theory or pure API applications, and they rarely go this deep and detailed. Your videos cover not just the knowledge of this field, but also so many empirical insights that came from working on actual projects. Just fantastic! Please make more of these lessons!

  • @ernietam6202
    @ernietam6202 10 months ago

    Really enjoy your classes. I learnt a lot of tips for training and feel comfortable now. Will continue finishing this series.

  • @swarajnanda5990
    @swarajnanda5990 1 year ago +6

    My mind is totally blown at the detail I am getting. Feels like this is an Ivy League-level course, with the content so meticulously covered.

  • @AbhishekSingh-ee2bo
    @AbhishekSingh-ee2bo 1 year ago

    Thank you.. I am a self learner and your series has been a milestone for me.

  • @american-professor
    @american-professor 5 months ago +2

    You are such a gold mine of knowledge, it's insane. I wish you were my DL professor during my PhD.

  • @theusualcouple
    @theusualcouple 4 months ago

    Thank you @Andrej for bringing this series. You are a great teacher, the way you have simplified such seemingly complex topics is valuable to all the students like me. 🙏

  • @AboutOliver
    @AboutOliver 6 months ago +1

    You're an incredible teacher. You really have a gift. Thanks for these lectures!!!!

  • @tsunamidestructor
    @tsunamidestructor 1 year ago +37

    Thank you so much for this, Andrej! Your series single-handedly revitalized my love for deep learning! Please keep this series going :)

    • @eustin
      @eustin 1 year ago +6

      It's done the same for me. I'm excitedly going through each video. It feels good to be back!

  • @AlexTang99
    @AlexTang99 1 month ago

    This is the most amazing video on neural network mathematics knowledge I've ever seen; thank you very much, Andrej!

  • @styssine
    @styssine 4 months ago

    This is a great lecture, especially the second half building intuition about diagnostics. Amazing stuff.

  • @greatwall2003
    @greatwall2003 2 months ago

    Great material on the intricacies of how neural networks work. Until now, I hadn't paid attention to the distribution of values entering the activation layers, and as it turns out, this is an extremely important issue. Thanks!

  • @lucianovidal8721
    @lucianovidal8721 4 months ago

    The amount of useful information in this video is impressive. Thanks for such good content.

  • @juleswombat5309
    @juleswombat5309 1 year ago

    These lectures are an awesome gift to us mortals. Such a clear explanation of the principles of neural networks.
    I only need to be able to afford access to massive TPU cloud compute and a huge corpus, but at least I can now gain insight into and understand the principles of these technologies.

  • @vil9386
    @vil9386 3 months ago

    I don't think any other book or blog or videos cover what Andrej has covered. Awesome insights. THANK YOU Andrej.

  • @dengzhonghan5125
    @dengzhonghan5125 11 months ago

    Your lecture is so amazing. Please keep updating, thanks for sharing and educating.

  • @dang242
    @dang242 10 months ago

    Thank you for explaining everything in such detail. It makes everything much more understandable

  • @Lauren-qj6ti
    @Lauren-qj6ti 6 months ago +1

    What an inspiration. Like others have alluded in the comments, I find Andrej's teaching so remarkably therapeutic.

  • @houbenbub
    @houbenbub 1 year ago

    following your lectures is a delight! Thanks for taking the time to make them :)

  • @gonzaloalbornoz7279
    @gonzaloalbornoz7279 1 year ago +1

    You explain very simple things that I know and always give me a new perspective on them! Your way of transmitting knowledge is incredible.

    • @parent5x
      @parent5x 1 year ago

      Exactly!

  • @vitorzucher435
    @vitorzucher435 5 months ago +1

    You have so much depth in your knowledge,
    yet you manage to explain complex concepts with such incredible didactics.
    This is someone who truly understands his field. Andrej, thank you so much, and even more for the humility with which you do it.
    You explain how libraries and languages like Python and PyTorch work and dive into the WHYs of why things are happening.
    This is absolutely priceless.

  • @afbf6522
    @afbf6522 1 year ago +1

    Amazing explanation about the nitty gritty details of Deep Learning, the "dark arts" of the trade.

  • @peterhojnos6705
    @peterhojnos6705 1 year ago +1

    Greetings, thanks for getting into making videos for the wider public!

  • @jeanchristophe15
    @jeanchristophe15 3 months ago

    Thank you so much for your clear and thoughtful explanation Andrej!

  • @yazanmaarouf48
    @yazanmaarouf48 5 months ago

    I have shot myself in the foot multiple times before these videos. Training big models is much more difficult than I initially anticipated. Time wasted, sadly. But I have more confidence in myself thanks to these videos. Thanks Andrej

  • @mPajuhaan
    @mPajuhaan 1 year ago

    Very interesting how you've described the concept of pre-tuning NN.

  • @user-og6xo6iw9h
    @user-og6xo6iw9h 1 year ago +1

    love the short snippets about how to implement these tools in production.

  • @ShouryanNikam
    @ShouryanNikam 5 months ago

    Thanks for making this! It's such an honor to be learning from you!

  • @kemalware4912
    @kemalware4912 1 year ago

    My life is much better now because of your videos.

  • @hole62
    @hole62 1 year ago

    As a former HTML nerd, I am forever indebted to the amount of precise calculations and their limits, as is expressed in this video…

  • @JuliusSmith
    @JuliusSmith 4 months ago

    Thanks for the fantastic download! You have changed my learning_rate in this area from 0.1 to something >1!

  • @punto-y-coma7890
    @punto-y-coma7890 2 months ago

    Awesome explanation Andrej! Thank you very much for sharing your knowledge.

  • @adamskrodzki6152
    @adamskrodzki6152 4 months ago

    Amazing! Knowledge that is hellishly hard to find in other videos, and you also have an AMAZING skill for clearly explaining complex stuff.

  • @alexandermedina4950
    @alexandermedina4950 1 year ago

    Amazing video tutorial, thank you professor Andrej.

  • @ukaszlipniak2365
    @ukaszlipniak2365 1 year ago

    I'm learning a lot. Thanks for these lectures!

  • @8eck
    @8eck 1 year ago

    Wow, I didn't know that such content exists, and from Karpathy himself. Thank you!

  • @tildarusso
    @tildarusso 1 year ago

    Nice to be lectured again after watching the Stanford CS231 multiple times!

  • @sarai3538
    @sarai3538 1 year ago

    Thanks for the great explanation of the activations and gradients, as well as the histograms.

  • @riponsaha8320
    @riponsaha8320 1 year ago

    This tutorial is so engaging and insightful.

  • @TheMato1112
    @TheMato1112 1 year ago

    Thank you Andrej, it's truly on another level :)

  • @sauloviedo2677
    @sauloviedo2677 1 year ago

    Finallyyy!!! I was nervously waiting for the new video! Thank you Andrej!!!

  • @ramboli4118
    @ramboli4118 1 year ago

    I finally understand when statistics comes into play in machine learning. It's when you introduce the randomized weights (matrices)!

  • @EmileAI
    @EmileAI 1 year ago +1

    You are an Angel sir
    The land of AI is blessed and the harvest is plenty. New AI warriors will rise from this
    Thanks

  • @robertcowher
    @robertcowher 2 months ago

    If you decide to make more content, a video series like this with a focus on self-driving or RL for robotics would be awesome. Not that you don't have enough going on, but that's my wish-list item :) Thanks for putting an incredibly in-depth resource out here free on the internet.

  • @mcnica89
    @mcnica89 1 year ago +10

    I believe you can calculate the gain by doing Gain = 1/sqrt(E[ f(Z)^2 ]) where Z is a standard Gaussian (so that Gain*f(Z) will have unit variance when Z is a standard Gaussian). If you do this for tanh you get ~=1.592, which I guess is close to 5/3?
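    A rough sketch of that formula (assuming PyTorch; empirical_gain is just a hypothetical helper), compared against the gains PyTorch itself reports via torch.nn.init.calculate_gain:

      import torch

      def empirical_gain(f, n=10_000_000):
          # Monte Carlo estimate of 1 / sqrt(E[f(Z)^2]) for Z ~ N(0, 1)
          z = torch.randn(n)
          return (1.0 / f(z).pow(2).mean().sqrt()).item()

      print(empirical_gain(torch.tanh))            # ~1.592
      print(torch.nn.init.calculate_gain('tanh'))  # 5/3 ~ 1.667
      print(empirical_gain(torch.relu))            # ~1.414
      print(torch.nn.init.calculate_gain('relu'))  # sqrt(2) ~ 1.414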

  • @josephngari1659
    @josephngari1659 1 year ago

    I absolutely LOVE this series!!!!

  • @aureliencobb199
    @aureliencobb199 1 year ago

    Thank you for the lectures. I really appreciate the effort.

  • @ianfukushima1316
    @ianfukushima1316 1 year ago

    Thank you! You made me happy with these lectures :)

  • @smithwill9952
    @smithwill9952 1 year ago

    Better than studying at university. Keep going, A.K. Will share your video for sure.

  • @ayogheswaran9270
    @ayogheswaran9270 1 year ago

    Thanks a lot Andrej!! Thank you for all your efforts❤.

  • @casvanmarcel
    @casvanmarcel 1 year ago

    I love these videos! Thanks for making them!

  • @billykotsos4642
    @billykotsos4642 1 year ago

    Dives straight in and kills the presentation…. Another banger…. Can make old papers fun to go through….

  • @venkateshmunagala205
    @venkateshmunagala205 1 year ago

    Thanks Andrej :). It gave me a better understanding of the concept.

  • @mikehoops
    @mikehoops 1 year ago

    Thank you Andrej, fantastic tutorial

  • @eduardtsuranov712
    @eduardtsuranov712 1 year ago

    incredibly cool videos, huge thanks, all the best!

  • @starshipx1282
    @starshipx1282 1 year ago

    Love you Andrej❤❤. I hope you keep going with this

  • @fetulhakabdurahman4720
    @fetulhakabdurahman4720 1 year ago

    Your lectures are the best

  • @chonmon
    @chonmon 1 year ago

    Lol, I don't really understand what's going on here, but I'm just liking and commenting to support Andrej! Keep it up Andrej!

  • @GregorUrbanekmenticorpAG
    @GregorUrbanekmenticorpAG 1 year ago

    Andrej, thanks very much for your wonderful, educational videos. They are here to stay. A suggestion: wouldn't it be nice to bump up the resolution and contrast of the screen capture to make it easier on the spectators' eyes?

  • @4mb127
    @4mb127 1 year ago

    Thanks so much for this series.

  • @karolduracz4008
    @karolduracz4008 1 year ago

    This is very helpful and interesting. Thank you.

  • @DrKnowitallKnows
    @DrKnowitallKnows 1 year ago

    Amazing work. Thank you for explaining things so clearly!

  • @aayushsmarten
    @aayushsmarten 1 year ago +1

    Andrej's transformation between 15:14 and 15:16 was pretty quick 😉