Building makemore Part 5: Building a WaveNet

  • Published 17 Dec 2024

Comments •

  • @antonclaessonn
    @antonclaessonn 2 years ago +94

    This series is the most interesting resource for DL I've come across, as a junior ML engineer myself. To be able to watch such a knowledgeable domain expert as Andrej explain everything in the most understandable way is a real privilege. A million thanks for your time and effort, looking forward to the next one and hopefully many more.

  • @khisrowhashimi
    @khisrowhashimi 2 years ago +224

    I love how we are all so stressed and worried that Andrej might grow apathetic to his YouTube channel, so everyone wants to be extra supportive 😆 Really shows how awesome of a communicator he is.

    • @TL-fe9si
      @TL-fe9si 1 year ago +2

      I was literally thinking about that when I saw this comment

    • @jordankuzmanovik5297
      @jordankuzmanovik5297 1 year ago +5

      Unfortunately he did it :(

    • @isaac10231
      @isaac10231 1 year ago

      @jordankuzmanovik5297 Hopefully he comes back.

    • @Тима-щ2ю
      @Тима-щ2ю 1 month ago

      haha, so true!

  • @GlennGasner
    @GlennGasner 1 year ago +24

    I really, really appreciate you putting in the work to create these lectures. I hope you can really feel the weight of the nearly hundred thousand humans who pushed through 12 hours of lectures on this because you've made it accessible. And that's just so far. These videos are such an incredible gift. Half of the views are me, because I needed to watch each one so many times to understand what's happening, since I started from so little. Also, it's super weird how different you are from other YouTubers and yet how likable you become as a human during this series. You are doing this right, and I appreciate it.

  • @crayc3
    @crayc3 2 years ago +37

    A notification for a new Andrej video guide feels like a new season of Game of Thrones just dropped at this point.

  • @nervoushero1391
    @nervoushero1391 2 years ago +114

    As an independent deep learning undergrad student, your videos help me a lot. Thank you Andrej. Never stop this series.

    • @anrilombard1121
      @anrilombard1121 2 years ago +8

      We're on the same road!

    • @tanguyrenaudie1261
      @tanguyrenaudie1261 1 year ago +2

      Love the series as well ! Coding through all of it. Would love to get together with people to replicate deep learning papers, like Andrej does here, to learn faster and not by myself.

    • @raghavravishankar6262
      @raghavravishankar6262 1 year ago +1

      @@tanguyrenaudie1261 I'm in the same boat as well do you have a discord or something where we can talk further?

    • @raghavravishankar6262
      @raghavravishankar6262 1 year ago

      @Anri Lombard @ Nervous Hero

    • @Katatonya
      @Katatonya 8 months ago

      @raghavravishankar6262 Andrej does have a server; we could meet there and then start our own. My handle is vady. (with a dot) if anyone wants to add me, or ping me in Andrej's server.

  • @timelapseguys4042
    @timelapseguys4042 2 years ago +21

    Andrej, thanks a lot for the video! Please don't stop the series. It's an honor to learn from you.

  • @RishikeshS-nv7ol
    @RishikeshS-nv7ol 7 months ago +12

    Please don't stop making these videos, these are gold !

  • @takeiteasydragon
    @takeiteasydragon 12 days ago

    He even switched equipment after realizing the recording quality wasn’t great. How can someone be so talented and still so humble and thoughtful? It’s just incredible!

  • @maestbobo
    @maestbobo 1 year ago +10

    Best resource by far for this content. Please keep making more of these; I feel I'm learning a huge amount from each video.

  • @hintzod
    @hintzod 2 years ago +9

    Thank you so much for these videos. I really enjoy these deep dives, things make so much more sense when you're hand coding all the functions and running through examples. It's less of a black box and more intuitive. I hope this comment will encourage you to keep this going!

  • @Jmelly11
    @Jmelly11 2 months ago +1

    I rarely comment on videos, but thank you so much for this series, Andrej. I love learning, and learn many things that interest me. The reason I say that is that I have experienced a lot of tutors/educators over time. And for what it's worth, I'd like you to know you're truly gifted when it comes to your understanding of AI development and communicating that understanding.

  • @rajeshparekh
    @rajeshparekh 11 months ago +1

    Thank you so much for creating this video lecture series. Your passion for this topic comes through so vividly in your lectures. I learned so much from every lecture and especially appreciated how the lectures started from the foundational concepts and built up to the state-of-the-art techniques. Thank you!

  • @mipmap256
    @mipmap256 1 year ago +7

    Can't wait for part 6! So clear and I can follow step by step. Thanks so much

  • @vivekpadman5248
    @vivekpadman5248 2 years ago +6

    Absolutely love this series Andrej sir... It not only teaches me stuff but gives me confidence to work even harder to share whatever I know already.. 🧡

  • @Zaphod42Beeblebrox
    @Zaphod42Beeblebrox 2 years ago +57

    I experimented a bit with the MLP with 1 hidden layer and managed to scale it up to your fancy hierarchical model. :)
    Here is what I got:
    MLP(105k parameters):
    block_size = 10
    emb_dim = 18
    n_hidden = 500
    lr = 0.1 # used the same learning rate decay as in the video
    epochs = 200000
    mini_batch = 32
    lambd = 1 ### added L2 regularization
    seed is 42
    Training error: 1.7801
    Dev error: 1.9884
    Test error: 1.9863 (I checked this only because I was worried that somehow I overfitted the dev set)
    Some examples generated from the model that I kinda liked:
    Angelise
    Fantumrise
    Bowin
    Xian
    Jaydan
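    (A minimal, self-contained sketch of the "lambd" L2 regularization mentioned above, using a dummy linear model; the penalty form and all tensor names here are assumptions, not the commenter's actual code.)
      import torch
      import torch.nn.functional as F

      torch.manual_seed(42)
      W = torch.randn(20, 27, requires_grad=True)   # stand-in for the model's weights
      x = torch.randn(32, 20)                       # stand-in mini-batch of inputs
      Yb = torch.randint(0, 27, (32,))              # stand-in mini-batch of targets
      lambd = 1.0                                   # L2 strength, as in the comment above

      logits = x @ W
      loss = F.cross_entropy(logits, Yb) + lambd * (W ** 2).mean()   # data loss + L2 penalty
      loss.backward()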

    • @oklm2109
      @oklm2109 1 year ago

      What's the formula to calculate the number of parameters of an MLP model?

    • @amgad_hasan
      @amgad_hasan 1 year ago +2

      @@oklm2109 You just add up the trainable parameters of every layer.
      If the model contains only fully connected layers (aka Linear in PyTorch or Dense in TF), the number of parameters for each layer is:
      n_weights = n_in * n_hidden_units
      n_biases = n_hidden_units
      n_params = n_weights + n_biases = (n_in + 1) * n_hidden_units
      n_in: number of inputs (think of it as the number of outputs, i.e. hidden units, of the previous layer).
      This formula is valid for Linear layers; other types of layers may have a different formula.
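      (A small runnable sketch of the rule above, with hypothetical layer sizes:)
        # Count the parameters of a stack of fully connected layers;
        # layer_sizes lists the width of each layer, starting with the input size.
        def mlp_param_count(layer_sizes):
            total = 0
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
                total += (n_in + 1) * n_out   # n_in*n_out weights + n_out biases
            return total

        print(mlp_param_count([20, 68, 27]))   # (20+1)*68 + (68+1)*27 = 3291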

    • @glebzarin2619
      @glebzarin2619 1 year ago +1

      I'd say it's slightly unfair to compare models with different block sizes, because the block size influences not only the number of parameters but also the amount of information given as input.

    • @vhiremath4
      @vhiremath4 6 months ago

      @Zaphod42Beeblebrox out of curiosity, what do your losses and examples look like without the L2 regularization?
      Also, love the username :P

    • @vhiremath4
      @vhiremath4 6 months ago +2

      ^ Update - just tried this architecture locally and got the following without L2 regularization:
      train 1.7468901872634888
      val 1.9970593452453613
      How were you able to validate that there was an overfitting to the training dataset?
      Some examples:
      arkan.
      calani.
      ellizee.
      coralym.
      atrajnie.
      ndity.
      dina.
      jenelle.
      lennec.
      laleah.
      thali.
      nell.
      drequon.
      grayson.
      kayton.
      sypa.
      caila.
      jaycee.
      kendique.
      javion.

  • @S0meM0thersson
    @S0meM0thersson 3 months ago

    Your work ethic and happy personality really move me. Respect to you, Andrej, you are great.🖖

  • @stracci_5698
    @stracci_5698 1 year ago +1

    This is truly the best dl content out there. Most courses just focus on the theory but lack deep understanding.

  • @brittaruiters6309
    @brittaruiters6309 2 years ago +5

    I love this series so much :) it has profoundly deepened my understanding of neural networks and especially backpropagation. Thank you

  • @NarendraBME
    @NarendraBME 11 months ago

    So far THE BEST lecture series I have come across on YouTube. Alongside learning neural networks in this series, I have learned more PyTorch than I did from watching a 26-hour PyTorch video series from a YouTuber.

  • @aurelienmontmejat1077
    @aurelienmontmejat1077 2 years ago +2

    This is the best deep learning course I've followed! Even better than the one on Coursera. Thanks!

  • @1knmd
    @1knmd 1 year ago +2

    Every time a new video comes out it's like Christmas for me! Please don't stop doing this, best ML content out there.

  • @stanislawcronberg3271
    @stanislawcronberg3271 2 years ago +1

    My favorite way to start a Monday morning is to wake up to a new lecture in Andrej's masterclass :)

  • @willr0073
    @willr0073 4 months ago

    All the makemore lessons have been awesome Andrej! Huge thanks for helping me understand better how this world works.

  • @aanchalagarwal6886
    @aanchalagarwal6886 1 year ago +2

    Thank you Andrej for creating this series. It has been very helpful. I just hope you get the time to continue with it.

  • @gurujicoder
    @gurujicoder 2 years ago

    My best way to learn is to learn from one of the most experienced people in the field. Thanks for everything Andrej

  • @sakthigeek2458
    @sakthigeek2458 9 months ago

    Learned a lot of practical tips and theoretical knowledge of why we do what we do and also the history of how Deep Learning evolved. Thanks a lot for this series. Requesting you to continue the series.

  • @ShinShanIV
    @ShinShanIV 2 years ago +1

    Thank you so much Andrej for the series, it helps me a lot. You are one of the reasons I was able to get into ML and build a career there. I admire your teaching skills!
    I didn't get why the sequence dim has to be part of the batch dimension, and I didn't hear Andrej talk about it explicitly, so here is my reasoning:
    The sequence dimension is an additional batch dimension because the output before batch norm is created by a linear layer with (32, 4, 20) @ (20, 68) + (68) which performs the matrix multiplication only with the last dimension (.., .., 20) and in parallel on the first two. So, the matrix multiplication is performed 32 * 4 times with (20) @ (20, 68). Thus, it's the same as having a (128, 20) @ (20, 68) calculation, where (32 * 4) = 128 is the batch dimension. So, the sequence dimension is effectively treated as if it were a "batch dimension" in the linear layer and must be treated that way in batch norm too.
    (would be great if someone could confirm)
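    (A quick numerical check of the reasoning above; the shapes follow the comment, the tensors are random stand-ins:)
      import torch

      B, T, C, H = 32, 4, 20, 68
      x = torch.randn(B, T, C)    # (batch, sequence, features)
      W = torch.randn(C, H)
      b = torch.randn(H)

      y1 = x @ W + b                                  # linear layer applied to the (32, 4, 20) input
      y2 = (x.view(B * T, C) @ W + b).view(B, T, H)   # same numbers via an explicit (128, 20) "batch"
      print(torch.allclose(y1, y2))                   # True: B and T act as one big batch dimension,
                                                      # so batch norm should average over dims (0, 1)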

  • @eustin
    @eustin 2 years ago +3

    Yes! I've been telling everyone about these videos. I've been checking every day whether you posted the next video. Thank you.

  • @Leon-yp9yw
    @Leon-yp9yw 2 years ago +3

    I was worried I was going to have to wait a couple of months for the next video as I finished part 4 just last week. Can't wait to get into this one, thanks a lot for this series Andrej

  • @timowidyanvolta
    @timowidyanvolta 1 year ago +3

    Please continue, I really like this series. You are an awesome teacher!

  • @kshitijbanerjee6927
    @kshitijbanerjee6927 1 year ago +37

    Hey Andrej! I hope you continue and give us the RNN, GRU & Transformer lectures as well! The ChatGPT one is great, but I feel like we missed the story in the middle and jumped ahead because of ChatGPT.

    • @SupeHero00
      @SupeHero00 1 year ago +1

      The ChatGPT lecture is the Transformer lecture. And regarding RNNs, I don't see why anyone would still use them...

    • @kshitijbanerjee6927
      @kshitijbanerjee6927 1 year ago +6

      Transformers, yes. But it's not like anyone will build bigrams either; it's about learning the concepts, like BPTT etc., from the roots.

    • @SupeHero00
      @SupeHero00 1 year ago +1

      @kshitijbanerjee6927 Bigrams and MLPs help you understand Transformers (which are the SOTA). Anyway, IMO it would be a waste of time creating a lecture on RNNs, but if the majority want it, then maybe he should do it. I don't care

    • @kshitijbanerjee6927
      @kshitijbanerjee6927 1 year ago +5

      Fully disagree that it’s not useful.
      I think the concepts of how they came up with unrolling and BPTT, and the gates used to solve long-term memory problems, are invaluable for appreciating and understanding why transformers are such a big deal.

    • @attilakun7850
      @attilakun7850 1 year ago +1

      @@SupeHero00 RNNs are coming back due to SSMs like Mamba.

  • @brianwhite9137
    @brianwhite9137 2 years ago

    Very grateful for these. An early endearing moment was in the Spelled-Out Intro when you took a moment to find the missing parentheses for 'print.'

  • @phrasedparasail9685
    @phrasedparasail9685 1 month ago

    These videos are amazing, please never stop making this type of content!

  • @cktse_jp
    @cktse_jp 10 months ago

    Just wanna say thank you for sharing your experience -- love this from-scratch series starting from first principles!

  • @AndrewOrtman
    @AndrewOrtman 1 year ago +3

    When I did the mean() trick at ~8:50 I let out an audible gasp! That was such a neat trick, going to use that one in the future
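    (For reference, a sketch of that trick; lossi here is a random stand-in for the list of per-step losses logged during training:)
      import torch
      import matplotlib.pyplot as plt

      lossi = torch.rand(200000).tolist()                    # stand-in for the logged losses
      plt.plot(torch.tensor(lossi).view(-1, 1000).mean(1))   # mean of every chunk of 1000 steps
      plt.show()                                             # much smoother than plt.plot(lossi)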

  • @thanikhurshid7403
    @thanikhurshid7403 2 years ago +2

    Andrej you are the absolute greatest. Keep making your videos. Anxiously waiting to implement Transformers with you

  • @bensphysique6633
    @bensphysique6633 4 months ago

    This guy is literally my definition of wholesomeness. Again, Thank you, Andrej!

  • @wolpumba4099
    @wolpumba4099 7 months ago

    *Abstract*
    This video continues the "makemore" series, focusing on improving the character-level language model by transitioning from a simple multi-layer perceptron (MLP) to a deeper, tree-like architecture inspired by WaveNet. The video delves into the implementation details, discussing PyTorch modules, containers, and debugging challenges encountered along the way. A key focus is understanding how to progressively fuse information from input characters to predict the next character in a sequence. While the video doesn't implement the exact WaveNet architecture with dilated causal convolutions, it lays the groundwork for future explorations in that direction. Additionally, the video provides insights into the typical development process of building deep neural networks, including reading documentation, managing tensor shapes, and using tools like Jupyter notebooks and VS Code.
    *Summary*
    *Starter Code Walkthrough (1:43)*
    - The starting point is similar to Part 3, with minor modifications.
    - Data generation code remains unchanged, providing examples of three characters to predict the fourth.
    - Layer modules like Linear, BatchNorm1D, and Tanh are reviewed.
    - The video emphasizes the importance of setting BatchNorm layers to training=False during evaluation.
    - Loss function visualization is improved by averaging values.
    *PyTorchifying Our Code: Layers, Containers, Torch.nn, Fun Bugs (9:19)*
    - Embedding table and view operations are encapsulated into custom Embedding and Flatten modules.
    - A Sequential container is created to organize layers, similar to torch.nn.Sequential.
    - The forward pass is simplified using these new modules and container.
    - A bug related to BatchNorm in training mode with single-example batches is identified and fixed.
    *Overview: WaveNet (17:12)*
    - The limitations of the current MLP architecture are discussed, particularly the issue of squashing information too quickly.
    - The video introduces the WaveNet architecture, which progressively fuses information in a tree-like structure.
    - The concept of dilated causal convolutions is briefly mentioned as an implementation detail for efficiency.
    *Implementing WaveNet (19:35)*
    - The dataset block size is increased to 8 to provide more context for predictions.
    - The limitations of directly scaling up the context length in the MLP are highlighted.
    - A hierarchical model is implemented using FlattenConsecutive layers to group and process characters in pairs.
    - The shapes of tensors at each layer are inspected to ensure the network functions as intended.
    - A bug in the BatchNorm1D implementation is identified and fixed to correctly handle multi-dimensional inputs.
    *Re-training the WaveNet with Bug Fix (45:25)*
    - The network is retrained with the BatchNorm1D bug fix, resulting in a slight performance improvement.
    - The video notes that PyTorch's BatchNorm1D has a different API and behavior compared to the custom implementation.
    *Scaling up Our WaveNet (46:07)*
    - The number of embedding and hidden units is increased, leading to a model with 76,000 parameters.
    - Despite longer training times, the validation performance improves to 1.993.
    - The need for an experimental harness to efficiently conduct hyperparameter searches is emphasized.
    *Experimental Harness (46:59)*
    - The lack of a proper experimental setup is acknowledged as a limitation of the current approach.
    - Potential future topics are discussed, including:
    - Implementing dilated causal convolutions
    - Exploring residual and skip connections
    - Setting up an evaluation harness
    - Covering recurrent neural networks and transformers
    *Improve on My Loss! How Far Can We Improve a WaveNet on This Data? (55:27)*
    - The video concludes with a challenge to the viewers to further improve the WaveNet model's performance.
    - Suggestions for exploration include:
    - Trying different channel allocations
    - Experimenting with embedding dimensions
    - Comparing the hierarchical network to a large MLP
    - Implementing layers from the WaveNet paper
    - Tuning initialization and optimization parameters
    I summarized the transcript with Gemini 1.5 Pro.
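    (To make the "Implementing WaveNet" part above concrete, here is a minimal sketch of the FlattenConsecutive idea, i.e. grouping consecutive characters so information is fused progressively; this is a paraphrase of the lecture's module, not a verbatim copy.)
      import torch

      class FlattenConsecutive:
          # Groups n consecutive time steps into the channel dim: (B, T, C) -> (B, T//n, C*n).
          def __init__(self, n):
              self.n = n
          def __call__(self, x):
              B, T, C = x.shape
              x = x.view(B, T // self.n, C * self.n)
              if x.shape[1] == 1:       # drop a spurious time dimension of size 1
                  x = x.squeeze(1)
              self.out = x
              return self.out
          def parameters(self):
              return []

      x = torch.randn(32, 8, 10)                # (batch, block_size=8, n_embd=10)
      print(FlattenConsecutive(2)(x).shape)     # torch.Size([32, 4, 20])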

  • @panagiotistseles1118
    @panagiotistseles1118 1 year ago

    Totally amazed by the amount of good work you put in. You've helped a lot of people Andrej. Keep up the good work

  • @WarrenLacefield
    @WarrenLacefield 2 years ago +1

    Enjoying these videos so much. They're refreshing most of what I've forgotten about Python and getting me started playing with PyTorch. The last time I did this stuff myself was with C# and CNTK. Now I'm going back to rebuild and rerun old models and data (much faster, and with even "better" results). Thank you.

  • @art4eigen93
    @art4eigen93 1 year ago

    Please continue this series, Sir Andrej. You are the savior!

  • @meisherenow
    @meisherenow 1 year ago

    How cool is it that anyone with an internet connection has access to such a great teacher? (answer: very)

  • @ephemer
    @ephemer 2 years ago

    Thanks so much for this series, I feel like this is the most important skill I might ever learn and it’s never been more accessible than in your lectures. Thank you!

  • @uncoded0
    @uncoded0 2 years ago +2

    Thanks again Andrej! Love these videos! Dream come true to watch and learn these!
    Thanks for all you do to help people! Your helpfulness ripples throughout the world!
    Thanks again! lol

  • @newbie6036
    @newbie6036 10 days ago +1

    I just completed parts 1 to 5 on neural networks, and I can’t express how much I’ve learned. Huge thanks! Would you consider creating a playlist on Reinforcement Learning as well?

  • @milankordic
    @milankordic 2 years ago +3

    Was looking forward to this one. Thanks, Andrej!

  • @kimiochang
    @kimiochang 2 years ago

    Finally completed this one. As always, thank you Andrej for your generosity! Next I will practice through all five parts again and learn how to accelerate the training process by using GPUs.

  • @ERRORfred2458
    @ERRORfred2458 1 year ago

    Andrej, thanks for all you do for us. You're the best.

  • @nikitaandriievskyi3448
    @nikitaandriievskyi3448 2 years ago

    I just found your youtube channel, and this is just amazing, please do not stop doing these videos, they are incredible

  • @timandersen8030
    @timandersen8030 1 year ago +1

    Thank you, Andrej! Looking forward to the rest of the series!

  • @AlienLogic775
    @AlienLogic775 1 year ago +2

    Thanks so much Andrej! Hope to see a Part 6

  • @ThemeParkTeslaCamping360
    @ThemeParkTeslaCamping360 2 years ago

    Incredible video, this helps a lot. Thank you for the videos. I especially loved your Stanford videos on doing machine learning from scratch, without any libraries like TensorFlow and PyTorch. Keep going and thank you for helping hungry learners like me!!! Cheers 🥂

  • @enchanted_swiftie
    @enchanted_swiftie 1 year ago

    The sentence that Andrej said at 49:26 made me realize something, something very deep. 🔥

  • @yanazarov
    @yanazarov 2 years ago +1

    Absolutely awesome stuff Andrej. Thank you for doing this.

  • @sunderrajan6172
    @sunderrajan6172 2 years ago

    Beautifully explained as always - thanks. It shows how much passion you have to come up with these awesome videos. We are all blessed!

  • @flwi
    @flwi 2 years ago +1

    Great series! I really enjoy the progress and good explanations.

  • @thehazarika
    @thehazarika 1 year ago +1

    This is philanthropy! I love you man!

  • @ayogheswaran9270
    @ayogheswaran9270 1 year ago

    @Andrej thank you for making this. Please continue making such videos; it really helps beginners like me. If possible, could you please make a series on how actual development and production is done?

  • @kindoblue
    @kindoblue 2 years ago

    Every video another solid pure gold bar

  • @SK-ke8nu
    @SK-ke8nu 1 month ago

    Great video again Andrej, keep up the good work and thank you as always!

  • @kemalware4912
    @kemalware4912 1 year ago

    Deliberate errors in just the right spots. Your lectures are great.

  • @pablofernandez2671
    @pablofernandez2671 1 year ago

    Andrej, we all love you. You're amazing!

  • @kaushik333ify
    @kaushik333ify 1 year ago +4

    Thank you so much for these lectures ! Can you please make a video on the “experimental harness” you mention at the end of the video? It would be super helpful and informative.

  • @SupratimSamanta
    @SupratimSamanta 4 months ago +1

    Andrej is literally the bridge between worried senior engineers and the world of gen AI.

  • @veeramahendranathreddygang1086
    @veeramahendranathreddygang1086 2 years ago +2

    Thank you Sir. Have been waiting for this.

  • @davidespinosa1910
    @davidespinosa1910 3 months ago

    At 38:00, it sounds like we compared two architectures, both with 22k parameters and an 8 character window:
    * 1 layer, full connectivity
    * 3 layers, tree-like connectivity
    In a single layer, full connectivity outperforms partial connectivity.
    But partial connectivity uses fewer parameters, so we can afford to build more layers.

  • @Leo-sy4vu
    @Leo-sy4vu 2 years ago

    Thank you so much for the series. I recently started it and it's the best thing on all of YouTube. Keep it up.

  • @creatureOfnature1
    @creatureOfnature1 2 years ago

    Much appreciated, Andrej. Your tutorials are a gem!

  • @chineduezeofor2481
    @chineduezeofor2481 6 months ago +1

    Thank you for this beautiful tutorial 🔥

  • @EsdrasSoutoCosta
    @EsdrasSoutoCosta 2 years ago

    Awesome! Well explained, and it's clear what's being done. Please keep making these fantastic videos!!!

  • @kaenovama
    @kaenovama 1 year ago

    Thank you! Love the series! Helped me a lot with my learning experience with PyTorch

  • @4mb127
    @4mb127 2 years ago

    Thanks for continuing this fantastic series.

  • @michaelmuller136
    @michaelmuller136 11 months ago

    That was a great playlist, easy to understand and very helpful, thank you very much!!

  • @Abhishekkumar-qj6hb
    @Abhishekkumar-qj6hb 1 year ago

    So I finished this lecture series. I was expecting RNN/LSTM/GRU, but those weren't there; still, I learned a lot throughout and can definitely continue on my own. Thanks Andrej

  • @vivekpandit7417
    @vivekpandit7417 2 years ago +1

    Been waiting for a while. Thank youuu!!

  • @Anfera236
    @Anfera236 2 years ago +1

    Great content, Andrej! Keep them coming!

  • @utkarshsingh1663
    @utkarshsingh1663 2 years ago

    Thanks Andrej, this course is awesome for base building.

  • @adsuabeakufea
    @adsuabeakufea 11 months ago

    Great video, been learning a ton from you recently. Thank you Andrej!

  • @mellyb.1347
    @mellyb.1347 9 months ago +1

    Loved this series. Would you please be willing to continue it so we get to work through the rest of CNN, RNN, and LSTM? Thanks!

  • @wholenutsanddonuts5741
    @wholenutsanddonuts5741 2 years ago

    Can't wait for this next step in the process!

  • @Erosis
    @Erosis 2 years ago +2

    NumPy / torch / tf tensor reshaping always feels like handwavy magic.

  • @Joker1531993
    @Joker1531993 2 years ago +1

    I am subscribing, Andrej, just to support someone from our country, Slovakia. Even though I don't understand anything from the video >D

  • @fajarsuharyanto8871
    @fajarsuharyanto8871 2 years ago

    I rarely finish an entire episode. Hey Andrej 👌

  • @8eck
    @8eck 1 year ago

    Finally finished all the lectures, and I realized that I have a poor understanding of the math, and of dimensionality and operations over it. Anyway, thank you for helping out with the rest of the concepts and practices; I now understand better how backprop works, what it is doing, and what for.

    • @Ali-lm7uw
      @Ali-lm7uw 1 year ago

      Jon Krohn has a full playlist on algebra and calculus to go through before starting machine learning.

  • @mobkiller111
    @mobkiller111 2 years ago

    Thanks for the content & explanations Andrej and have a great time in Kyoto :)

  • @BlockDesignz
    @BlockDesignz 2 years ago +1

    Please keep these coming!

  • @zz79ya
    @zz79ya 11 months ago +2

    Um, can I find Part 6 somewhere (RNN, LSTM, GRU...)? I was under the impression that the next video in the playlist is about building GPT from scratch.

  • @shouryamann7830
    @shouryamann7830 1 year ago +3

    I've been using this stepped learning rate schedule and I've been consistently getting slightly better training and validation losses. For this I got a 1.98 val loss.
    lr = 0.1 if i < 100000 else (0.01 if i < 150000 else 0.001)

  •  5 months ago

    Thanks. Very helpful and intuitive.

  • @nickgannon7466
    @nickgannon7466 2 years ago

    You're crushing it, thanks a bunch.

  • @venkateshmunagala205
    @venkateshmunagala205 2 years ago

    AI Devil is back. Thanks for the video @Andrej Karpathy.

  • @DanteNoguez
    @DanteNoguez 2 years ago

    Thanks, Andrej, you're awesome!

  • @aga1nstall0dds
    @aga1nstall0dds 4 months ago

    Thanks for the masterclass!!!!!! ... BTW I found you through an interview of geohotz with Lex... I heard you like to teach, and they are right about that statement :)

  • @philipwoods6720
    @philipwoods6720 2 years ago

    SO EXCITED TO SEE THIS POSTED LEEEEETS GOOOOOOOO

  • @arielfayol7198
    @arielfayol7198 1 year ago

    Please don't stop the series😢

  • @sam.rodriguez
    @sam.rodriguez 1 year ago +1

    How can we help you keep putting these treasures out, Andrej? I think the expected value of helping hundreds of thousands of ML practitioners improve their understanding of the building blocks might be proportional to (or even outweigh) the value of your individual contributions at OpenAI. That's not to say that your technical contributions are not valuable; on the contrary, I'm using their value as a point of comparison because I want to emphasise how amazingly valuable I think your work on education is. A useful analogy would be to ask which ended up having more impact on our progress in the field of physics: Richard Feynman's lectures, which motivated many to pursue science and improved everyone's intuitions, or his individual contributions to the field? At the end of the day it's not about one or the other but about finding the right balance given the effective impact of each and, of course, your personal enjoyment.

  • @lotfullahandishmand4973
    @lotfullahandishmand4973 2 years ago

    Dear Andrej, your work is amazing. We are here to share and have a beautiful world all together, and you are doing that.
    If you could make a video about convolutional NNs, top ImageNet architectures, or anything deep-learning related to vision, that would be great.
    Thank you!

  • @colehoward5144
    @colehoward5144 1 year ago +1

    Great video! In your next video, would you be able to add a section where you show how to matrix multiply n-dimensional tensors? I am a little confused by what the output/shape should be for something like (6, 3, 9, 9) @ (3, 9, 3)

    • @milosz7
      @milosz7 1 year ago

      Multiplying matrices with these shapes is not possible.

    • @colehoward5144
      @colehoward5144 1 year ago

      @milosz7 Yeah, it doesn't look like it at first, but they are compatible. Results in output shape (6, 3, 9, 3).

    • @ChrisNienart
      @ChrisNienart 6 months ago

      This is batched matrix multiplication. For M @ N to be valid, dimension -1 of M needs to match dimension -2 of N. The remaining leading (batch) dimensions need to match (or be broadcastable, see below); the extra leading dimensions of M and the last dimension of N can be anything.
      In your example, M.shape = (6, 3, 9, 9) and N.shape = (3, 9, 3). The leading dimension of M is 6, M.shape[-3] = N.shape[-3] = 3, M.shape[-1] = N.shape[-2] = 9, and the last dimension of N is 3. The output shape is the first three dimensions of the first tensor followed by the last dimension of the second tensor: (6, 3, 9, 3).
      There are also broadcasting rules. You can check that if M.shape = (6, 4, 8, 9) and N.shape = (2, 1, 4, 9, 3), then (M @ N).shape = (2, 6, 4, 8, 3).
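      (These shapes can be checked directly with random tensors:)
        import torch

        M = torch.randn(6, 3, 9, 9)
        N = torch.randn(3, 9, 3)
        print((M @ N).shape)    # torch.Size([6, 3, 9, 3])

        M2 = torch.randn(6, 4, 8, 9)
        N2 = torch.randn(2, 1, 4, 9, 3)
        print((M2 @ N2).shape)  # torch.Size([2, 6, 4, 8, 3])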

  • @mynameisZhenyaArt_
    @mynameisZhenyaArt_ 1 year ago +1

    Hi Andrej. Is there going to be an RNN, LSTM, GRU video? Or maybe even a part 2 on the topic of WaveNet, with the residual connections?

  • @daniellu8104
    @daniellu8104 9 months ago +1

    Haven't watched this video (yet) but I'm wondering if Andrej discussed WaveNet vs. transformers. I know that the WaveNet paper came out around the same time as Attention Is All You Need. It seems like both WaveNet and transformers can do sequence prediction/generation, but transformers have taken off. Is that because of transformers' better performance in most problem domains? Does WaveNet still outperform transformers in certain situations?