Implement Llama 3 From Scratch - PyTorch

  • Published Dec 2, 2024

Comments • 30

  • @nbvcxz098
    @nbvcxz098 2 months ago +5

    WOW! You are something else dude! No one provides content like you! Exceptional!

    • @uygarkurtai
      @uygarkurtai  2 months ago

      Thank you!

  • @aykutcayir64
    @aykutcayir64 2 months ago

    As always, great job!👏🏻

    • @uygarkurtai
      @uygarkurtai  2 months ago

      @@aykutcayir64 thank you!

  • @කැලණිකුප්පි
    @කැලණිකුප්පි 2 months ago +2

    Woowwww awesome thanks for this ❤❤

    • @uygarkurtai
      @uygarkurtai  2 months ago

      Thank you!

  • @binyiu5353
    @binyiu5353 2 months ago

    Many thanks for this! It gives a much better understanding before reading the paper.

    • @uygarkurtai
      @uygarkurtai  2 months ago

      I'm glad to hear that!

  • @Ece-kx6qk
    @Ece-kx6qk 2 months ago

    The video I have been waiting for!!! Thank you 🙏🏻

    • @uygarkurtai
      @uygarkurtai  2 months ago

      Thank you!

  • @learntestenglish
    @learntestenglish 2 months ago

    I was waiting for a new video. Thanks for the awesome work ❤😊

    • @uygarkurtai
      @uygarkurtai  2 months ago +1

      Thank you!

  • @gustavojuantorena
    @gustavojuantorena 2 months ago

    Awesome!

    • @uygarkurtai
      @uygarkurtai  2 months ago

      Thank you!

  • @flashlin1
    @flashlin1 2 months ago +1

    Data, algorithms, and computational power are the three key elements. Why hasn't anyone added more complex connection models to Transformers? We should consider increasing the algorithmic complexity of large language models (LLMs), likening it to the complexity of connections in the human brain. That way we wouldn't need to endlessly increase the number of parameters, especially since the number of artificial neurons already exceeds the number of human neurons. Moreover, we haven't seen designs resembling short-term-memory neuron models that operate at inference time.
    We should aim to design a model that can, like humans, quickly read relevant articles when faced with a problem. During the reading process, it could summarize related content into short-term memory and continuously update it. Then, based on this short-term memory, the model could verify the correctness of answers, for instance, by writing code to check the answers. Wouldn't this approach allow us to make the model smaller?

    • @uygarkurtai
      @uygarkurtai  2 months ago

      It's a very good research question. The attention mechanism can be viewed as the "short-term" memory you mention, too. I remember some articles on making NNs work like human brain synapses. However, the problem is that they didn't perform that well.

    • @flashlin1
      @flashlin1 2 months ago

      @@uygarkurtai The variety of neurons in the human brain far exceeds the range of functions used in artificial neural networks. How can we expect a single model, like the transformer, to handle everything? Shouldn't we focus on designing more diverse neural functions to better reflect the complexity of the brain?

    • @uygarkurtai
      @uygarkurtai  2 months ago

      @@flashlin1 In that case we again end up with a computationally expensive model; there's a trade-off there that's difficult to overcome. You may want to check out multi-model setups, which are closest to what you mention: a combination of several models. If you're curious about mimicking the brain, also check out spiking neural networks.

    • @flashlin1
      @flashlin1 2 months ago

      @@uygarkurtai Why haven't we seen much progress with Spiking Neural Networks? My ideal concept of short-term memory should function during the inference phase, not be fixed during the training phase. Specifically, as the model processes an input question or reads through a large volume of articles, it should be able to summarize and store useful and relevant information in short-term memory, and only then generate an answer based on that.
      Moreover, during the process of generating an answer, the model should be able to dynamically update the short-term memory. For example, if later predictions impact the earlier generated content, the model should revise the previous answers based on the new information before producing the final result.
      Is there any model that works like this?

    • @uygarkurtai
      @uygarkurtai  2 months ago

      @@flashlin1 We haven't seen them because there are usually points where they fall short compared to regular MLPs. To me, what you mentioned seems a bit like RAG applications.

  • @en-iyi-benim
    @en-iyi-benim 1 month ago

    Hi! I really enjoy your videos and the way you explain concepts. I recently implemented the Qwen-2 Vision model using pure PyTorch. There’s a small error I’m working through at the moment, but I’d love to know if you’d be open to making a video using my code to explain the process. I think it could be really helpful for others who are interested in vision language models. Let me know what you think

    • @uygarkurtai
      @uygarkurtai  1 month ago

      @@en-iyi-benim Hey, thank you! I may look at the Qwen-2 model in the future. You can share your repository here too when it's done.

  • @sarangzambare4646
    @sarangzambare4646 11 days ago

    It's surprising that load_state_dict worked perfectly without needing to rename the layer names. Did you reverse-engineer the exact classes from the weight names?

    • @uygarkurtai
      @uygarkurtai  9 days ago

      Nope. Since the architecture is the same, I would expect it to load just fine.
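
      A minimal sketch of why this works (not the video's code; ReferenceBlock and MyBlock are made-up names): state_dict keys are derived from module attribute names, so a from-scratch re-implementation that mirrors the reference module structure produces identical keys, and load_state_dict matches them directly.

        import torch.nn as nn

        class ReferenceBlock(nn.Module):   # stands in for the original checkpoint's module
            def __init__(self, dim=8):
                super().__init__()
                self.attention = nn.Linear(dim, dim, bias=False)
                self.feed_forward = nn.Linear(dim, dim, bias=False)

        class MyBlock(nn.Module):          # from-scratch re-implementation
            def __init__(self, dim=8):
                super().__init__()
                # same attribute names -> same state_dict keys, so no renaming is needed
                self.attention = nn.Linear(dim, dim, bias=False)
                self.feed_forward = nn.Linear(dim, dim, bias=False)

        ref_state = ReferenceBlock().state_dict()
        print(list(ref_state.keys()))          # ['attention.weight', 'feed_forward.weight']

        MyBlock().load_state_dict(ref_state)   # loads cleanly, no key remapping needed

      If the attribute names differed, the checkpoint keys would have to be remapped before loading.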

    • @sarangzambare4646
      @sarangzambare4646 8 days ago

      @@uygarkurtai Makes sense! Nice work.
      Would be great if you could cover the Llama 3.2 architecture. It seems they have inserted a cross-attention layer in the 1st transformer block so that it can cross-attend to text embeddings and image embeddings (from ViT-H/14). If you could make a video that builds that architecture and then also loads the 3.2 vision weights, that would spread like fire!
      Since Llama 3.2 Vision is an 11B model, for demonstration purposes you could shave the transformer blocks down to just 2 or 3 and load the weights with strict=False. I believe it could then run on your setup (unless you have enough VRAM for the full 11B).
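
      A hedged sketch of the strict=False idea above (Block and TinyStack are illustrative names, not the actual Llama 3.2 code): build a model with fewer blocks than the checkpoint and let load_state_dict(strict=False) skip the checkpoint keys that have no counterpart.

        import torch.nn as nn

        class Block(nn.Module):
            def __init__(self, dim=16):
                super().__init__()
                self.proj = nn.Linear(dim, dim, bias=False)

            def forward(self, x):
                return x + self.proj(x)

        class TinyStack(nn.Module):
            def __init__(self, num_layers, dim=16):
                super().__init__()
                self.layers = nn.ModuleList(Block(dim) for _ in range(num_layers))

            def forward(self, x):
                for layer in self.layers:
                    x = layer(x)
                return x

        full = TinyStack(num_layers=8)     # stands in for the full checkpoint
        shaved = TinyStack(num_layers=2)   # demonstration-sized model

        # strict=False skips checkpoint keys for blocks the shaved model doesn't have
        result = shaved.load_state_dict(full.state_dict(), strict=False)
        print(result.unexpected_keys)      # keys for layers 2..7, silently ignored

      The blocks that remain still receive real weights, which is enough to step through the architecture without the VRAM footprint of the full model.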

  • @abhijoy.sarkar
    @abhijoy.sarkar 2 months ago

    Let’s make llama4 before llama4 🤝

    • @uygarkurtai
      @uygarkurtai  2 months ago

      With enough GPUs 🤝