The ML Tech Lead!
Understanding How LoRA Adapters Work!
LoRA adapters are, to me, one of the smartest strategies used in Machine Learning in recent years! LoRA emerged as a very natural way to fine-tune models. In my opinion, if you want to work with large language models, knowing how to fine-tune them is one of the most important skills to have these days as a machine learning engineer.
So, let me show you the mathematical foundation of those LoRA adapters, why they are useful, and how they are used. Let's get into it!
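To make the idea concrete, here is a minimal sketch of what a LoRA adapter does (my own illustration, not the code from the video): the pretrained weight matrix is frozen and a low-rank update B·A is learned on top of it.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA adapter around a frozen linear layer (illustrative sketch)."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # freeze the pretrained weights
        self.base.bias.requires_grad_(False)
        # Low-rank factors: only these are trained during fine-tuning
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # W x + (alpha/r) * B A x : the adapter adds a low-rank correction
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
print(layer(torch.randn(4, 768)).shape)  # torch.Size([4, 768])
```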
Views: 584

Videos

The Backpropagation Algorithm Explained!
Views 525 · 12 hours ago
The backpropagation algorithm is the heart of deep learning! It is the core reason why we can have advanced models like LLMs. In a previous video, we saw that we can use the computational graph built as part of deep learning models to compute any derivative of the network outputs with respect to the network inputs. I'll put the link in the description. Now we are going to see how we...
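As a tiny illustration of what the backward pass computes (my own example, not the video's code), here is the chain rule applied by hand to y = (w*x + b)^2, checked against PyTorch's autograd:

```python
import torch

# Forward pass of a tiny "network": z = w*x + b, y = z**2
x, w, b = 3.0, 2.0, 1.0
z = w * x + b          # 7.0
y = z ** 2             # 49.0

# Backward pass by hand (chain rule), from the output back to the parameters
dy_dz = 2 * z          # 14.0
dy_dw = dy_dz * x      # dz/dw = x  -> 42.0
dy_db = dy_dz * 1.0    # dz/db = 1  -> 14.0

# Same computation with autograd, as a sanity check
w_t = torch.tensor(2.0, requires_grad=True)
b_t = torch.tensor(1.0, requires_grad=True)
y_t = (w_t * 3.0 + b_t) ** 2
y_t.backward()
print(dy_dw, dy_db, w_t.grad.item(), b_t.grad.item())  # 42.0 14.0 42.0 14.0
```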
Understanding The Computational Graph in Neural Networks
Views 870 · 21 hours ago
Do you know what this computational graph used by deep learning frameworks like TensorFlow or PyTorch actually is? No? Let me tell you then! The whole logic behind how neural networks function is the backpropagation algorithm. This algorithm allows us to update the weights of the network so that it can learn. The key aspect of this algorithm is to make sure we can compute the derivatives or the gradients ...
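For reference, here is a small sketch (my own example, not the video's code) showing how PyTorch records operations into that graph and then walks it backwards:

```python
import torch

# PyTorch records every operation on tensors that require gradients
# into a computational graph, then traverses that graph backwards.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
a = x * 2            # graph node: multiplication
b = a.sin()          # graph node: sine
loss = b.sum()       # graph node: sum (scalar output)

print(loss.grad_fn)                 # the last node of the recorded graph
print(loss.grad_fn.next_functions)  # its parent nodes in the graph

loss.backward()      # walk the graph from the output back to x
print(x.grad)        # d(sum(sin(2x)))/dx = 2*cos(2x)
```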
How to Approach Model Optimization for AutoML
Views 551 · 14 days ago
Since I started my career in Machine Learning, I have worked hard to automate every aspect of my work. If I couldn't produce a fully production-ready machine learning model at the click of a button, I was doing something wrong! I find it funny how you can recognize a senior machine learning engineer by how little they work to achieve the same results as a junior one working 10 times as hard! AutoML ha...
Understanding CatBoost!
Views 558 · 14 days ago
CatBoost was developed by Yandex in 2017 and introduced in the paper "CatBoost: unbiased boosting with categorical features." They realized that the boosting process induces a special case of data leakage. To prevent it, they developed two new techniques: the expanding mean target encoding and the ordered boosting. - The Gradient Boosted Algorithm Explained: th-cam.com/video/XWQ0Fd_xiBE/w-d-xo.html - Understanding XGBoos...
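As a rough illustration of the expanding mean target encoding idea (my own sketch in pandas, not Yandex's implementation): each row is encoded using only the target values of earlier rows in the same category, which is what avoids the leakage.

```python
import pandas as pd

df = pd.DataFrame({
    "color":  ["red", "red", "blue", "red", "blue"],
    "target": [1,     0,     1,      1,     0],
})

prior = df["target"].mean()
grp = df.groupby("color")["target"]
# For each row: mean of the targets seen so far in the same category,
# excluding the current row (hence the subtraction), smoothed with the global prior.
cumsum = grp.cumsum() - df["target"]
cumcnt = grp.cumcount()
df["color_te"] = (cumsum + prior) / (cumcnt + 1)
print(df)
```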
Implementing the Self-Attention Mechanism from Scratch in PyTorch!
Views 616 · 21 days ago
Let’s implement the self-attention layer! Here is the video where you can find the logic behind it: th-cam.com/video/W28LfOld44Y/w-d-xo.html
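For reference, a compact single-head version of such a layer could look like this (an illustrative sketch with my own naming, not necessarily the exact code from the video):

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention (illustrative sketch)."""
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x):                            # x: [batch, seq, d_model]
        Q, K, V = self.q(x), self.k(x), self.v(x)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(x.size(-1))
        attn = scores.softmax(dim=-1)                 # attention weights per token
        return attn @ V                               # weighted average of the values

layer = SelfAttention(64)
print(layer(torch.randn(2, 10, 64)).shape)           # torch.Size([2, 10, 64])
```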
What is the Vision Transformer?
Views 555 · 21 days ago
I find the Vision Transformer to be quite an interesting model! The self-attention mechanism and the transformer architecture were designed to help fix some of the flaws we saw in previous models that had applications in natural language processing. With the Vision Transformer, a few scientists at Google realized they could take images instead of text as input data and use that architecture as ...
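The key step is cutting the image into fixed-size patches and projecting each patch into a token embedding that a standard transformer encoder can consume; here is a rough sketch of that patching step (my own illustration, not the paper's code):

```python
import torch
import torch.nn as nn

# Turn a 224x224 RGB image into a sequence of 16x16 patch embeddings
patch, d_model = 16, 768
proj = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)  # patchify + project in one step

img = torch.randn(1, 3, 224, 224)
tokens = proj(img)                          # [1, 768, 14, 14]
tokens = tokens.flatten(2).transpose(1, 2)  # [1, 196, 768] : 196 patch tokens
print(tokens.shape)
```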
Understanding XGBoost From A to Z!
Views 928 · 28 days ago
I often say that at some point in my career, I became more of an XGBoost modeler than a Machine Learning modeler. That's because if you were working on large tabular datasets, there was no point in trying another algorithm: it would provide close to optimal results without much effort. Yeah, ok, LightGBM and CatBoost are obviously as good and sometimes better, but I will always keep a special place i...
The Gradient Boosted Algorithm Explained!
Views 1.1K · 1 month ago
In the gradient-boosted trees algorithm, we iterate the following:
- We train a tree on the errors made at the previous iteration.
- We add the tree to the ensemble, and we predict with the new model.
- We compute the errors made for this iteration.
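A bare-bones version of that loop, for squared error where the "errors" are simply the residuals (an illustrative sketch, not a production implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

lr, trees = 0.1, []
pred = np.full_like(y, y.mean())          # start from a constant prediction
for _ in range(100):
    residual = y - pred                   # errors made at the previous iteration
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)                    # add the tree to the ensemble
    pred += lr * tree.predict(X)          # predict with the new model

print("MSE:", np.mean((y - pred) ** 2))
```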
How Can We Generate BETTER Sequences with LLMs?
Views 383 · 1 month ago
We know that LLMs are trained to predict the next word. When we decode the output sequence, we use the tokens of the prompt and the previously predicted tokens to predict the next word. With greedy decoding or multinomial sampling decoding, we use those predictions to output the next token in an autoregressive manner. But is this the sequence we are looking for, considering the prompt? Do we ac...
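To make the difference concrete, here is a minimal sketch (my own example, not the video's code) of greedy decoding versus multinomial sampling over the same next-token distribution:

```python
import torch

# Pretend these are the model's logits over a tiny vocabulary for the next token
logits = torch.tensor([2.0, 1.5, 0.2, -1.0])
probs = logits.softmax(dim=-1)

greedy_token = probs.argmax()                             # always the single most likely token
sampled_token = torch.multinomial(probs, num_samples=1)   # stochastic: any token can be picked

print(probs)
print("greedy:", greedy_token.item(), "sampled:", sampled_token.item())
```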
What is this Temperature for a Large Language Model?
Views 634 · 1 month ago
How do LLMs generate text in a stochastic manner?
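A quick sketch of the mechanism (my own example): the temperature divides the logits before the softmax, so a low temperature sharpens the next-token distribution and a high temperature flattens it.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])

for T in (0.2, 1.0, 2.0):
    probs = (logits / T).softmax(dim=-1)   # temperature rescales the logits
    print(f"T={T}:", [round(p, 3) for p in probs.tolist()])
```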
From Words to Tokens: The Byte-Pair Encoding Algorithm
Views 495 · 1 month ago
Why do we keep talking about "tokens" in LLMs instead of words? It turns out to be much more efficient for model performance to break words into sub-words (tokens)!
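As a toy illustration of one BPE training step (my own sketch, not a real tokenizer): count adjacent symbol pairs across the corpus and merge the most frequent pair into a new token, then repeat.

```python
from collections import Counter

# Toy corpus: each word is a tuple of symbols (characters to start with), with its frequency
corpus = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "e", "w", "e", "s", "t"): 6}

def most_frequent_pair(corpus):
    pairs = Counter()
    for word, freq in corpus.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge(corpus, pair):
    merged = {}
    for word, freq in corpus.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1]); i += 2   # merge the pair into one token
            else:
                out.append(word[i]); i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(3):                       # three merge steps
    pair = most_frequent_pair(corpus)
    corpus = merge(corpus, pair)
    print("merged", pair, "->", corpus)
```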
The Multi-head Attention Mechanism Explained!
Views 917 · 1 month ago
The Multi-head Attention Mechanism Explained!
What ML Engineer Are You? How To Present Yourself On Your Resume
Views 295 · 1 month ago
For any engineering domain, hiring managers will typically look at two sets of skills: technical skills and leadership skills.
Understanding How Vector Databases Work!
Views 15K · 1 month ago
Today, we dive into the subject of vector databases. These databases are often used in search engines, where we search over the vector representations of the items we care about. We dig into the different algorithms that allow us to search for vectors among billions or trillions of documents.
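At its simplest, searching a vector database means finding the stored vectors closest to the query vector; here is a brute-force cosine-similarity sketch (my own illustration — real systems use approximate indexes such as HNSW or IVF to scale to billions of vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 128))        # 10k stored document embeddings
query = rng.normal(size=128)

# Cosine similarity = dot product of L2-normalized vectors
db_n = db / np.linalg.norm(db, axis=1, keepdims=True)
q_n = query / np.linalg.norm(query)
scores = db_n @ q_n

top5 = np.argsort(-scores)[:5]             # indices of the 5 closest documents
print(top5, scores[top5])
```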
Understanding the Self-Attention Mechanism in 8 min
Views 1.2K · 2 months ago
Understanding the Self-Attention Mechanism in 8 min
What is Perplexity for LLMs?
Views 414 · 2 months ago
What is Perplexity for LLMs?
Getting a Job in AI: The Different ML Jobs
Views 252 · 2 months ago
Getting a Job in AI: The Different ML Jobs
Revolutionizing Education with AI: Personalized Learning, Model Challenges, and Finance Insights
Views 267 · 7 months ago
Revolutionizing Education with AI: Personalized Learning, Model Challenges, and Finance Insights
Exploring Data Science Careers and Potential of Large Language Models
Views 170 · 7 months ago
Exploring Data Science Careers and Potential of Large Language Models
Unlocking AI's Secrets: Career Journeys, Challenges, and the Future
Views 212 · 8 months ago
Unlocking AI's Secrets: Career Journeys, Challenges, and the Future
Working in AI as a Software Engineer!
Views 267 · 8 months ago
Working in AI as a Software Engineer!
Let's Talk about AI with Etienne Bernard!
Views 230 · 9 months ago
Let's Talk about AI with Etienne Bernard!

Comments

  • @passportkaya
    @passportkaya 1 day ago

    Not really. I'm a US citizen and have been all over Europe. I'd say it's the same.

    • @TheMLTechLead
      @TheMLTechLead 1 day ago

      How long have you lived in Europe and what countries exactly?

  • @sebastianguerrero5626
    @sebastianguerrero5626 1 day ago

    nice content, keep it up!

  • @EmpreendedoresdoBEM
    @EmpreendedoresdoBEM 2 days ago

    very clear explanation. thanks

  • @naatcollections7976
    @naatcollections7976 2 days ago

    I like your channel

  • @godzilllla2452
    @godzilllla2452 3 days ago

    I've got it now. I wonder why we can't calculate the x gradient by starting the backward pass closer to x instead of going through all the activations.

    • @TheMLTechLead
      @TheMLTechLead 3 days ago

      I am not sure I understand the question.

  • @mateuszsmendowski2677
    @mateuszsmendowski2677 4 days ago

    One of the best explanations on TH-cam. Substantively and visually at the highest level :) Are you able to share those slides e.g. via Git?

    • @TheMLTechLead
      @TheMLTechLead 4 days ago

      I cannot share the slide but you can see the diagrams in my newsletter: newsletter.theaiedge.io/p/understanding-the-self-attention

  • @zeeshankhanyousafzai5229
    @zeeshankhanyousafzai5229 5 days ago

  • @milleniumsalman1984
    @milleniumsalman1984 5 days ago

    too good

  • @milleniumsalman1984
    @milleniumsalman1984 5 days ago

    great video

  • @milleniumsalman1984
    @milleniumsalman1984 5 days ago

    good video

  • @Snerdy0867
    @Snerdy0867 6 days ago

    Phenomenal visuals and explanations. Best video on this concept I've ever seen.

    • @TheMLTechLead
      @TheMLTechLead 5 days ago

      I like reading that!

  • @IkhukumarHazarika
    @IkhukumarHazarika 8 days ago

    Is it rnn 😅

  • @IkhukumarHazarika
    @IkhukumarHazarika 8 days ago

    Love the way you teach every point please start teaching this way

  • @IkhukumarHazarika
    @IkhukumarHazarika 8 days ago

    More good content indeed good one❤

  • @randomaccessofshortvideos6214
    @randomaccessofshortvideos6214 9 days ago

    💯💯💯

  • @faysoufox
    @faysoufox 10 days ago

    Thank you for your videos

  • @math_in_cantonese
    @math_in_cantonese 10 days ago

    I will use your videos as an interview refresher... It is so easy to forget the details when everyday work floods in for years.

    • @TheMLTechLead
      @TheMLTechLead 9 days ago

      I am glad to read that!

  • @math_in_cantonese
    @math_in_cantonese 10 days ago

    Thanks, I forgot some details about Gradient Boosted Algorithm and I was too lazy to look it up.

  • @vivek2319
    @vivek2319 11 days ago

    Please make more videos

  • @jairjuliocc
    @jairjuliocc 14 days ago

    Thank you. Can you explain the entire self-attention flow (from positional encoding to final next-word prediction)? I think it will be an entire series 😅

    • @TheMLTechLead
      @TheMLTechLead 14 days ago

      It is coming! It will take time

  • @CrypticPulsar
    @CrypticPulsar 14 days ago

    Thank you, Damien!!

  • @va940
    @va940 16 days ago

    Very good advice ❤

  • @va940
    @va940 16 days ago

    Awesome

  • @elmoreglidingclub3030
    @elmoreglidingclub3030 16 days ago

    Excellent!! Very good explanation. I need to work on my ear for French. But pausing and backing up the video helped. Great stuff!!

    • @TheMLTechLead
      @TheMLTechLead 16 days ago

      My accent + my speaking skills are my weaknesses. Working on it and I think I am improving!

    • @elmoreglidingclub3030
      @elmoreglidingclub3030 15 days ago

      @@TheMLTechLead Thanks for your reply but absolutely no apology necessary!! I think it is an excellent video and helpful information. Much appreciation for posting. I am a professor in a business school and always looking for insights into how to teach the technical side of technology in the context of business. Your explanation has been very helpful.

  • @Gowtham25
    @Gowtham25 18 days ago

    It's really good and useful... Expecting a video on training an LLM from scratch next, and interested in KAN-FORMER...

  • @astudent8885
    @astudent8885 19 days ago

    ML is a black box but boosting seems to be more interpretable (potentially) if we can make the trees more sparse and orthogonal

    • @TheMLTechLead
      @TheMLTechLead 18 days ago

      Tree-based methods can naturally be used to measure Shapley values without approximation: shap.readthedocs.io/en/latest/tabular_examples.html

  • @astudent8885
    @astudent8885 19 days ago

    Do you mean that the new tree is predicting the error? In that case, wouldn't you subtract the new prediction from the previous predictions?

    • @TheMLTechLead
      @TheMLTechLead 18 days ago

      So we have an ensemble of trees F that predicts y, such that F(x) = \hat{y}. The residual is e = y - F(x). We want to add a tree that predicts that residual: T(x) = \hat{e} = e + err = y - F(x) + err, where err is the new tree's own fitting error. Therefore F(x) + T(x) = y + err.
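      (Worked illustration of the reply above, with made-up numbers: if y = 10 and the current ensemble predicts F(x) = 7, the residual is e = 3. If the new tree fits that residual imperfectly, say T(x) = 2.5, then F(x) + T(x) = 9.5, so the remaining error drops from 3 to 0.5, and the next tree is trained on that remaining 0.5.)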

  • @siddharthsingh7281
    @siddharthsingh7281 19 days ago

    share the resources in description

    • @MCroppered
      @MCroppered 12 days ago

      Why

    • @MCroppered
      @MCroppered 12 days ago

      “Give me the exam solutions pls”

  • @py2992
    @py2992 19 days ago

    Thank you for this video !

  • @vedantkulkarni5192
    @vedantkulkarni5192 20 days ago

    Hi, we should subtract the target from the cumulative sum, right? I didn't understand where you did it.

    • @TheMLTechLead
      @TheMLTechLead 20 days ago

      It is in the script shown in the video: I am simply not adding the target of the current row, which is equivalent to adding it and then subtracting it.

    • @vedantkulkarni5192
      @vedantkulkarni5192 18 days ago

      @@TheMLTechLead Understood! Thanks for replying.

  • @jaskiratbenipal8255
    @jaskiratbenipal8255 21 days ago

    Can you help me understand how and why positional embeddings are effective in transformers (vision or text)? Can't the model just learn that through its existing weights? How does adding extra positional embeddings to the vision/text embeddings help? Even if we have a unique vector for each position, when we add those to the text embeddings the result won't be unique. Would the result after addition even have useful information, since we can get the same sum from multiple combinations? Let's say we have a text model with an input limit of only two tokens and an embedding size of 3. Text embeddings: [0, 1.1, 0.3], [0, 0.1, 1.3]; position embeddings: [0, 0, 1], [0, 1, 0]; embeddings after addition: [0, 1.1, 1.3], [0, 1.1, 1.3]. We get the same vectors. Is the magic in the actual function that we use for embeddings, or is it just empirically better and we can't fully understand it?

    • @TheMLTechLead
      @TheMLTechLead 20 days ago

      Inside the model, we compute the self-attentions. They are pretty much just a measure of interaction between the different tokens in the input sequence. Inside the attention layer, we have the queries, the keys, and the values. The keys and queries are used to compute the self-attentions, and the resulting hidden state is the weighted average of the values, where we use the attentions as weights. At that point, the order of the tokens is completely lost because we are just summing stuff together without knowing in what order it was before the sum. That is why we keep the position information through the positional encoding. We systematically add the same vector for the same position, so the model starts to understand how that shift relates to that position. The value of the same token varies depending on its position. To be fair, we do it a bit differently in 2024. Video coming!

    • @jaskiratbenipal8255
      @jaskiratbenipal8255 20 days ago

      @@TheMLTechLead Looking forward to it! After commenting, I read about RoPE (can't say I fully understood it) and learnable positional embeddings. P.S. I really liked your idea of using routing in attention, a bit of an ambitious goal, but I want to use it to train a small language model, or I will see if it is possible to simply add it to a pre-trained model without losing the learned weights.

    • @TheMLTechLead
      @TheMLTechLead 19 days ago

      @@jaskiratbenipal8255 I may not make a video about RoPE but I wrote something about it here: www.linkedin.com/posts/damienbenveniste_most-modern-llms-are-built-using-the-rope-activity-7188571849084096515-mmUk. For the routed self-attentions, I am looking forward to seeing somebody implement those and train a model with them.

    • @jaskiratbenipal8255
      @jaskiratbenipal8255 7 days ago

      @@TheMLTechLead I tried it: I trained a language model from scratch for next-character prediction (to have a small vocabulary). The results were good using normal attention; the model was able to form words and phrases and some gibberish that looked like words. With the routed attention (I tried 0.1 and 0.3 sparsity values), it started to diverge and the model was not converging at all after the first epoch. The training time did decrease from 34 to 24 mins.
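To illustrate the positional-encoding explanation earlier in this thread, here is a minimal sketch of the classic sinusoidal scheme (my own illustration; the videos may cover a different variant): the same fixed vector is always added at the same position, before the first attention layer.

```python
import torch

def sinusoidal_positions(seq_len, d_model):
    # One fixed vector per position; the same vector is always added at that position
    pos = torch.arange(seq_len).unsqueeze(1).float()
    i = torch.arange(0, d_model, 2).float()
    angles = pos / (10000 ** (i / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

tokens = torch.randn(1, 8, 16)                  # [batch, seq, d_model] token embeddings
tokens = tokens + sinusoidal_positions(8, 16)   # inject order information
print(tokens.shape)
```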

  • @shklbor
    @shklbor 22 days ago

    Very nice video, thanks. The second type of NLP Engineer falls under the emerging AI Engineer role. An AI Engineer can use APIs to develop NLP as well as CV based applications (e.g. using stable diffusion APIs)

  • @Akashkv-ke3ff
    @Akashkv-ke3ff 22 days ago

    accent is tough to understand

    • @TheMLTechLead
      @TheMLTechLead 22 days ago

      That is my weakness!

    • @jeromeeusebius
      @jeromeeusebius 17 days ago

      @@TheMLTechLead It's not that bad. As an English speaker who knows/speaks French, I can tell you are French. It's not hard to hear and understand what you are saying. I'd say keep going and make any improvements along the way.

    • @aabishkarwagle8708
      @aabishkarwagle8708 16 days ago

      Not true at all, I am watching at 1.5x and can still understand everything.

    • @ankurgupta8678
      @ankurgupta8678 11 days ago

      Thanks @TheMLTechLead! You did a great job in summarizing the key ideas.

  • @howardsmith4128
    @howardsmith4128 23 days ago

    Awesome work. Thanks so much.

  • @marthalanaveen
    @marthalanaveen 23 days ago

    Thank you so much for this. You don’t know how badly I needed this right now. Please extend this series to transformers, and if possible any LLM as well.

  • @Aniket_0314
    @Aniket_0314 23 days ago

    Lovely video, helpful for me in getting started with vector DBs.

  • @MFaizan-fy8dk
    @MFaizan-fy8dk 24 days ago

    Your way of explaining using animation is very good. One request: please remove the background sound, which is distracting.

    • @TheMLTechLead
      @TheMLTechLead 23 days ago

      Ok! Yeah I figured it was annoying!

  • @marthasamuel
    @marthasamuel 27 days ago

    So this is basically used for classification? For example, cats and dogs, right?

    • @TheMLTechLead
      @TheMLTechLead 26 days ago

      It can be used for any computer vision ML task.

    • @marthasamuel
      @marthasamuel 26 days ago

      @TheMLTechLead Great! I was thinking of image generation for a given prompt or user input... what would the process be?

    • @TheMLTechLead
      @TheMLTechLead 26 days ago

      Oh no, you would need a very different model for that. Although, the vision transformer can be an element of it.

    • @marthasamuel
      @marthasamuel 26 days ago

      @@TheMLTechLead got you!

  • @jacquesgouimenou9668
    @jacquesgouimenou9668 27 days ago

    Good job.

  • @eddysaoudi253
    @eddysaoudi253 27 days ago

    Thank you for your work. What you are doing is really great.

  • @alexanderkozlov2539
    @alexanderkozlov2539 29 days ago

    Great intro to XGBoost theory! Not being new to the subject, in the past I used the official docs when I needed to refresh my knowledge. I will be using your video now, thank you Damien!

  • @siddharthvj1
    @siddharthvj1 1 month ago

    So far the best teacher... if possible I would love to join you...

    • @TheMLTechLead
      @TheMLTechLead 1 month ago

      If I need somebody, I know where to look!

  • @chenmarkson7413
    @chenmarkson7413 1 month ago

    Great explanations! Easy to understand.

  • @antoniopiemontese6078
    @antoniopiemontese6078 1 month ago

    What's the difference between this algorithm and Boosting as explained in Hastie & Tibshirani's book published in 2013 (first version)? It does seem the same.

    • @TheMLTechLead
      @TheMLTechLead 1 month ago

      Why do you expect them to be different?

    • @TheMLTechLead
      @TheMLTechLead 1 month ago

      Maybe you are asking what the difference is between boosting in general and gradient boosting in particular? To be fair, my video is not going deep enough to highlight the differences. In a coming video, I am going to go into the details of how XGBoost works and I believe that should clear up the confusion.

  • @SoimulPatriei
    @SoimulPatriei 1 month ago

    Very good and intuitive explanation of the algorithm. Thank-you!

  • @tripathi26
    @tripathi26 1 month ago

    Informative! 🙏

  • @SHAILENDRAUPADHYAY-ok4yz
    @SHAILENDRAUPADHYAY-ok4yz 1 month ago

    Absolute masterpiece

  • @ShihgianLee
    @ShihgianLee 1 month ago

    These few recent videos are great! They are short and to the point, with clear explanations!

    • @TheMLTechLead
      @TheMLTechLead 1 month ago

      Happy to hear that!

  • @drewgalbraith4362
    @drewgalbraith4362 1 month ago

    One question: when adding T into the numerator, do we also add it into ‘C’, the denominator?

  • @SHAILENDRAUPADHYAY-ok4yz
    @SHAILENDRAUPADHYAY-ok4yz 1 month ago

    Great explanation