Reading GPT-2 source code

  • Published on Sep 15, 2024

Comments • 11

  • @asadhayat9473 · 4 months ago · +1

    First of all, great initiative and series. The last part was a bit unclear in terms of finetuning: why it was being done that way when we have labels. An overview-level explanation seems to be missing, but the rest was explained well.

    • @makgaiduk · 4 months ago

      A little disclaimer: it was 3 am by the time I finished filming, so I was visibly exhausted and confused, sorry for that :) I hope to reshoot bad moments like these someday.
      Regarding finetuning and loss: in the video, I was covering "finetuning on a CLM (causal language modeling) loss", i.e. the same loss and target that was used during pretraining, but with code that is much simpler and less scalable than the pretraining code. It can still be a valuable fine-tuning technique, for example for domain adaptation: finetuning your model to work better with the specific sorts of texts you are interested in (see the sketch below).
      In the GPT-1 paper, OpenAI researchers also mention finetuning on different sorts of targets, i.e. actually replacing the next-token-prediction head of the model with something else, like a sentiment analysis classification head, though no code was released for that. The GPT-2 paper was entirely focused on pretraining and zero-shot transfer, and as far as I know, in later models OpenAI also chose to pursue purely language modeling objectives and zero-shot transfer.
      Hope I clarified some things
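      A minimal sketch of what "finetuning on the CLM loss" could look like with Hugging Face transformers; the example text, learning rate, and single optimizer step are placeholders, not code from the video:
      ```python
      # Finetune GPT-2 with the same causal language modeling objective used in pretraining.
      import torch
      from transformers import GPT2LMHeadModel, GPT2TokenizerFast

      tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      model.train()

      text = "Some domain-specific text you want the model to adapt to."  # placeholder data
      enc = tokenizer(text, return_tensors="pt")

      # For CLM finetuning the labels are just the input ids; the model shifts them
      # by one position internally and computes cross-entropy against the next token.
      outputs = model(input_ids=enc["input_ids"], labels=enc["input_ids"])

      optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
      outputs.loss.backward()
      optimizer.step()
      ```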

    • @asadhayat9473 · 4 months ago

      @makgaiduk Much clearer now. I would be interested in joining this series as a collaboration, if you are up for it.
      One suggestion: for tokenization, I guess Karpathy's video is quite informative, so you could also mention it in the resources section of your blog.
      Thanks again for the great work :)

    • @makgaiduk · 4 months ago

      @asadhayat9473 Write to me at adensur@gmail.com with some contact info, like Telegram, and let's talk!

  • @davidro00 · 4 months ago · +1

    Why does inference return a tensor of shape sequence length x vocab size? I thought the model only predicts the next token for the whole sequence, so I am confused that it predicts next-token probabilities for every position of the sequence. Hope my question is clear enough.

    • @makgaiduk · 4 months ago

      I am guessing this helps to "squeeze out" more signal from the data. With a sequence length of 1024, you effectively get 1023 training examples per chunk of text instead of just one.
      Only the last token's logits are used in the generation script though, so during generation everything happens as expected.
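      An illustrative sketch with random tensors (shapes only, not the actual training code) of how per-position logits give 1023 next-token training examples per 1024-token chunk:
      ```python
      import torch
      import torch.nn.functional as F

      batch_size, seq_len, vocab_size = 2, 1024, 50257
      logits = torch.randn(batch_size, seq_len, vocab_size)            # one distribution per position
      input_ids = torch.randint(0, vocab_size, (batch_size, seq_len))  # the token chunk itself

      shift_logits = logits[:, :-1, :]   # predictions at positions 0..1022
      shift_labels = input_ids[:, 1:]    # the tokens that actually came next

      # Every position is scored against the token that follows it,
      # giving batch_size * 1023 training examples from one forward pass.
      loss = F.cross_entropy(
          shift_logits.reshape(-1, vocab_size),
          shift_labels.reshape(-1),
      )
      ```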

    • @davidro00 · 4 months ago · +1

      @makgaiduk Okay, so it basically allows for parallel loss computation for every token in the sequence during training, instead of processing each token sequentially. But I don't get the point of throwing these massive outputs into an MLP + softmax if we only need the last token's logits.

    • @makgaiduk · 4 months ago

      @davidro00 Good question. Hugging Face does have this optimization for some finetuned versions of the model, like the one used for classification: github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L1678
      I can't see anything like that specifically for text generation though. Maybe it is because GPT-2 is not really actively used anymore, or maybe the MLP head's inference cost is negligible compared to the transformer blocks.
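      A hedged sketch of the optimization being discussed, with an illustrative standalone linear layer rather than the Hugging Face implementation: at generation time only the last position's logits are needed, so the LM head could be applied to a single hidden state instead of all of them:
      ```python
      import torch

      hidden_size, vocab_size = 768, 50257
      lm_head = torch.nn.Linear(hidden_size, vocab_size, bias=False)   # stand-in for the LM head

      hidden_states = torch.randn(1, 1024, hidden_size)  # transformer output for the whole prompt

      # Unoptimized: project every position, then keep only the last row.
      next_token_logits = lm_head(hidden_states)[:, -1, :]         # (1, vocab_size)

      # Optimized: project just the final hidden state.
      next_token_logits_fast = lm_head(hidden_states[:, -1, :])    # (1, vocab_size)

      assert torch.allclose(next_token_logits, next_token_logits_fast, atol=1e-5)
      ```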

  • @pestlewebengland1346 · a month ago

    Thank you for the video. A quick question, please: at 36:01 you open a file called "sandbox.ipynb", which, from the file path, looks as if it is in the folder ".\transformers\examples\pytorch", but I can't find the file in the git download at that location or anywhere else. Is this something you have written to demonstrate calling the libraries, or has the huggingface library been updated and this has been changed/added?

    • @makgaiduk · a month ago

      Hello there! The notebook was written by me, though unfortunately I forgot to commit it and it was lost. I will try to recover/rewrite it, though it might take a few days

    • @makgaiduk · a month ago

      And here it is: github.com/adensur/blog/tree/main/nlp/00_reading_gpt2_source_code