Reading GPT-2 source code

  • Published on Sep 15, 2024

Comments • 11

  • @asadhayat9473 · 4 months ago · +1

    First of all, great initiative and series. The last part was a bit unclear in terms of finetuning: why it was being done that way when we have labels. An overview-level explanation seems to be missing, but the rest was explained well.

    • @makgaiduk · 4 months ago

      A little disclaimer: it was 3 am by the time I finished filming, so I was visibly exhausted and confused, sorry for that :) I hope to reshoot bad moments like these someday.
      Regarding finetuning and loss: in the video, I was covering "finetuning on a CLM (causal language modeling) loss", i.e. the same loss and target that was used during pretraining, but with code that is much simpler and less scalable than the pretraining code. It can still be a valuable fine-tuning technique, for example for domain adaptation: finetuning your model to work better with the specific sorts of texts you are interested in (see the sketch below).
      In the GPT-1 paper, OpenAI researchers also mention finetuning on different sorts of targets, i.e. actually replacing the next-token-prediction head of the model with something else, like a sentiment analysis classification head, though no code was released for that. The GPT-2 paper was entirely focused on pretraining and zero-shot transfer, and as far as I know, in later models OpenAI also chose to pursue purely language modeling objectives and zero-shot transfer.
      Hope I clarified some things
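      A minimal sketch of what "finetuning on the CLM loss" could look like with Hugging Face transformers; the example text, learning rate, and single optimizer step are placeholders, not code from the video:
      ```python
      # Finetune GPT-2 with the same causal language modeling objective used in pretraining.
      import torch
      from transformers import GPT2LMHeadModel, GPT2TokenizerFast

      tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      model.train()

      text = "Some domain-specific text you want the model to adapt to."  # placeholder data
      enc = tokenizer(text, return_tensors="pt")

      # For CLM finetuning the labels are just the input ids; the model shifts them
      # by one position internally and computes cross-entropy against the next token.
      outputs = model(input_ids=enc["input_ids"], labels=enc["input_ids"])

      optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
      outputs.loss.backward()
      optimizer.step()
      ```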

    • @asadhayat9473 · 4 months ago

      @makgaiduk Much clearer now. I would be interested in joining this series as a collaboration, if you are up for it.
      One suggestion: for tokenization, I guess Karpathy's video is quite informative, so you could also mention it in the resources section of your blog.
      Thanks again for the great work :)

    • @makgaiduk · 4 months ago

      @asadhayat9473 Write to me at adensur@gmail.com with some contact info, like Telegram, and let's talk!

  • @davidro00 · 4 months ago · +1

    Why does inference return a tensor of shape sequence length x vocab size? I thought the model only predicts the next token for the whole sequence, so I am confused that it predicts next-token probabilities for every position of the sequence. Hope my question is clear enough.

    • @makgaiduk · 4 months ago

      I am guessing this helps to "squeeze out" more signal from the data. With a sequence length of 1024, you effectively get 1023 training examples per chunk of text instead of just one.
      Only the last token's logits are used in the generation script though, so during generation everything happens as expected.
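      An illustrative sketch with random tensors (shapes only, not the actual training code) of how per-position logits give 1023 next-token training examples per 1024-token chunk:
      ```python
      import torch
      import torch.nn.functional as F

      batch_size, seq_len, vocab_size = 2, 1024, 50257
      logits = torch.randn(batch_size, seq_len, vocab_size)            # one distribution per position
      input_ids = torch.randint(0, vocab_size, (batch_size, seq_len))  # the token chunk itself

      shift_logits = logits[:, :-1, :]   # predictions at positions 0..1022
      shift_labels = input_ids[:, 1:]    # the tokens that actually came next

      # Every position is scored against the token that follows it,
      # giving batch_size * 1023 training examples from one forward pass.
      loss = F.cross_entropy(
          shift_logits.reshape(-1, vocab_size),
          shift_labels.reshape(-1),
      )
      ```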

    • @davidro00 · 4 months ago · +1

      @makgaiduk Okay, so it basically allows for parallel loss computation for every token in the sequence during training, instead of processing each token sequentially. But I don't get the point of throwing these massive outputs into an MLP + softmax if we only need the last token's logits.

    • @makgaiduk · 4 months ago

      @davidro00 Good question. Hugging Face does have this optimization for some finetuned versions of the model, like the one used for classification: github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L1678
      I can't see anything like that specifically for text generation though. Maybe it is because GPT-2 is not really actively used anymore, or maybe the MLP head's inference cost is negligible compared to the transformer blocks.
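      A hedged sketch of the optimization being discussed, with an illustrative standalone linear layer rather than the Hugging Face implementation: at generation time only the last position's logits are needed, so the LM head could be applied to a single hidden state instead of all of them:
      ```python
      import torch

      hidden_size, vocab_size = 768, 50257
      lm_head = torch.nn.Linear(hidden_size, vocab_size, bias=False)   # stand-in for the LM head

      hidden_states = torch.randn(1, 1024, hidden_size)  # transformer output for the whole prompt

      # Unoptimized: project every position, then keep only the last row.
      next_token_logits = lm_head(hidden_states)[:, -1, :]         # (1, vocab_size)

      # Optimized: project just the final hidden state.
      next_token_logits_fast = lm_head(hidden_states[:, -1, :])    # (1, vocab_size)

      assert torch.allclose(next_token_logits, next_token_logits_fast, atol=1e-5)
      ```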

  • @pestlewebengland1346 · a month ago

    Thank you for the video. A quick question, please: at 36:01 you open a file called "sandbox.ipynb", which, from the file path, looks as if it is in the folder ".\transformers\examples\pytorch", but I can't find the file in the git download at that location or anywhere else. Is this something you have written to demonstrate calling the libraries, or has the huggingface library been updated and this has been changed/added?

    • @makgaiduk · a month ago

      Hello there! The notebook was written by me, though unfortunately I forgot to commit it and it was lost. I will try to recover/rewrite it, though it might take a few days

    • @makgaiduk · a month ago

      And here it is: github.com/adensur/blog/tree/main/nlp/00_reading_gpt2_source_code