Small Language Models Are Also Few-Shot Learners
- Published 28 Jul 2024
- This video explains the latest work on Pattern-Exploiting Training (PET). The paper shows that this distillation scheme, transferring knowledge captured in pre-trained language models into discriminative classifiers, also works in the few-shot setting. It is compared directly with GPT-3 using 32 labeled examples per task on benchmarks like BoolQ and the Winograd Schema Challenge. This is very interesting, though not a fair, apples-to-apples comparison with GPT-3. Thanks for watching! Please subscribe!
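The distillation step mentioned above can be sketched roughly like this: each cloze pattern asks a masked LM to score the verbalizer token for each label, the per-pattern scores are averaged into one soft label, and those soft labels supervise the final classifier on unlabeled data. This is a minimal sketch under those assumptions; the function names and example numbers are illustrative, not taken from the authors' code.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pet_soft_labels(pattern_logits, pattern_weights=None):
    """Combine per-pattern verbalizer scores into one soft label.

    pattern_logits: shape (n_patterns, n_labels) -- for each cloze pattern,
    the masked-LM score of each label's verbalizer token at the mask position.
    PET averages the (optionally weighted) logits across the pattern ensemble,
    then softmaxes; the result is a soft label for training the classifier.
    """
    logits = np.asarray(pattern_logits, dtype=float)
    if pattern_weights is None:
        # Uniform ensemble weights if none are given.
        pattern_weights = np.ones(len(logits)) / len(logits)
    avg = np.average(logits, axis=0, weights=pattern_weights)
    return softmax(avg)

# Illustrative example: two patterns scoring the verbalizers ("yes", "no")
# for one BoolQ-style input; pattern 1 is more confident than pattern 2.
soft = pet_soft_labels([[2.0, 0.5],
                        [1.2, 1.0]])
```

The resulting `soft` distribution sums to 1 and leans toward "yes"; in PET, such distributions over unlabeled examples become the training targets for the final discriminative model.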
Paper Links:
Paper Link: arxiv.org/abs/2009.07118
First PET Paper: arxiv.org/pdf/2001.07676.pdf
Next Word Prediction Demo: github.com/renatoviolin/next_...
Hacker News Reaction: news.ycombinator.com/item?id=...
HuggingFace NLP Viewer: huggingface.co/nlp/viewer/?da...
GPT-3: arxiv.org/pdf/2005.14165.pdf
SimCLRv2 (if curious about semi-supervised knowledge distillation in vision): arxiv.org/pdf/2006.10029.pdf
Measuring Massive Multitask Language Understanding: arxiv.org/pdf/2009.03300.pdf
GenAug: arxiv.org/pdf/2010.01794.pdf
Efficient Transformers Survey: arxiv.org/abs/2009.06732
T5: ai.googleblog.com/2020/02/exp...
Chapters
0:00 Introduction
1:17 Bold Headline on Hacker News
2:16 All Tasks are Language Modeling
3:15 Pattern-Exploiting Training Recap
4:40 Masked Word Prediction Demo
5:56 Iterative PET
6:38 Semi-Supervised Knowledge Distillation
8:05 Text-Input, Text-Output to All Tasks are Language Modeling
9:04 Datasets
13:28 GPT-3 Priming: Recap
14:56 PET vs. GPT-3
17:08 PET with Multiple Masks
18:27 Generative to Discriminative Models
Welcome back bro :)
Thank you so much!
This is a great paper. I hope NLP heads in this direction (it seems like the application industry needs most).
Thanks for sharing your knowledge and understanding!!! 👍
Thank you so much! I hope you found this useful!