Small Language Models Are Also Few-Shot Learners

  • Published on Jul 28, 2024
  • This video explains the latest work on Pattern-Exploiting Training (PET). The paper shows that PET's distillation scheme, which transfers knowledge captured in pre-trained language models to discriminative classifiers, can also work in the few-shot setting. The results are compared directly with GPT-3's performance using 32 labeled examples on tasks such as BoolQ and Winograd Schema. This is very interesting, but it is not a fair, apples-to-apples comparison with GPT-3. Thanks for watching! Please subscribe!
    Paper Links:
    Paper Link: arxiv.org/abs/2009.07118
    First PET Paper: arxiv.org/pdf/2001.07676.pdf
    Next Word Prediction Demo: github.com/renatoviolin/next_...
    Hacker News Reaction: news.ycombinator.com/item?id=...
    HuggingFace NLP Viewer: huggingface.co/nlp/viewer/?da...
    GPT-3: arxiv.org/pdf/2005.14165.pdf
    SimCLRv2 (if curious about semi-supervised knowledge distillation in vision): arxiv.org/pdf/2006.10029.pdf
    Measuring Massive Multitask Language Understanding: arxiv.org/pdf/2009.03300.pdf
    GenAug: arxiv.org/pdf/2010.01794.pdf
    Efficient Transformers Survey: arxiv.org/abs/2009.06732
    T5: ai.googleblog.com/2020/02/exp...
    Chapters
    0:00 Introduction
    1:17 Bold Headline on Hacker News
    2:16 All Tasks are Language Modeling
    3:15 Pattern-Exploiting Training Recap
    4:40 Masked Word Prediction Demo
    5:56 Iterative PET
    6:38 Semi-Supervised Knowledge Distillation
    8:05 Text-Input, Text-Output to All Tasks are Language Modeling
    9:04 Datasets
    13:28 GPT-3 Priming: Recap
    14:56 PET vs. GPT-3
    17:08 PET with Multiple Masks
    18:27 Generative to Discriminative Models
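
To make the Pattern-Exploiting Training idea from the description concrete, here is a minimal sketch of how a task can be recast as masked word prediction. It is an illustration under stated assumptions, not the paper's code: the pattern wording, the `VERBALIZER` mapping, the function names, and the toy logits are all hypothetical, and a real run would read the logits from a pre-trained masked language model.

```python
import math

# Hypothetical pattern for a BoolQ-style yes/no task: wrap the inputs in a
# cloze sentence that contains a single mask slot.
def boolq_pattern(passage: str, question: str, mask_token: str = "[MASK]") -> str:
    return f"{passage} Question: {question}? Answer: {mask_token}."

# Hypothetical verbalizer: maps each task label to one vocabulary word.
VERBALIZER = {True: "yes", False: "no"}

def label_probs(mask_logits: dict[str, float]) -> dict[bool, float]:
    """Softmax over only the verbalizer words' logits at the mask position."""
    scores = {label: mask_logits[word] for label, word in VERBALIZER.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Stand-in logits (assumed values); words outside the verbalizer are ignored.
toy_logits = {"yes": 2.0, "no": 0.0, "maybe": 1.0}
probs = label_probs(toy_logits)
```

In the semi-supervised distillation step covered in the video, soft labels like `probs`, computed on unlabeled examples, would serve as training targets for a separate discriminative classifier.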
  • Science & Technology

Comments • 5

  • @dawwdd 3 years ago +10

    Welcome back bro :)

  • @DistortedV12 2 years ago

    This is a great paper. I hope NLP heads in this direction (it seems like the application industry needs most).

  • @MrjbushM 3 years ago +2

    Thanks for sharing your knowledge and understanding!!! 👍

    • @connorshorten6311 3 years ago

      Thank you so much! I hope you found this useful!