Instruction Fine-Tuning and In-Context Learning of LLM (w/ Symbols)

  • Published Oct 6, 2024
  • New insights into "Instruction Fine-Tuning" and "In-Context Learning" of LLMs: the evolution to "Symbol Fine-Tuning" of LLMs, from FLAN-PaLM 8B to FLAN-PaLM 540B models.
    This video summarizes four scientific pre-prints on instruction fine-tuning and prompt engineering (ICL, in-context learning) of Large Language Models (LLMs):
    In-Context Retrieval-Augmented Language Models
    Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham (AI21 Labs)
    arxiv.org/abs/...
    What learning algorithm is in-context learning? Investigations with linear models
    Ekin Akyurek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou
    arxiv.org/abs/...
    May 16, 2023
    Symbol Tuning Improves In-Context Learning in Language Models
    Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le
    arxiv.org/pdf/...
    March 9, 2023
    Larger Language Models Do In-Context Learning Differently
    Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma
    arxiv.org/pdf/...

Comments • 13

  • @mulderbm · 4 months ago

    Good to watch this on my Sunday, and to see that ICL is still interesting, even almost a year on.

  • @Jeratan · 1 year ago · +6

    The shockingly rapid pace means that it's fairly normal for what was once considered a standard approach to quickly become suboptimal

  • @ericsabbath · 1 year ago · +2

    You're the best! Thank you one more time 🙏🏿

  • @umbertosurricchio5365 · 1 year ago · +1

    Thank You so much! Really fascinating video.

  • @kevon217 · 1 year ago

    Your videos are so illuminating. Thanks!

  • @malikrumi1206 · 2 months ago

    You did not mention it, so maybe it's not there and I am chasing ghosts, but why couldn't this strategy be used in a DDoS-style / prompt-injection attack that changes everything your model knows and forces you to shut down until you can get a new model up?

  • @theshrubberer · 1 year ago · +1

    Wonderful video, but I have one critical question that was bothering me while I watched. I have been deep-diving into fine-tuning for a few weeks and have started a text-classification project. I am using BERT (to start), and the dataset consists of "example text" and "class", where the class is a multi-class value expressed as a text label. Beautiful.
    Important point: the multi-class text label values DO have semantic meaning vis-a-vis the text examples, but see below for my confusion.
    I use the Hugging Face Transformers text-classification setup. Beautiful!
    I designate my class field to be type ClassLabel and write the mapping for my class labels. Beautiful.
    My understanding WAS that during fine-tuning the system was learning to categorize my example text inputs according to an integer value for each "class label". Is this not a correct understanding? Hence the int2str and str2int capability of ClassLabel, no?
    So given that background understanding, I am watching this video and asking myself how what I just described is not already symbolic fine-tuning, since the label the system is training on is an arbitrary integer value for my ClassLabel and not the semantically meaningful (to me) label string.
    Do you see my confusion? I was assuming that basic ClassLabel-based text-classification fine-tuning was already symbolic and not semantic. This understanding/misunderstanding was so strong that when my partner doing inference testing mentioned that he thought the system was maybe just looking for proximity between the input text and the set of possible class texts, I said, "No, it was trained on an integer stand-in for each class, not the class text itself, so the semantics of the class label are irrelevant."
    But now, because of this video, I am completely confused. What am I misunderstanding? This is killing me with curiosity.
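
    A minimal sketch of the setup described above, assuming the Hugging Face "datasets" and "transformers" libraries; the label names and example texts are made up, and the point is only that the classification head is trained on integer label ids, never on the label strings:

    from datasets import ClassLabel, Dataset, Features, Value
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Toy multi-class data: the label strings are meaningful to a human reader.
    labels = ClassLabel(names=["billing_issue", "delivery_delay", "product_defect"])
    features = Features({"text": Value("string"), "label": labels})
    ds = Dataset.from_dict(
        {"text": ["My invoice is wrong", "Package is late", "The item arrived broken"],
         "label": [0, 1, 2]},
        features=features,
    )

    # ClassLabel only maps between strings and the integer ids the model actually sees.
    print(labels.int2str(0))                 # -> "billing_issue"
    print(labels.str2int("delivery_delay"))  # -> 1

    # The classification head knows nothing but num_labels integer ids; the label
    # text never enters the loss, so renaming "billing_issue" to "foo" changes nothing.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=labels.num_classes
    )
    encoded = ds.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)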

    • @ZZ-yw4xf · 1 year ago

      I am watching this video and just picked this up. Great question. I am not confident giving an answer, but here I try, and I'd love some feedback. My understanding (again, which may be wrong) is that BERT and the LLMs covered in the video are different. You are right in saying that when you train a classifier using BERT, it essentially treats the labels as symbols, i.e., it does not understand the meaning of the symbol. But the LLMs covered in the video are generative models, which are different from BERT.
      Generative models essentially treat tasks as next-word prediction, so I would assume that they do learn the meaning of labels and predict labels as if they were 'next words'. BERT does not do that. If this understanding is wrong, I hope someone can correct me.
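
      A minimal sketch of that contrast, using an illustrative causal LM ("gpt2") and a made-up few-shot prompt: the label is emitted as ordinary next tokens, so its surface form is what the model learns to produce, and symbol tuning simply swaps those label strings for arbitrary symbols such as "foo"/"bar":

      from transformers import AutoTokenizer, AutoModelForCausalLM

      # Few-shot prompt: the label is generated as text ("next words"), so the model
      # can exploit the meaning of the label string. Symbol tuning replaces these
      # labels with arbitrary symbols, forcing the model to rely on the input-label
      # mapping shown in the examples instead of on the labels' semantics.
      prompt = (
          "Review: Package is late.\nCategory: delivery_delay\n\n"
          "Review: The item arrived broken.\nCategory: product_defect\n\n"
          "Review: My invoice is wrong.\nCategory:"
      )

      tokenizer = AutoTokenizer.from_pretrained("gpt2")   # any causal LM; illustrative
      model = AutoModelForCausalLM.from_pretrained("gpt2")

      inputs = tokenizer(prompt, return_tensors="pt")
      out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
      print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))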

  • @소금-v8z · 5 months ago

    Wait, Jerry Wei and Jason Wei are brothers????

  • @theshrubberer · 1 year ago

    What does it mean to have a training dataset with neither instructions nor labels? My interpretation is that such a dataset is just a corpus of textual data and nothing more. Is that correct?

    • @robxmccarthy · 1 year ago

      This is found on page 10 of the "Symbol tuning" paper and refers to using instructions and labels in the in-context examples: using both instructions and labels gives better results than if one or both are missing.
      It's not the easiest paper to read, though, and I may have missed it elsewhere.
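
      For what it's worth, a small sketch of those prompt variants as I understand them (the task instruction, labels, and stand-in symbols below are made up): in-context examples can be built with or without an instruction, and with semantically relevant labels or arbitrary stand-in symbols, and the claim is that keeping both instructions and relevant labels works best:

      # Build the four ICL prompt variants: with/without a task instruction,
      # and with relevant labels vs. arbitrary stand-in symbols.
      examples = [
          ("The plot was dull and predictable.", "negative"),
          ("A touching, beautifully acted film.", "positive"),
      ]
      instruction = "Classify the movie review's sentiment."  # illustrative wording
      symbol_map = {"negative": "foo", "positive": "bar"}      # arbitrary symbols

      def build_prompt(use_instruction: bool, use_relevant_labels: bool, query: str) -> str:
          parts = [instruction] if use_instruction else []
          for text, label in examples:
              shown = label if use_relevant_labels else symbol_map[label]
              parts.append(f"Input: {text}\nLabel: {shown}")
          parts.append(f"Input: {query}\nLabel:")
          return "\n\n".join(parts)

      query = "I walked out halfway through."
      for use_instr in (True, False):
          for relevant in (True, False):
              print(f"--- instruction={use_instr}, relevant_labels={relevant} ---")
              print(build_prompt(use_instr, relevant, query), "\n")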

  • @luiswhatshisname7667 · 1 year ago

    @13:00 ... so the larger models are easier to convince that the earth is flat or the sun is a rock? Is the lesson that the larger the model and its training data, the more garbage it contains?

    • @code4AI · 1 year ago

      You completely misunderstood the message. But never mind ...