Aligning LLMs with Direct Preference Optimization

  • Published Feb 7, 2024
  • In this workshop, Lewis Tunstall and Edward Beeching from Hugging Face will discuss a powerful alignment technique called Direct Preference Optimisation (DPO), which was used to train Zephyr (arxiv.org/abs/2310.16944) and is rapidly becoming the de facto method to boost the performance of open chat models.
    By the end of this workshop, attendees will:
    Understand the steps involved in fine-tuning LLMs for chat applications.
    Learn the theory behind Direct Preference Optimisation and how to apply it in practice with the Hugging Face TRL library (a sketch follows this list).
    Know what metrics to consider when evaluating chat models.
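    For reference, the DPO objective from the original paper (arxiv.org/abs/2305.18290) optimises the policy directly on preference pairs, with no separate reward model:

      $\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma \left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]$

    A minimal sketch of how this looks with TRL's DPOTrainer is below. The model, dataset, and hyperparameters are illustrative (they follow the public Zephyr recipe), and exact argument names vary across TRL versions:

      from datasets import load_dataset
      from transformers import AutoModelForCausalLM, AutoTokenizer
      from trl import DPOConfig, DPOTrainer

      # Start from an SFT checkpoint (here, the Zephyr SFT model).
      model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")
      tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")

      # Preference pairs with "prompt", "chosen", and "rejected" columns;
      # this is the dataset used to train Zephyr.
      dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

      config = DPOConfig(
          output_dir="zephyr-dpo",
          beta=0.1,  # strength of the implicit KL penalty toward the reference model
          per_device_train_batch_size=2,
          num_train_epochs=1,
      )

      trainer = DPOTrainer(
          model=model,
          ref_model=None,  # with None, TRL keeps a frozen copy of `model` as the reference
          args=config,
          train_dataset=dataset,
          processing_class=tokenizer,  # named `tokenizer` in older TRL releases
      )
      trainer.train()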
    Take a moment to register for our community forum:
    bit.ly/48UIIve
    Take a moment to register for our short courses here:
    bit.ly/420iXHx
    Workshop Notebooks:
    Notebook #1:
    colab.research.google.com/dri...
    Notebook #2:
    colab.research.google.com/dri...
    Slides:
    docs.google.com/presentation/...
    About DeepLearning.AI
    DeepLearning.AI is an education technology company that is empowering the global workforce to build an AI-powered future through world-class education, hands-on training, and a collaborative community. Take your generative AI skills to the next level with short courses that help you learn new skills, tools, and concepts efficiently.
    About Hugging Face
    Hugging Face is an AI company specializing in natural language processing (NLP) and machine learning, and is known for its open-source contributions and collaborative approach to AI research and development. The company is famous for developing the Transformers library, which offers a wide range of pretrained models and tools for a variety of NLP tasks, making it easier for researchers and developers to implement state-of-the-art AI solutions. Hugging Face also fosters a vibrant community for AI enthusiasts and professionals, providing a platform for sharing models, datasets, and research, which significantly contributes to the advancement of AI technology.
    Speakers:
    Lewis Tunstall, Machine Learning Engineer, Hugging Face
    / lewis-tunstall
    Edward Beeching, Research Scientist, Hugging Face
    / ed-beeching-3553b468

Comments • 18

  • @eliporter3980 · 4 months ago +2

    I'm learning a lot from these talks, thank you for having them.

  • @NitinPasumarthy · 4 months ago +3

    The best content I have seen in a while. Enjoyed both the theory and the practical notes from both speakers! Huge thanks to DeepLearning.AI for organizing this event.

  • @PritishYuvraj · 3 months ago +1

    Excellent comparison of PPO and DPO! Kudos.

  • @vijaybhaskar5333 · 4 months ago +3

    Excellent topic. Well explained. One of the best videos on this subject I've seen recently. Continue your good work 😊

  • @katie-48 · 4 months ago +1

    Great presentation, thank you very much!

  • @user-rx5pp3hh1x · 4 months ago +2

    cut to the chase - 3:30
    questions on DPO - 27:37
    practical deep-dive - 30:19
    question - 53:32

  • @jeankunz5986 · 5 months ago +1

    Great presentation. Congratulations.

  • @amortalbeing · 5 months ago +2

    This was amazing, thank you everyone.
    One thing though, if that's possible: it would be greatly appreciated if you could record in 1080p, so the details/text on the slides are visible and easier to consume.
    Thanks a lot again.

    • @MatijaGrcic · 5 months ago +3

      Check out the notebooks and slides in the description.

    • @amortalbeing · 5 months ago

      @MatijaGrcic Thanks a lot, downloaded the slides.

  • @PaulaLeonova · 4 months ago +3

    At 29:40 Lewis mentions an algorithm that requires fewer training samples. What is the name of it? I heard "data", but I don't think that is correct. If anyone knows, would you mind replying?

    • @user-rx5pp3hh1x · 4 months ago

      Possibly this paper: "Rethinking Data Selection for Supervised Fine-Tuning" (arxiv.org/pdf/2402.06094.pdf)

    • @ralphabrooks · 4 months ago

      I am also interested in hearing more about this "data" algorithm. Is there a link to a paper or blog on it?

  • @austinmw89 · 4 months ago

    Curious if you compared SFT on all data vs. training on completions only?
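    (For context: "training on completions only" means masking the prompt tokens out of the SFT loss, so only the assistant's reply contributes gradients. A minimal sketch of one way to do this with TRL's DataCollatorForCompletionOnlyLM; the model, toy dataset, and response template below are illustrative assumptions, and argument names vary across TRL versions:)

      from datasets import Dataset
      from transformers import AutoModelForCausalLM, AutoTokenizer
      from trl import DataCollatorForCompletionOnlyLM, SFTTrainer

      tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")
      model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")

      # Toy example; in practice each row is a full prompt + response string.
      dataset = Dataset.from_dict({
          "text": ["### Question: What is DPO?\n### Answer: A preference-tuning method."]
      })

      # Labels before the response template are set to -100, so the loss is
      # computed on the completion tokens only.
      collator = DataCollatorForCompletionOnlyLM(
          response_template="### Answer:", tokenizer=tokenizer
      )

      trainer = SFTTrainer(
          model=model,
          train_dataset=dataset,  # assumes a "text" column, the default field in recent TRL
          data_collator=collator,
      )
      trainer.train()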

  • @TheRilwen · 4 months ago +1

    I'm wondering why simple techniques, such as sample boosting, increasing the errors for highly ranked examples, or an attention layer, wouldn't work in place of RLHF. It seems like a very convoluted and inefficient way of doing a simple thing, which convinces me that I'm missing something :-)

  • @iseminamanim · 5 months ago

    Interested

  • @MacProUser99876 · 4 months ago

    How DPO works under the hood: th-cam.com/video/Ju-pFJNfOfY/w-d-xo.html