Multimodal AI from First Principles - Neural Nets that can see, hear, AND write.

  • Published on Nov 10, 2024

Comments • 25

  • @thegigasurgeon
    @thegigasurgeon several months ago +1

    very clear survey of multimodal models. Everything under one roof. Great work

    • @avb_fj
      @avb_fj  several months ago

      Thanks a lot!

  • @madsfrederiksen6213
    @madsfrederiksen6213 1 year ago +4

    Great and clear video! Heard about multimodal models for the first time today, and i already feel like i have a better grasp of it, thanks to you :)

  • @boogati9221
    @boogati9221 5 months ago +3

    Dude this video was so fucking good. Keep it up.

  • @xxlvulkann6743
    @xxlvulkann6743 3 months ago +1

    This was a useful summary for finding papers to research developments in multimodal machine learning models!

    • @avb_fj
      @avb_fj  3 months ago

      Thanks! Super glad you found the video resourceful!

  • @meet_minimalist
    @meet_minimalist 10 months ago +2

    Excellent video with all the paper references. Lot to read and learn from papers. Thanks. :)

    • @avb_fj
      @avb_fj  10 months ago

      Thanks!🙏🏽

  • @joshuatettey7771
    @joshuatettey7771 3 months ago +1

    Awesome video. Thanks mate🤩

  • @tomm9716
    @tomm9716 3 months ago

    Really good stuff mate, subbed

  • @syoyazhou8657
    @syoyazhou8657 1 year ago +1

    Like your videos. Explain things in a very clear way. Thx for sharing.

    • @avb_fj
      @avb_fj  1 year ago

      Thank you!

    • @xspydazx
      @xspydazx 7 months ago

      CODE IS BETTER ??
      from transformers import (
          AutoFeatureExtractor, AutoImageProcessor, AutoTokenizer,
          SpeechEncoderDecoderModel, VisionEncoderDecoderModel, VisionTextDualEncoderProcessor,
      )
      # NOTE: LM_MODEL is assumed to be your already-loaded base language model; it is not defined in this snippet
      print('Add Vision...')
      # Add head
      # Combine pre-trained encoder and pre-trained decoder to form a Seq2Seq model
      Vmodel = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
          "google/vit-base-patch16-224-in21k", "LeroyDyer/Mixtral_AI_Tiny"
      )
      _Encoder_ImageProcessor = Vmodel.encoder  # the ViT image encoder
      _Decoder_ImageTokenizer = Vmodel.decoder  # the language-model decoder
      _VisionEncoderDecoderModel = Vmodel
      # Attach the combined model to the LLM
      LM_MODEL.VisionEncoderDecoder = _VisionEncoderDecoderModel
      # Add sub-components
      LM_MODEL.Encoder_ImageProcessor = _Encoder_ImageProcessor
      LM_MODEL.Decoder_ImageTokenizer = _Decoder_ImageTokenizer
      LM_MODEL
      This is how you add vision to an LLM (you can embed the head inside).
      print('Add Audio...')
      # Add head
      # Combine pre-trained encoder and pre-trained decoder to form a Seq2Seq model
      _AudioFeatureExtractor = AutoFeatureExtractor.from_pretrained("openai/whisper-small")
      _AudioTokenizer = AutoTokenizer.from_pretrained("openai/whisper-small")
      _SpeechEncoderDecoder = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained("openai/whisper-small", "openai/whisper-small")
      # Set the decoder start and pad tokens
      _SpeechEncoderDecoder.config.decoder_start_token_id = _AudioTokenizer.cls_token_id
      _SpeechEncoderDecoder.config.pad_token_id = _AudioTokenizer.pad_token_id
      LM_MODEL.SpeechEncoderDecoder = _SpeechEncoderDecoder
      # Add sub-components
      LM_MODEL.Decoder_AudioTokenizer = _AudioTokenizer
      LM_MODEL.Encoder_AudioFeatureExtractor = _AudioFeatureExtractor
      LM_MODEL
      This is how you can add audio.
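
For intuition about what the vision encoder in the snippet above hands to the decoder, here is a minimal sketch in plain Python (no model download) that computes the shape of the token sequence a ViT-style encoder produces for one image. The function names are hypothetical and only for illustration. The 224-pixel image size and 16-pixel patch size are read off the checkpoint name google/vit-base-patch16-224-in21k, and the hidden size of 768 is assumed to be the standard ViT-Base width.

```python
def vit_token_count(image_size: int = 224, patch_size: int = 16, with_cls: bool = True) -> int:
    """Number of tokens a ViT-style encoder emits for one square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    n_patches = patches_per_side ** 2          # one token per non-overlapping patch
    return n_patches + (1 if with_cls else 0)  # optional extra [CLS] token


def encoder_output_shape(batch_size: int, hidden_size: int = 768, **vit_kwargs) -> tuple:
    """Shape (batch, tokens, hidden) of the sequence the decoder would attend to."""
    return (batch_size, vit_token_count(**vit_kwargs), hidden_size)


if __name__ == "__main__":
    print(vit_token_count())        # 14 * 14 patches plus one [CLS] token
    print(encoder_output_shape(2))  # batch of two images
```

The decoder never sees raw pixels. It attends over this sequence of patch embeddings, which is why the image side can be swapped out for a different encoder as long as it outputs a comparable token sequence.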

  • @AI_ML_DL_LLM
    @AI_ML_DL_LLM 1 year ago +1

    Wow, there is a lot of work behind it, thank you

    • @avb_fj
      @avb_fj  1 year ago

      Haha thanks for the comment! It’s an emerging area, and a lot of groundbreaking research really has happened in the past few years.

  • @ahmed_hefnawy1811
    @ahmed_hefnawy1811 8 months ago +1

    Excellent

  • @vobbilisettyveera2973
    @vobbilisettyveera2973 1 year ago +1

    awesome!!!!!!!!!!

  • @IsmailIfakir
    @IsmailIfakir several months ago

    Some multimodal LLMs can be fine-tuned for sentiment analysis.

  • @420_gunna
    @420_gunna 8 months ago

    7:55 lol

    • @avb_fj
      @avb_fj  8 months ago

      Honest reactions lol😅

  • @deliciouspops
    @deliciouspops 1 year ago

    do you think you should tune your audio levels or what? according to youtube, i am your 666th view

    • @avb_fj
      @avb_fj  1 year ago

      Always open for feedback. What kind of tuning are we talking about?

    • @avb_fj
      @avb_fj  1 year ago

      @LonewolfeSlayer Sounds good... something to keep in mind for my next one. :)