OpenAI CLIP Explained | Multi-modal ML

  • Published on 17 Jan 2025

Comments • 34

  • @ricardojung3849 2 years ago +3

    Thanks for reporting, explaining and lastly opening up recent ML!
    I found CLIP to be very interesting since I always frowned at the lost potential of two different embeddings being arbitrary and methodically separate. This is huge!

    • @jamesbriggs 2 years ago +1

      Yes, there will be plenty more on CLIP and other similar models very soon. Some of the stuff I've built (and will demo) is awesome and nothing more than zero-shot CLIP. Excited to share!

  • @mszak50 a year ago

    This was really excellent - some of the pieces are starting to make sense

  • @adrianarroyo9839 a year ago +1

    Nice video and explanation! I think at 28:45 you plotted cos_sim instead of dot_sim!

  • @konichiwatanabi a year ago

    Thank you so much for this great walkthrough! Looking forward to more

  • @DallanQuass 2 years ago

    Great video! Looking forward to your next video diving more into using CLIP for zero-shot classification!

    • @jamesbriggs 2 years ago

      Me too, it's fascinating. Thanks for watching!

  • @ismailashraq9697 2 years ago

    This is amazing James. Thanks for the detailed explanation. I am excited for the future CLIP videos 🙂.

    • @jamesbriggs 2 years ago

      Thanks Ashraq! As you know, I'm excited for them too

  • @anantzen171 a year ago

    10:23 I believe CLIP is an abbreviation of Contrastive Language-Image Pre-training

  • @justinmiller7150 a year ago +1

    Great video. I think you may be plotting the same graph twice though (cos sim). In practice it is almost the same though it would seem.

  •  a year ago

    Thanks James, very good video about CLIP. Funny thing is that you display cos_sim twice, so the second time it is not dot_sim that is displayed. And you struggled to find any difference between the two similarity matrices. LOL 🤣

    • @jamesbriggs a year ago

      ah did I do that, oops 😅
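
The cos_sim versus dot_sim point raised in the comments above can be checked with a small sketch. This is not the notebook from the video: the vectors are made-up stand-ins for CLIP embeddings and the helper names are hypothetical. It only shows that the dot product equals cosine similarity once both vectors are scaled to unit length, and differs in magnitude otherwise.

```python
import math

def dot(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Euclidean (L2) length of a vector.
    return math.sqrt(dot(a, a))

def cos_sim(a, b):
    # Cosine similarity: dot product divided by both lengths.
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    # Scale a vector to unit length.
    n = norm(a)
    return [x / n for x in a]

# Toy values standing in for a text and an image embedding (assumption:
# not real model output).
text_emb = [0.2, 1.5, -0.7]
image_emb = [0.4, 2.9, -1.6]

raw_dot = dot(text_emb, image_emb)
cosine = cos_sim(text_emb, image_emb)
normalized_dot = dot(normalize(text_emb), normalize(image_emb))

print(f"raw dot product:       {raw_dot:.4f}")
print(f"cosine similarity:     {cosine:.4f}")
print(f"dot after normalizing: {normalized_dot:.4f}")
```

If the embeddings are not normalized first, the two similarity matrices would be expected to differ in scale, so two identical-looking plots usually suggest the same matrix was plotted twice.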

  • @mvrdara 2 years ago +1

    Excellent explanation! We can build a YouTube video search engine powered by CLIP. Perhaps you can iterate on the NLP YouTube search video you did?

    • @jamesbriggs 2 years ago +1

      That's a great idea, but it might be difficult for YouTube videos where it is just someone talking, as the image embedding would just be something like "a person talking"
      Possibly it could be interesting to embed both the text + images with CLIP, and maybe even an averaged text+image embedding for parts of videos where both the speech + image are important.
      I will think about this more, it's a great idea so thank you!
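
One way the averaged text+image embedding mentioned in the reply above might look is sketched below. The function name `combine` and the `text_weight` parameter are hypothetical, and the input vectors are placeholders for whatever a CLIP text encoder and image encoder would return. Each vector is normalized first so that neither modality dominates because of its raw length.

```python
import math

def normalize(v):
    # Scale a vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def combine(text_emb, image_emb, text_weight=0.5):
    # Weighted average of the unit-length text and image vectors,
    # renormalized so the result can be compared with cosine similarity.
    # Note: this fails with a division by zero if the two unit vectors
    # cancel out exactly (point in opposite directions with equal weight).
    t = normalize(text_emb)
    i = normalize(image_emb)
    mixed = [text_weight * a + (1 - text_weight) * b for a, b in zip(t, i)]
    return normalize(mixed)

# Placeholder embeddings for one video segment (assumption: toy values).
speech_emb = [1.0, 0.0]
frame_emb = [0.0, 4.0]

segment_emb = combine(speech_emb, frame_emb)
print(segment_emb)
```

Raising `text_weight` toward 1.0 would favor the transcript for segments where the frame is just a person talking.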

  • @valentinfontanger4962 a year ago

    Excellent video

  • @PurpleRivar a year ago

    Thanks. It is very informative. Can you please explain and teach us how to do fine-tuning on a custom dataset?

  • @AdeleHaghighatHoseiniA 2 years ago

    Thank you for the good explanation. If we have two different kinds of embeddings, such as text and 3D images, can we use CLIP to predict images?

  • @debashisghosh3133 2 years ago

    Really liked the content...thanks for sharing

    • @jamesbriggs 2 years ago

      Thanks for watching!

  • @abdirahmann a year ago

    Is there a hosted API for CLIP where you can provide your image data and get the vectors instead of having to host it yourself, kind of like how you give an input to `ada-002`?

  • @Gabriel-ey5ky 2 years ago

    Really great video! I have just one thing to say: you should leave the images on screen longer. I had to pause the video multiple times to be able to understand them.

    • @jamesbriggs 2 years ago

      Thanks Gabriel, I heard the same from another viewer - will do this going forward :)

  • @sharanbabu2001 2 years ago

    Nice explanation!

  • @dancinghoka a year ago

    Thanks a lot!

  • @behnamplays 2 years ago

    Excellent content! As a suggestion, can you please keep the images/diagrams a bit longer? They move pretty fast in the video, which means I'll have to rewind the video every now and then.

    • @jamesbriggs 2 years ago

      Sure that’s great feedback, thanks!

  • @shaheerzaman620 2 years ago

    fantastic stuff!

  • @pyalgoGPT 2 years ago

    Please post Deep Reinforcement Learning tutorials & projects with Python!

    • @jamesbriggs 2 years ago +1

      Eventually I’m sure I will, RL is very cool

  • @debayudhmitra9432 9 months ago

    Can you share the GitHub code, please?

  • @mackenzieclarkson8322 9 months ago

    Transitions are too flashy and triggering to my eyes. Good explainer however.

  • @davide0965 months ago

    Too much talk and very few illustrations