👑 LLaVA - The NEW Open Access MultiModal KING!!!

  • Published on 15 Jan 2025

Comments • 52

  • @leafdriving
    @leafdriving 1 year ago +8

    Re: Time mark from 2:13~2:24
    Officer: "Do you know why I pulled you over today sir?"
    Me: "Yea, I gap like that all the time! Just give it a second, you'll remember!"

    • @1littlecoder
      @1littlecoder 1 year ago +8

      Oh, my bad. I should have edited it out. Totally missed it. I heard some sound outside and I was trying to hear what it was. Can't believe I missed it!

    • @icykenny92
      @icykenny92 1 year ago +1

      @1littlecoder A.D.D. moment. 😅

  • @quotesspace1713
    @quotesspace1713 1 year ago +3

    I really love your videos, man 💚💚. They keep me updated on AI tech (especially the open-source ones).

  • @KevinKreger
    @KevinKreger 1 year ago

    The distraction did not interrupt your flow. Impressive!🤩

  • @mikeyjohnson5888
    @mikeyjohnson5888 10 months ago

    We are actually beyond the not hotdog phase. I love it.

  • @im-notai
    @im-notai 1 year ago +4

    I want you to do a captcha test live, so everyone can make sure you are not an advanced multimodal model with speech. We got some server latency at 2:13.

    • @1littlecoder
      @1littlecoder 1 year ago +2

      Truth be told, I often fail captcha tests :D

  • @TheXenonite
    @TheXenonite 1 year ago +6

    I value these open-source models more than closed-source alternatives, even if the closed-source version has better results.

    • @1littlecoder
      @1littlecoder 1 year ago +1

      Well said! The room for improvement is also high and creates more opportunities for others to build on top of this!

    • @danielnanski838
      @danielnanski838 1 year ago

      This is not open source, it's open access.

    • @TheXenonite
      @TheXenonite 1 year ago

      @danielnanski838 Well, then the same applies. I would love to see a decent MLLM come out, and make apps using it.

  • @robin5845
    @robin5845 1 year ago

    How do you learn about the models when they just came out?

  • @nixes1636
    @nixes1636 11 months ago

    How can I upload multiple images at once to the LLaVA model?

  • @yuvrajkukreja9727
    @yuvrajkukreja9727 5 months ago +1

    open source is awesome source :)

    • @1littlecoder
      @1littlecoder 5 months ago

      Yes, you are right

  • @atdigit
    @atdigit 1 year ago

    Thank you for the video and detailed explanation, I learned a lot from it and am your subscriber. I'm looking forward to the release on how it is possible to use the model in Colab, since for me, apparently, this is the only opportunity to work with it. I would also like to clarify that in business applications, of course, I will have not one, but a series of pictures, since we are talking about automation of working with data. How can I organize work, for example, so that this model analyzes not one, but 100 pictures at once, or one by one? You need to write the appropriate code for Colab, be a programmer for this, do I understand correctly? I also tried to test the model in the Web application, as you showed in the video - it really understands the details of what is depicted quite well, but it is less skilled than GPT-4 in the specialized scientific topics that I need. I also asked it to construct an answer in the form of a phrase in a given format, adding to it in the right place the phrases that I require, as well as maintaining a certain limit on the length of the signature required by me. As a result, insertions are made in arbitrary places instead of the required ones, and the length of the answer does not correspond to the required number of characters. Again, GPT-4 (I use the free version, Bing AI) does this in 90% of situations without errors. What do I need to do to make LLaVa more responsive? Do I need to train it somehow? And one more question, regarding speed, when compared with Bing, generating the answer took several times longer. What ways are there to speed up the work, considering that I technically do not have the ability to run this model on my machine?
    I would be grateful for brief answers to my questions, I wish you success in the development of the channel.

  • @JscottMays
    @JscottMays 1 year ago

    You are always on top of things, while I tread water. Barely.😂

  • @dohua_ai
    @dohua_ai 1 year ago

    good stuff, thanks!!!

  • @SAVONASOTTERRANEASEGRETA
    @SAVONASOTTERRANEASEGRETA 11 months ago

    But what is the purpose of the model? To tell us what is in an image we can already see?

  • @mohankrishnan08
    @mohankrishnan08 1 year ago

    The video was informative, sir. Could you do a video about the evolution from Transformers to LLMs to LMMs?
    It would be really helpful for us to learn.

  • @narenkumar2109
    @narenkumar2109 1 year ago

    Hi, please share a video on how to load a multimodal model (LLaVA) locally.

  • @IronZk
    @IronZk 1 year ago

    Does this work on CPU?

    • @1littlecoder
      @1littlecoder 1 year ago +1

      Nope. I've been trying to get the quantized version running on Colab, but there are some challenges. I'll make a video if it's successful!

    • @IronZk
      @IronZk 1 year ago

      @1littlecoder Maybe one day.

  • @besooab8810
    @besooab8810 9 months ago

    I'm wondering how local LLaVA and Llama 2 have access to the public Internet, when ChatGPT, if you ask it, will tell you it does not have access to the public Internet?

  • @LouisGedo
    @LouisGedo 1 year ago +1

    👋

  • @stresstherapist
    @stresstherapist 1 year ago +1

    I just tried it and gave it a picture of vomit (yes, I have that lying around on my computer). Just wanted to see if it would recognize it as such.
    This is what it said the picture was:
    The image features a piece of food, possibly a meatloaf, sitting on a white surface. The food appears to be a mixture of meat and vegetables, with several carrots scattered around it. The carrots are in various sizes and positions, some closer to the main piece of food and others further away. The overall scene gives the impression of a close-up view of a delicious and healthy meal.
    LOL!!!

    • @1littlecoder
      @1littlecoder 1 year ago +1

      It's so advanced that it managed to find the content of vomit LOL

  • @blender_wiki
    @blender_wiki 1 year ago

    tested on 200+ images: not perfect but very very good. Nice step in the right direction.

    • @nixes1636
      @nixes1636 11 months ago

      Did you find how to add multiple images at once?

  • @dumbol8126
    @dumbol8126 1 year ago +3

    wtff, visual instruction tuning (actually VQA)! My 7th-sem mini project is somewhat replicating their work!

    • @1littlecoder
      @1littlecoder 1 year ago

      Woah. How did that mini project go? Did you go on publishing that work somewhere?

    • @dumbol8126
      @dumbol8126 1 year ago

      @1littlecoder nope, I mean we have the review in a few weeks. I'm working on it.

    • @dumbol8126
      @dumbol8126 1 year ago +1

      The problem is trying to do it using as few resources as possible, cuz I don't really have any 😛

    • @1littlecoder
      @1littlecoder 1 year ago

      Do you mean resources like GPU, or something else?

    • @dumbol8126
      @dumbol8126 1 year ago

      @1littlecoder yeah, GPU