LLaVA 1.6 is here...but is it any good? (via Ollama)

  • Published on 3 Feb 2024
  • LLaVA (or Large Language and Vision Assistant) recently released version 1.6. In this video, with help from Ollama, we're going to compare this version with 1.5 to see how it's improved over the last few months. We'll see how well it describes a photo of me, if it can create a caption for an image, how well it extracts text/code from images, and whether it can understand a diagram.
    Resources
    * Blog - www.markhneedham.com/blog/202...
    * LLaVA 1.6 release - llava-vl.github.io/blog/2024-...
    * Ollama - ollama.ai/
    * Ollama Python Library - pypi.org/project/ollama/
  • Science & Technology
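
For anyone following along, here is a minimal sketch of how the comparison in the video can be driven with the Ollama Python Library. The model tags and photo.jpg are placeholders; check ollama.ai/library/llava/tags for the tags that are actually published.

```python
# Minimal sketch: describe the same image with two LLaVA versions via Ollama.
# Assumes the models have already been pulled and photo.jpg exists locally.
import ollama

PROMPT = "Describe this image in a couple of sentences."

for tag in ["llava:7b-v1.5", "llava:7b-v1.6"]:  # assumed tags for the 1.5 vs 1.6 comparison
    response = ollama.chat(
        model=tag,
        messages=[{"role": "user", "content": PROMPT, "images": ["photo.jpg"]}],
    )
    print(f"--- {tag} ---")
    print(response["message"]["content"])
```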

Comments • 36

  • @tiredofeverythingnew
    @tiredofeverythingnew 4 months ago +2

    Thanks Mark great video. Loving the content lately.

    • @learndatawithmark
      @learndatawithmark 4 months ago

      Glad you liked it! Let me know if there are any other topics you'd like me to cover.

  • @PoGGiE06
    @PoGGiE06 2 months ago

    Great video, thanks.

  • @user-yu2wr5qf7g
    @user-yu2wr5qf7g a month ago

    thx. very helpful. subscribed.

  • @munchcup
    @munchcup 4 months ago +1

    I find that for more accurate results on images of text it's easier to use pytesseract instead of LLMs, but for describing an image LLMs serve well. Hope this helps.

    • @learndatawithmark
      @learndatawithmark 4 months ago

      Oh interesting. I noticed that ChatGPT was using pytesseract, so perhaps they aren't even using GPT-4V at all when you ask it to extract text from images! Didn't think of that.
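
A rough sketch of the two approaches discussed in this thread; pytesseract (a wrapper around the Tesseract OCR engine) and the filename are assumptions for illustration.

```python
# Compare classic OCR with LLaVA for pulling text out of an image.
# Needs: pip install pytesseract pillow ollama, plus a system install of Tesseract.
import ollama
import pytesseract
from PIL import Image

image_path = "code_screenshot.png"  # placeholder image containing text/code

# 1. Classic OCR -- usually more faithful for plain text in an image
print(pytesseract.image_to_string(Image.open(image_path)))

# 2. LLaVA via Ollama -- good at describing, but may paraphrase the text
response = ollama.chat(
    model="llava:7b-v1.6",
    messages=[{
        "role": "user",
        "content": "Extract the text in this image verbatim.",
        "images": [image_path],
    }],
)
print(response["message"]["content"])
```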

  • @techgeekal
    @techgeekal 4 months ago

    Hey, great content! In your experience, how big is the performance difference between the Ollama version of the model (compressed) and its original version?

    • @learndatawithmark
      @learndatawithmark 4 months ago

      I tried the last few examples that didn't work well on the 7b model with the 13b and 34b models, and I didn't see any better results. My impression is that this model is good with photos but struggles with other types of images.

  • @geoffreygordonashbrook1683
    @geoffreygordonashbrook1683 4 months ago

    What size version of this model were you using? Have you compared variants such as BakLLaVA? What did you need to do to get OpenAI to work? If you could show how to run various LLaVA models from Hugging Face, e.g. from TheBloke, that would be swell. Many thanks for all the helpful videos and insights!

    • @learndatawithmark
      @learndatawithmark 4 months ago

      I think it was this one - 7b-v1.6. There are 13b and 34b versions too, but I haven't tried those yet. They've also got a bunch of others, some based on Mistral/Vicuna (ollama.ai/library/llava/tags).
      Not sure how different those ones would be - I need to give them a try!
      I did actually have a LLaVA 1.5 vs BakLLaVA example in progress, but then stopped when I saw that there was a new LLaVA model out. I'll have to get back to it.
      Re: OpenAI - When I asked it to extract the text, it kept throwing an exception when using pytesseract. So then I asked it what the code was doing (which it got right), and then I asked it to extract the code. And somehow that combination worked?!

    • @josefsteiner8616
      @josefsteiner8616 4 months ago

      @@learndatawithmark I tried it with the 34b and I think it wasn't bad. I only had a screenshot of the diagram from the YouTube video, so the quality wasn't really good. Maybe you can try it with the original image. That's the answer:
      "
      The image you've provided appears to be a diagram illustrating the concept of data transformation and processing from a simple structure to a more complex one.
      On the left side, there is a basic representation with two boxes connected by an arrow, labeled "From this ..." This could represent data in its most raw or unstructured form, where information may not be processed or integrated into any system yet.
      On the right side, we see a more sophisticated diagram representing a network or a set of interconnected systems. There are multiple boxes connected with lines indicating relationships or data flow. Each box is labeled with various terms such as "Node A," "Node B,"
      "Process," and "Service," which suggest that this represents a complex system where data goes through various processes and services before it reaches its final form.
      The arrow from the left to the right with the label "... to this!" implies that the data moves from a simple state on the left to a more structured or processed state on the right, possibly within a larger network of systems or as part of a workflow processing
      system. This could be used in educational materials to explain concepts such as data integration, data flow in complex systems, or the transformation process in information technology infrastructure.
      "

  • @JoeBurnett
    @JoeBurnett 4 months ago +3

    That arrow wasn’t pointing to the left as 1.6 indicated….

    • @learndatawithmark
      @learndatawithmark 4 months ago

      Hah, good catch! Dunno how I missed that :D

  • @jonyfrany1319
    @jonyfrany1319 2 months ago +1

    How can we use LLaVA to control the mouse?

    • @learndatawithmark
      @learndatawithmark 2 months ago

      To control the mouse? I don't think it can do that - why do you want it to do that?

    • @vSouthvPawv
      @vSouthvPawv a month ago

      My thought is to overlay a grid on a screenshot of your desktop (write a function to take a screenshot and apply the grid when you send a prompt or whatever), then ask LlaVa to respond with the grid location closest to the spot you want to click. Clean that response and send it to pyautogui to move the mouse to the correct spot. Prompt engineer to taste.
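
A hypothetical sketch of the grid-overlay idea above, assuming LLaVA via Ollama plus pyautogui; the cell size, prompt, "Submit button" target, and reply parsing are all illustrative and would need the prompt engineering mentioned in the comment.

```python
# Screenshot -> draw a grid -> ask LLaVA which cell to click -> move the mouse.
import re
import ollama
import pyautogui
from PIL import ImageDraw

CELL = 100  # grid cell size in pixels

# Take a screenshot and draw a red grid over it
screenshot = pyautogui.screenshot()
draw = ImageDraw.Draw(screenshot)
for x in range(0, screenshot.width, CELL):
    draw.line([(x, 0), (x, screenshot.height)], fill="red")
for y in range(0, screenshot.height, CELL):
    draw.line([(0, y), (screenshot.width, y)], fill="red")
screenshot.save("grid.png")

# Ask LLaVA which grid cell contains the target (the "Submit button" is illustrative)
response = ollama.chat(
    model="llava:7b-v1.6",
    messages=[{
        "role": "user",
        "content": "The screenshot has a red grid of 100px cells. Reply only "
                   "with 'column,row' of the cell containing the Submit button.",
        "images": ["grid.png"],
    }],
)

# Clean the response and move the mouse to the centre of that cell
match = re.search(r"(\d+)\s*,\s*(\d+)", response["message"]["content"])
if match:
    col, row = int(match.group(1)), int(match.group(2))
    pyautogui.moveTo(col * CELL + CELL // 2, row * CELL + CELL // 2)
```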

  • @troedsangberg
    @troedsangberg 2 months ago

    Comparing 7b models to ChatGPT is of course slightly misleading. I'm getting satisfactory results from 13b (fits in my GPU) and am quite happy using it for image captioning specifically.

    • @learndatawithmark
      @learndatawithmark 2 months ago

      Oh interesting. I tried 13b and 34b and was getting pretty similar results to what I showed in the video. Now you've got me wondering why I wasn't seeing better captions!

  • @tapos999
    @tapos999 a month ago

    Which Mac was it running on?

  • @Ravi-sh5il
    @Ravi-sh5il a month ago

    Hi, I am getting this:
    >>> /load llava:v1.6
    Loading model 'llava:v1.6'
    Error: model 'llava:v1.6' not found

    • @learndatawithmark
      @learndatawithmark a month ago

      Did you pull it down to your machine? See ollama.com/library/llava

    • @Ravi-sh5il
      @Ravi-sh5il a month ago +2

      @@learndatawithmark Thanks, I forgot to pull :)
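
For reference, the "not found" error above usually just means the model hasn't been pulled yet; here is the one-liner with the Ollama Python Library (the tag is copied from the comment, so check ollama.com/library/llava for the valid tags).

```python
# Equivalent of running `ollama pull llava:v1.6` on the CLI.
import ollama

ollama.pull("llava:v1.6")
```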

  • @bennguyen1313
    @bennguyen1313 3 months ago

    I have PDF files of handwritten data that I'd like to OCR, perform calculations on, and finally edit or append the PDF with the results.
    I like the idea of using a Custom GPT, but only ChatGPT Plus subscribers can use it. So I'd prefer a standalone browser or desktop solution that anyone can drag and drop a file into. However, I'm not sure if the GPT-4 Assistants API has all the Vision / AI PDF plugin support.
    If using Ollama, would anyone who wants to use my application also need to install the 20GB Ollama model?

    • @learndatawithmark
      @learndatawithmark 3 months ago +1

      You'd need to host an LLM somewhere if you want to create an application that other people can use. Unless you're having them run the app locally, I think it'd be better to use one of the LLM hosting services.
      Maybe something like replicate? replicate.com/yorickvp/llava-13b/api
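
A hedged sketch of what calling the hosted model on Replicate could look like, so end users don't need a local Ollama install. The input field names, the idea of rendering a PDF page to an image first, and omitting the version hash are assumptions; take the exact parameters from the API page linked above.

```python
# Needs `pip install replicate` and a REPLICATE_API_TOKEN environment variable.
import replicate

output = replicate.run(
    "yorickvp/llava-13b",  # you may need to append a specific version hash
    input={
        "image": open("scanned_page.png", "rb"),  # placeholder: one PDF page rendered to an image
        "prompt": "Transcribe the handwritten text in this image.",
    },
)
print("".join(output))  # the model streams text chunks
```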

  • @annwang5530
    @annwang5530 a month ago

    Hi, can LLaVA be integrated with Groq?

    • @learndatawithmark
      @learndatawithmark a month ago

      I don't think Groq has any of the multimodal models available at the moment. But there are a bunch of GPU-as-a-service providers that keep popping up, so it should be possible to deploy it to one of them.
      One I played with a couple of weeks ago is Beam, and now I kinda wanna see if I can deploy LLaVA there :D
      th-cam.com/video/WY6loJ6DYBA/w-d-xo.html

    • @annwang5530
      @annwang5530 a month ago

      @@learndatawithmark thanks man

  • @xiaofeixue7001
    @xiaofeixue7001 a month ago

    Is this the paid version of ChatGPT?

    • @learndatawithmark
      @learndatawithmark a month ago

      Yes that's GPT-4. I don't think you can upload images to GPT-3.5?
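
For comparison, a minimal sketch of sending an image to GPT-4 through the OpenAI API as it looked around the time of this video (gpt-4-vision-preview was the vision-capable model then; the filename is a placeholder).

```python
# Send a local image to GPT-4's vision endpoint as a base64 data URL. Needs an API key.
import base64
from openai import OpenAI

client = OpenAI()
with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```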

  • @Ravi-sh5il
    @Ravi-sh5il a month ago +1

    How do I load the 23.7 GB llava-v1.6-34b.Q5_K_S.gguf? I'm currently running a 4.7 GB one.
    Can you please help me with this, brother?
    Thanks in advance!

    • @learndatawithmark
      @learndatawithmark a month ago

      Are you getting an error?

    • @Ravi-sh5il
      @Ravi-sh5il a month ago

      @@learndatawithmark I am unable to figure out how to load the 23GB file into Ollama.
      Please help; give me the command that can pull the 27GB model.

    • @Ravi-sh5il
      @Ravi-sh5il a month ago

      @@learndatawithmark Actually, I don't know how to load the 23.7GB LLaVA on Ollama.

  • @varadhamjyoshnadevi1545
    @varadhamjyoshnadevi1545 4 months ago

    Hi,
    I have a few doubts about OpenAI. Could you please share your mail ID?