Llama 3.2-vision: The best open vision model?

แชร์
ฝัง
  • เผยแพร่เมื่อ 11 ธ.ค. 2024

ความคิดเห็น • 18

  • @marwangs686
    @marwangs686 13 วันที่ผ่านมา +1

    I think its good enough, but what if it was uncensored?

  • @focusedstudent464
    @focusedstudent464 19 วันที่ผ่านมา

    can you make a viedo on comparing the llava 7b vs llam3.211b vision
    please

  • @lovol2
    @lovol2 24 วันที่ผ่านมา +3

    Not sure what it is with TH-camrs and vision models.
    These are not the use cases for business or even pet projects.
    If we want OCR, we'll use OCR. What you want is to intelligently answer questions.
    E.g. What level is the water at on this reservoir - the image should have a measure stick on the water.
    How many boxes are in this image.
    Does this look safe or dangerous
    Etc.

    • @learndatawithmark
      @learndatawithmark  23 วันที่ผ่านมา +2

      Oh interesting, those are good ideas. I suppose you could also do how many parking spaces are free or something like that.
      I do use ChatGPT for the first three examples, although admittedly not the last one comparing the footballers. This is the first open model (that I've tried) that pulls code out of an image - all the other ones I've tried start hallucinating at some point.
      The TH-cam thumbnail critique as well - I think that's more than just OCR?
      I do sometimes use vision models to interpret graphs/charts and compare them to each other, but I haven't tried that with Llama 3.2 vision

    • @cherubin7th
      @cherubin7th 14 วันที่ผ่านมา +1

      OCR still sucks. Always gets it wrong in real life. But good point!

  • @MichaelDeeringMHC
    @MichaelDeeringMHC 13 วันที่ผ่านมา

    Face identification is on the list of things the tech companies don't want you doing with the models. It ties into to many of the dystopian futures depicted in fiction.

    • @learndatawithmark
      @learndatawithmark  12 วันที่ผ่านมา

      I found most of the Llava models were able to identify famous people at least! I mean it doesn't really matter, but it's interesting that they seem to censor it like this

    • @notgiven-nv5ix
      @notgiven-nv5ix 4 วันที่ผ่านมา

      Facial identification would be moving into their lane. How do you think they make all their money with free products? You are the product. Google DeepFace for one example related to meta. (google will auto correct to deepfake so search DeepFace -deepfake)

  • @jovokrneta1412
    @jovokrneta1412 23 วันที่ผ่านมา

    What is the hardware configuration you are using?

    • @learndatawithmark
      @learndatawithmark  18 วันที่ผ่านมา +1

      I have a Mac M1 Max with 64GB RAM that it splits between the GPU and CPU

  • @sebingtoon
    @sebingtoon 25 วันที่ผ่านมา

    As for the model repeatedly failing to identify Ronaldo in the picture, perhaps lowering the temperature would be an idea? EDIT: I've tried playing with the temperature (same LLM and same image), but it doesn't seem to have a significant effect on the results (except when temp=0, of course). After several runs I'd say Ronaldo is identified about 33% of the time.

    • @RuairiODonnellFOTO
      @RuairiODonnellFOTO 24 วันที่ผ่านมา

      What prompts did you use for the 30% success? Did temp change any behaviour/success?

    • @sebingtoon
      @sebingtoon 24 วันที่ผ่านมา

      @@RuairiODonnellFOTO The prompt I used was a double question, something like "Can you describe the picture? Who is the person depicted?" I've tried a few more runs and now I'd say it's less than 33%, maybe 10% success. Most of the time the model says it cannot provide names of people based on their photograph. As I said, changing the temperature doesn't seem to have a measurable effect on the answers.

    • @learndatawithmark
      @learndatawithmark  23 วันที่ผ่านมา

      It's weird why it can't pick it up. IIRC all the Llava models were able to identify him and for every other example that I tried Llama 3.2 vision is better than Llava.

    • @learndatawithmark
      @learndatawithmark  23 วันที่ผ่านมา +1

      I wonder whether it's deliberately not identifying people. It's sometimes even reluctant to say anything at all about a photo e.g. when I give it photos of myself

    • @sebingtoon
      @sebingtoon 21 วันที่ผ่านมา

      ​@@learndatawithmark I think you're right. It looks like they've done something during training (or after) that makes it behave that way. Obviously, it's not been a total success. No matter how well trained LLMs are, they are still hard to tame!

  • @RuairiODonnellFOTO
    @RuairiODonnellFOTO 24 วันที่ผ่านมา

    Has anyone tried the 90B model to see if it can name messi or ronaldo?

    • @learndatawithmark
      @learndatawithmark  18 วันที่ผ่านมา

      I think that would be insanely slow on my machine so I haven't tried it!