CogVLM-2: Multimodal LLAMA-3 WITH VISION (Opensource) Beats GPT-4O, CLAUDE-3 & GEMINI 1.5 PRO

แชร์
ฝัง
  • เผยแพร่เมื่อ 20 พ.ค. 2024
  • In this video, We'll be talking about a new Opensource model named CogVLM-2 which is a model based on Llama-3. It is a multimodal model that allows image & video input. You can ask it for questions within your image. This model is beating GPT-4O, GPT-4V, ChatGPT, Claude-3, Gemini, Mixtral 8x22b & Llama 3 in multiple benchmarks. This model is basically Llama-3 with vision. It can be used to create own ChatGPT clone and Github Copilot alternative.
    [Key Takeaways]
    🌟 Optimized Answer: The answer is concise, producing a more streamlined result with fewer lines, which is efficient but has minor issues.
    🤔 Assumption Issues: The flow chart made an incorrect assumption about the number guessing range, highlighting a potential flaw in logic.
    🆚 GPT-4O vs. CogVLM: GPT-4O has the upper hand in accuracy and reasoning, but CogVLM comes close, showing competitive performance.
    🍕 Calorie Calculation: When calculating the calories for a product, CogVLM faltered, whereas GPT-4O provided the correct answer, demonstrating superior accuracy.
    😂 Meme Explanation: Both AI models explained a meme, but GPT-4O managed to do so more accurately, despite both having a slightly off-putting formal approach.
    📊 CSV Conversion: Converting a table screenshot to CSV format was done accurately by both CogVLM and GPT-4O, showing that both are proficient in this task.
    📏 Object Comparison: When comparing the sizes of objects in an image, both models gave incorrect answers, leading to a tie in performance.
    🚗 Driving Directions: Both AI models provided the correct driving directions when given an image, demonstrating their capability in processing and interpreting visual information correctly.
    🔧 Open Source Utility: CogVLM, being open source, offers great utility for various tasks with some tweaking, making it a valuable tool for many users.
    👍 Engagement Call: Encourage viewers to interact by liking, subscribing, and using the Super Thanks option, fostering a supportive community around the channel.
    🤖 CogVLM is a promising open-source alternative: With some tweaking, CogVLM can be a useful tool, especially considering its open-source nature.
  • วิทยาศาสตร์และเทคโนโลยี

ความคิดเห็น • 18

  • @AGI-Bingo
    @AGI-Bingo 26 วันที่ผ่านมา +7

    Nice! Now let's see SOTA Opensource Voice2Voice realtime model with emotion (like gpt4o)

  • @HideBuz
    @HideBuz 26 วันที่ผ่านมา +7

    GPU sales for AI will go through the roof because every developer will want to run their own Jarvis AI assistant.

    • @AICodeKing
      @AICodeKing  26 วันที่ผ่านมา +3

      True if people will be able to buy one after Sam Altman takes it all.

  • @Maisonier
    @Maisonier 26 วันที่ผ่านมา

    This is amazing. I'm waiting for GGUF model for LMStudio

  • @siddhantashtekar5806
    @siddhantashtekar5806 26 วันที่ผ่านมา +4

    Make videos on hugging face spaces as well 🙂

  • @settlece
    @settlece 26 วันที่ผ่านมา

    thank you AICodeKing

  • @simeonnnnn
    @simeonnnnn 26 วันที่ผ่านมา +2

    God damn!

  • @Termonia
    @Termonia 26 วันที่ผ่านมา +3

    Something is going wrong here. Many of the new LLaMA 3 models have coding issues, like unclosed parentheses or missing closing parts. What's worse is that this isn't just happening in one model but across several based on LLaMA 3. Has no one looked into what's going on? This is a recurring problem.

    • @AICodeKing
      @AICodeKing  26 วันที่ผ่านมา +2

      Yes, I thought only i am facing this. But, it seems this is a wider issue with fine-tuned models.

    • @RobynLeSueur
      @RobynLeSueur 26 วันที่ผ่านมา +2

      If I'm understanding correctly, those are the system tokens. It's not that Llama3 has a coding issue, it's that whatever you're using to run it doesn't recognise those as special tokens and strip them.

  • @DevsDoCode
    @DevsDoCode 26 วันที่ผ่านมา +1

    Hey could you please tell which website or app you used for gpt 4o (other than openai website)

    • @AICodeKing
      @AICodeKing  26 วันที่ผ่านมา +1

      Lmsys arena

  • @kasrasadrazamy9852
    @kasrasadrazamy9852 26 วันที่ผ่านมา

    What is the name of the platform you are using GPT-4o Vision?

    • @AICodeKing
      @AICodeKing  26 วันที่ผ่านมา

      Lmsys Arena

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 26 วันที่ผ่านมา +2

    Why such a name for this LLM?

    • @AICodeKing
      @AICodeKing  26 วันที่ผ่านมา +2

      I think they are inspired by Google. But, VLM here stands for Visual Language Model and Cog is Cognitive, I guess.

  • @maris3926
    @maris3926 23 วันที่ผ่านมา

    Too few languages, so not for translation.

  • @harundemirtas1181
    @harundemirtas1181 26 วันที่ผ่านมา +1

    How can we install and use kalilinux windows in an open source way? Open sources that are free and full, not demo.