Llama 3.2 goes Multimodal and to the Edge

  • Published on Nov 6, 2024

Comments • 24

  • @toadlguy
    @toadlguy several months ago +5

    These small models are good not only for low-memory situations but also for cases where you can run multiple models at once. Work is being done on running 405B in small-memory configurations by loading and unloading layers, which runs the more advanced model much more slowly, while these small models handle routing and interactivity at the same time. All of this could be done locally in situations where you don’t want to send the data being processed (like personal information) off the device.

    • @samwitteveenai
      @samwitteveenai several months ago

      Very good point about multiple models, totally agree.

  • @comfixit
    @comfixit several months ago

    Yes please, a video on fine-tuning these models would be awesome. Videos showing the tiny models running on edge devices and/or in the browser would be super cool as well.

  • @i_accept_all_cookies
    @i_accept_all_cookies several months ago

    This is great news! Can't wait to start using the lightweight models.

  • @ibrahimhalouane8130
    @ibrahimhalouane8130 several months ago

    No intro, no music, right to the point. Amazing work, Sam. I'd like to know your opinion about Unsloth.

    • @samwitteveenai
      @samwitteveenai several months ago +1

      I love Unsloth. It's a simple but good way for people to do LoRAs.

  • @autoflujo
    @autoflujo several months ago

    Nice video! It would be awesome if you could make a video on how to fine-tune these small models.

  • @aminzarei1557
    @aminzarei1557 several months ago

    Hey Sam, great video 👌
    I'll be waiting for a video on fine-tuning the 1B model for JSON in and out.

    • @samwitteveenai
      @samwitteveenai several months ago

      Yeah, that's a good use case.

  • @chenqu773
    @chenqu773 several months ago

    Thank you for this quick update, Sam! BTW, "Qwen" should probably be pronounced "qian wen", as in the original Chinese, with the hidden meaning of "capable of answering thousands of questions". 😀

    • @samwitteveenai
      @samwitteveenai several months ago

      lol I tried to pronounce it like their devrel guy does. Is there audio somewhere where I can hear it?

  • @SirajFlorida
    @SirajFlorida several months ago +3

    11B and 90B make sense because it's 3B and 20B of vision parameters, respectively? That's what I would guess right off the bat.

  • @IvarDaigon
    @IvarDaigon several months ago

    Another obvious use case for the mini models is moderation. APIs like OpenAI's require you to make a moderation call before making the inference call, which means two round trips to the server before you get any content you can show to the user. If you can do moderation on device, then you only need one round trip, making your real-time chats appear faster to the user.
    Moderation, routing, summarization = mini models for the win.
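The round-trip saving described in @IvarDaigon's comment can be sketched with stubs. Everything here is hypothetical: `FakeServer` stands in for a hosted API and only counts calls, and `local_moderate` stands in for a small on-device model (a real one would run inference rather than a substring check).

```python
from typing import Optional


class FakeServer:
    """Hypothetical hosted API that counts network round trips."""

    def __init__(self) -> None:
        self.round_trips = 0

    def moderate(self, text: str) -> bool:
        self.round_trips += 1
        return "forbidden" not in text

    def complete(self, text: str) -> str:
        self.round_trips += 1
        return f"echo: {text}"


def local_moderate(text: str) -> bool:
    # Stand-in for an on-device mini model; no network call is made.
    return "forbidden" not in text


def chat_with_remote_moderation(server: FakeServer, text: str) -> Optional[str]:
    # Moderation and inference are sequential server calls: two round trips.
    if not server.moderate(text):
        return None
    return server.complete(text)


def chat_with_local_moderation(server: FakeServer, text: str) -> Optional[str]:
    # Moderation happens on device, so only inference touches the network.
    if not local_moderate(text):
        return None
    return server.complete(text)


remote, local = FakeServer(), FakeServer()
chat_with_remote_moderation(remote, "hello")
chat_with_local_moderation(local, "hello")
print(remote.round_trips, local.round_trips)
```

In this sketch a message rejected by the local check never reaches the server at all, which is a second benefit besides the lower latency on accepted messages.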

  • @IsmailIfakir
    @IsmailIfakir several months ago

    Is there a multimodal LLM that can be fine-tuned for sentiment analysis from text, image, video, and audio?

  • @jmspat14b
    @jmspat14b several months ago +1

    A video on how to fine-tune these small models would be great! By the way, being from Denmark, I always test these models in Danish as well as in English. Llama 3.2 3B is by far the best small multilingual model I have tested, far better than Gemma 2 2B!

    • @pozytywniezakrecony151
      @pozytywniezakrecony151 several months ago

      They all kinda fail in Polish :D but, well, in English it's quite nice.

    • @samwitteveenai
      @samwitteveenai several months ago

      Ohh, that is super interesting to know. I wonder whether Danish is one of the 8-9 prioritized languages, or whether it's just getting better at European languages in general.

    • @pozytywniezakrecony151
      @pozytywniezakrecony151 several months ago

      @samwitteveenai It appears it doesn't understand some language rules, or I am using models that are too small. I tried o1-mini:latest / DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.i1-Q4_K_M.gguf:latest / Qwen2.5-14B_Uncencored-Q6_K_L.gguf:latest. For example, I asked all of them to write me a 4-verse poem in Polish about "Bocian". Each creates some correct lines, but in the middle it mixes in wrong words here and there, and most of the time the result doesn't make sense as a story. Here is o1 mini:
      "Bocian wysoki, z wody unosi się swobodnie,
      Czerwone dzióbki biały kaptur trzyma.
      Lecąc lecieli nad pól i lasów brzegi,
      Piekne słońce oświetla mu skrzydła jak diamenty."
      "Lecąc lecieli" sounds bad :) It's like "Flying they flew ....", the same word repeated. However, I think this one is quite good compared to the other outputs, 3/4 actually.

    • @jmspat14b
      @jmspat14b several months ago

      @samwitteveenai I feel the need to clarify that its abilities in Danish are, of course, nowhere near what they are in English. But it is the first small language model I have tried that is able to produce a Danish summary of a Danish text that is mostly correct and coherent. It does still suffer from making up words (I think it sometimes confuses Danish with Swedish and Norwegian), but Gemma 2 and other models are much worse in this regard.
      Also, its knowledge regarding Denmark is very limited, as you would expect for such a small model, I suppose. If, for example, I ask it to list the last 5 prime ministers of Denmark, it only knows the current one and hallucinates the rest. When I ask it to list the last 5 governors of any US state, I find that it typically gets 4-5 right.

    • @samwitteveenai
      @samwitteveenai several months ago +2

      I looked up both of these languages and they aren't among Meta's main multilingual priority languages. A friend pointed out that there aren't huge numbers of Facebook users there, so that might be a reason. Meta itself benefits from all the data it has for training etc., and I think that also drives some of their training decisions.

  • @nosuchthing8
    @nosuchthing8 13 days ago

    Can you train a model on a new computer language?

  • @Nick_With_A_Stick
    @Nick_With_A_Stick several months ago +1

    It kind of makes me sad that Meta trained Llama 2 on audio and pictures and made it so it can output audio and pictures, then nerfed the model and removed the decoders for “safety” reasons. They released it even though L3 was already out, and now they are using the Llama 3 version of that model in their app, where you can talk to it as if it were GPT-4 Omni.

  • @proflead
    @proflead several months ago

    1B model is fast 😀👍

    • @nosuchthing8
      @nosuchthing8 13 days ago

      How much VRAM do you need?