Run Mixtral 8x7B MoE in Google Colab

Comments • 31

  • @snypzzz8702
    @snypzzz8702 10 months ago +2

    I was dying for this tutorial. Thanks, man!

  • @gangs0846
    @gangs0846 9 months ago

    Thank you, my friend. One question: why don't you use the following?
    The template used to build a prompt for the Instruct model is defined as follows:
    [INST] Instruction [/INST] Model answer [INST] Follow-up instruction [/INST]
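
For context, that Instruct template doesn't have to be assembled by hand: the Hugging Face tokenizer for Mixtral-Instruct ships a chat template that produces the same [INST] ... [/INST] formatting. A minimal sketch (the model ID and messages are illustrative, not taken from the video):

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; any Mixtral-8x7B-Instruct tokenizer ships this chat template.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "Instruction"},
    {"role": "assistant", "content": "Model answer"},
    {"role": "user", "content": "Follow-up instruction"},
]

# Renders the conversation into the [INST] ... [/INST] format quoted above,
# ready to be tokenized and passed to model.generate().
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
```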

  • @publicsectordirect982
    @publicsectordirect982 10 months ago +3

    You are my go-to guy for anything open source. Thanks for your work, bhai 🙏

    • @engineerprompt
      @engineerprompt  10 months ago +1

      Glad it's helpful

  • @thisurawz
    @thisurawz 10 months ago +7

    Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and text, for relation extraction or another specific task? Could you do it with an open-source multimodal LLM and open multimodal datasets, like Video-LLaMA, so anyone can extend their own experiments with the help of your tutorial? Could you also talk about how to boost the performance of the fine-tuned model using prompt tuning in the same video?

    • @WesnerSEI
      @WesnerSEI 9 months ago

      This guy is right on! Asking the right questions! Although I don't expect him to answer all of that, you are definitely headed in the right direction!

  • @adityashinde436
    @adityashinde436 10 months ago +2

    Make a video on fine-tuning Mixtral 8x7B and how to use it in production.

  • @path1024
    @path1024 10 months ago +1

    I have a 2060 Super. It's only 4% slower than a 3060, but has only 8 GB of VRAM. I have 64 GB of DDR5 RAM and a 14900K CPU (with an NPU). I bet I could run it in 2-bit, but I never thought I'd go below 4-bit. Frankly, I just see 8x7B as a less efficient version of having several models fine-tuned for specific tasks. A couple of 4-bit 7B models can fit in 8 GB of VRAM (see the loading sketch after this thread).

    • @engineerprompt
      @engineerprompt  10 months ago +1

      I think for general tasks it might be a good option. If you are working on a specific application, I would recommend fine-tuning a smaller model and using that instead. It will probably be the better option.

    • @path1024
      @path1024 10 months ago

      @@engineerprompt Yeah, the total is smaller than its implied parts (the experts only duplicate the feed-forward layers, so Mixtral is roughly 47B parameters rather than 8 × 7B = 56B), so for a general-purpose model it's probably more efficient. Eight separate 7B models at 16-bit would usually take around 112 GB instead of the roughly 90 GB Mixtral needs.
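
To make the comparison in this thread concrete, below is a minimal sketch of the alternative being described: loading a single ~7B model in 4-bit with transformers and bitsandbytes. The checkpoint name is illustrative, not the one used in the video.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative ~7B checkpoint; any similar causal LM loads the same way.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit NF4 weights: ~7B params x 0.5 bytes ≈ 3.5-4 GB, which is why a couple of
# 4-bit 7B models can fit alongside activations in 8 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```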

  • @jayr7741
    @jayr7741 10 months ago +1

    Please bring some multilingual (Hindi) TTS voice cloning to Colab.

  • @DavidSegura99
    @DavidSegura99 10 months ago +1

    Thank you, this is amazing, I will use it for sure! Could you make a video using this method on the free Kaggle tier? You can use two 16 GB T4 cards at the same time in the same instance, along with 30 GB of RAM, so this should run a lot faster. Pretty please! Also, I'm sure free-Kaggle-tier videos will get you tons of views for your channel. Best wishes to you and your loved ones, and happy 2024!

    • @engineerprompt
      @engineerprompt  10 months ago +1

      Thank you for the wishes, happy new year to you too! Kaggle is a great option. I haven't looked at it in a while but will see what I can do. Didn't know that they now offer two GPUs. Will explore that further.

  • @kunalsoni7681
    @kunalsoni7681 10 months ago +1

    Amazing

  • @bennguyen1313
    @bennguyen1313 8 months ago

    I imagine it's costly to run LLMs. Is there a limit on how much Google Colab will do for free?
    I'm interested in creating a Python application that uses AI. From what I've read, I could use the GPT-4 Assistants API, and I, as the developer, would incur the cost whenever the app is used.
    Alternatively, I could host a model myself with Ollama, on my own computer or in the cloud (Beam Cloud / Replicate / Streamlit / Replit)?

  • @8888-u6n
    @8888-u6n 10 months ago

    Great video. Is there a way to upload your own RAG documents to this?

  • @curiouslycory
    @curiouslycory 9 months ago

    The model can be 30+ GB. Not surprising that it takes a while to load.

  • @geniusxbyofejiroagbaduta8665
    @geniusxbyofejiroagbaduta8665 10 months ago

    Thanks

  • @gangs0846
    @gangs0846 9 months ago

    How do I get it to write several pages of text? Even though I set the max tokens to 32k and tell it to write 10 pages, it still outputs only one page of text.
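
One likely cause: the 32k figure is Mixtral's context window, not an output budget. With transformers, the output length is capped by max_new_tokens at generation time, and instruction-tuned models still stop as soon as they emit an end-of-sequence token. A minimal sketch, assuming a model and tokenizer are already loaded:

```python
# Assumes `model` and `tokenizer` are already loaded (e.g. as in the notebook).
prompt = "[INST] Write a detailed, multi-page report on ... [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=4096,   # caps the *output* length; raise this if text is cut off
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Even with a large budget, the model tends to wrap up on its own, so very long documents are usually produced section by section over several calls rather than in one generation.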

  • @lostInSocialMedia.
    @lostInSocialMedia. 10 months ago

    Can we run an uncensored model?

    • @engineerprompt
      @engineerprompt  10 months ago +1

      I think yes, but it needs to be converted into HQQ format.
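
For reference, loading a pre-converted HQQ checkpoint looks roughly like the sketch below. This is based on the hqq project's published examples from around the time of the video; the model ID is the Mobius Labs 4-bit/2-bit Mixtral conversion and is only illustrative, and the import paths may differ in newer hqq releases. An uncensored fine-tune would first have to be quantized to HQQ the same way.

```python
from transformers import AutoTokenizer
# Import path as documented by the hqq project at the time; may have changed since.
from hqq.engine.hf import HQQModelForCausalLM

# Illustrative pre-converted checkpoint (attention in 4-bit, experts in 2-bit).
model_id = "mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ"

model = HQQModelForCausalLM.from_quantized(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```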

  • @DezorianGuy
    @DezorianGuy 10 months ago +1

    Is this better than ChatGPT 3.5?

    • @valm7397
      @valm7397 10 months ago +1

      yes

    • @engineerprompt
      @engineerprompt  10 months ago +1

      On benchmarks, yes

  • @alx8439
    @alx8439 10 months ago

    You can run quantized 4-bit Mixtral literally on any recent computer with 32 GB of RAM, without any GPU at all (see the sketch after this thread). I don't understand why you need Google Colab here; memory is ultra-cheap these days.

    • @unkim7085
      @unkim7085 10 months ago +1

      Do you have a reference for a tutorial about how to do it? Thanks

    • @alx8439
      @alx8439 10 months ago

      @@unkim7085 Or do the same in Ollama; it just works there.
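
For anyone looking for the CPU-only route discussed in this thread, below is a hedged sketch using llama-cpp-python with a 4-bit GGUF build of Mixtral. The file name and settings are illustrative; a Q4 GGUF of Mixtral is roughly 26 GB on disk, which is where the ~32 GB RAM figure comes from. Ollama wraps the same llama.cpp backend, so `ollama run mixtral` gets you the equivalent with less setup.

```python
from llama_cpp import Llama

# Illustrative path: download a 4-bit GGUF build of Mixtral-8x7B-Instruct first
# and point model_path at it.
llm = Llama(
    model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
    n_ctx=4096,      # context window to allocate
    n_threads=8,     # tune to your CPU core count
)

out = llm("[INST] Explain what a mixture-of-experts model is. [/INST]", max_tokens=256)
print(out["choices"][0]["text"])
```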

  • @mavrick23
    @mavrick23 10 months ago

    Can it work on 8 GB of RAM?

  • @oliviertorres8001
    @oliviertorres8001 10 months ago +1

    Is there a way to make this model work in the oobabooga Text Generation WebUI running in Google Colab? Thanks.