Run Mixtral 8x7B MoE in Google Colab

  • Published Jul 27, 2024
  • Run the mighty Mixtral 8x7B MoE on the free tier of Google Colab. Mixtral is a huge 45B-parameter model, but with offloading you can run it on consumer-grade GPUs (a generic offloading sketch follows the links below).
    🦾 Discord: / discord
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    |🔴 Patreon: / promptengineering
    💼Consulting: calendly.com/engineerprompt/c...
    📧 Business Contact: engineerprompt@gmail.com
    Become Member: tinyurl.com/y5h28s6h
    💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
    LINKS:
    Technical Report: arxiv.org/pdf/2312.17238.pdf
    Github Repo: tinyurl.com/msuj2v47
    Google Colab: tinyurl.com/2nn5snb4
    Huggingface: tinyurl.com/csnapujn
    TIMESTAMPS:
    [00:00] Intro
    [00:30] Understanding the Offloading Paper
    [03:00] Running the Model on Google Colab
    [04:15] Walking Through the Notebook
    [06:26] Running the Model and Generating Responses
    [07:24] Examples of Model Outputs
    [08:16] Final Thoughts
    All Interesting Videos:
    Everything LangChain: • LangChain
    Everything LLM: • Large Language Models
    Everything Midjourney: • MidJourney Tutorials
    AI Image Generation: • AI Image Generation Tu...
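    A generic offloading sketch (not the HQQ + expert-cache notebook from the video; just the stock Transformers/bitsandbytes recipe, with illustrative memory limits, assuming a machine with enough combined GPU and CPU memory):

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

      model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

      # 8-bit weights on the GPU; layers that do not fit are kept on the CPU in fp32.
      bnb_config = BitsAndBytesConfig(
          load_in_8bit=True,
          llm_int8_enable_fp32_cpu_offload=True,
      )

      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          quantization_config=bnb_config,
          device_map="auto",                        # let Accelerate split layers across GPU/CPU
          max_memory={0: "14GiB", "cpu": "60GiB"},  # illustrative limits, adjust to your hardware
      )

      prompt = "[INST] Explain what a mixture-of-experts model is. [/INST]"
      inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
      out = model.generate(**inputs, max_new_tokens=128)
      print(tokenizer.decode(out[0], skip_special_tokens=True))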
  • Science & Technology

Comments • 31

  • @snypzzz8702 · 6 months ago · +2

    I was dying for this tutorial. Thanks, man!

  • @kunalsoni7681 · 6 months ago · +1

    Amazing

  • @publicsectordirect982 · 6 months ago · +3

    You are my go-to guy for anything open source. Thanks for your work, bhai 🙏

  • @geniusxbyofejiroagbaduta8665 · 6 months ago

    Thanks

  • @thisurawz · 6 months ago · +7

    Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and text, for relation extraction or another specific task? Could you do it using an open-source multimodal LLM and open multimodal datasets (like the ones used for Video-LLaMA), so anyone can extend their experiments with the help of your tutorial? Could you also cover how to boost the performance of the fine-tuned model with prompt tuning in the same video?

    • @user-qf2sj8hp2c · 5 months ago

      This guy is right on, asking the right questions! Although I don't expect him to answer all of that, you're definitely headed in the right direction!

  • @adityashinde436 · 6 months ago · +2

    Make a video on fine-tuning Mixtral 8x7B and how to use it in production.

  • @gangs0846 · 6 months ago

    Thank you, my friend. One question: why don't you use the following?
    The template used to build a prompt for the Instruct model is defined as follows:
    [INST] Instruction [/INST] Model answer [INST] Follow-up instruction [/INST]
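    For reference, a minimal sketch of producing that Instruct format with the tokenizer's built-in chat template (assuming the mistralai/Mixtral-8x7B-Instruct-v0.1 tokenizer; the notebook in the video may build the prompt string by hand instead):

      from transformers import AutoTokenizer

      # The Instruct tokenizer ships a chat template that emits the [INST] ... [/INST] format.
      tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

      messages = [
          {"role": "user", "content": "Explain mixture-of-experts in one paragraph."},
          {"role": "assistant", "content": "MoE layers route each token to a few expert FFNs..."},
          {"role": "user", "content": "Now summarize that in one sentence."},
      ]

      # tokenize=False returns the formatted prompt string instead of token IDs.
      prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
      print(prompt)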

  • @jayr7741 · 6 months ago · +1

    Please bring some multilingual (Hindi) TTS voice cloning to Colab.

  • @DavidSeguraIA · 6 months ago · +1

    Thank you, this is amazing, I will use it for sure! Could you make a video using this method on the free Kaggle tier? You can use two 16 GB T4 cards at the same time in the same instance, along with 30 GB of RAM, so this should run a lot faster. Pretty please! I'm also sure free Kaggle tier videos would bring tons of views to your channel. Best wishes to you and your loved ones, and happy 2024!

    • @engineerprompt · 6 months ago · +1

      Thank you for the wishes, happy new year to you too! Kaggle is a great option. I haven't looked at it in a while but will see what I can do. I didn't know that they now offer two GPUs. Will explore that further.

  • @user-gp6ix8iz9r · 6 months ago

    Great video. Is there a way to upload your own RAG documents to this?

  • @path1024 · 6 months ago · +1

    I have a 2060 Super. Only 4% slower than a 3060, but only 8 GB of VRAM. I have 64 GB of DDR5 RAM and a 14900K CPU (with an NPU). I bet I could run it in 2-bit, but I never thought I'd go below 4-bit. Frankly, I just see 8x7B as a less efficient version of having several models fine-tuned to specific tasks. A couple of 4-bit 7B models can fit in 8 GB of VRAM.

    • @engineerprompt · 6 months ago · +1

      I think for general tasks it might be a good option. If you are working on a specific application, I would recommend fine-tuning a smaller model and using that instead. It will probably be a better option.

    • @path1024 · 6 months ago

      @engineerprompt Yeah, the total is smaller than its implied parts, so for a general-purpose model it's probably more efficient. Eight separate 7B models at 16-bit would usually take around 112 GB, instead of about 90 GB for Mixtral.
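      A rough back-of-the-envelope for those numbers (assuming roughly 46.7B total parameters for Mixtral 8x7B, since the eight experts share the attention layers and embeddings):

        # Illustrative fp16 memory estimates (2 bytes per parameter).
        bytes_per_param = 2
        GB = 1e9  # decimal gigabytes

        eight_separate_7b = 8 * 7e9 * bytes_per_param / GB     # ~112 GB
        mixtral_fp16 = 46.7e9 * bytes_per_param / GB           # ~93 GB (experts share attention)
        mixtral_4bit = 46.7e9 * 0.5 / GB                       # ~23 GB at ~4 bits per weight

        print(f"8 separate 7B models @ fp16: {eight_separate_7b:.0f} GB")
        print(f"Mixtral 8x7B @ fp16:         {mixtral_fp16:.0f} GB")
        print(f"Mixtral 8x7B @ 4-bit:        {mixtral_4bit:.0f} GB")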

  • @gangs0846 · 6 months ago

    How do I get it to write several pages of text? Even though I set the max tokens to 32k and tell it to write 10 pages, it still outputs only one page of text.

  • @bennguyen1313 · 5 months ago

    I imagine it's costly to run LLMs. Is there a limit on how much Google Colab will do for free?
    I'm interested in creating a Python application that uses AI. From what I've read, I could use the GPT-4 Assistants API, and as the developer I would incur the cost whenever the app is used.
    Alternatively, I could host a model myself with something like Ollama, on my own computer or in the cloud (Beam Cloud / Replicate / Streamlit / Replit)?

  • @curiouslycory · 5 months ago

    The model can be 30+GB. Not surprising that it takes a while to load.

  • @lostInSocialMedia. · 6 months ago

    Can we run an uncensored model?

    • @engineerprompt · 6 months ago · +1

      I think yes, but it would need to be converted into the HQQ format.

  • @DezorianGuy · 6 months ago · +1

    Is this better than ChatGPT 3.5?

    • @valm7397 · 6 months ago · +1

      yes

    • @engineerprompt · 6 months ago · +1

      On benchmarks, yes

  • @alx8439 · 6 months ago

    You can run quantized 4-bit Mixtral literally on any recent computer with 32 GB of RAM, without any GPU at all (see the CPU-only sketch after this thread). I don't understand why you need Google Colab here; memory is ultra-cheap these days.

    • @unkim7085 · 6 months ago · +1

      Do you have a reference or tutorial on how to do it? Thanks

    • @alx8439 · 6 months ago

      @unkim7085 Or do the same in Ollama; it just works there.
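    A minimal sketch of that CPU-only route with llama-cpp-python and a 4-bit GGUF build of Mixtral (the file name, quant level, and memory figure are illustrative, not from the video):

      from llama_cpp import Llama

      # Hypothetical local path to a 4-bit (Q4_K_M) GGUF quantization of Mixtral 8x7B Instruct.
      # n_gpu_layers=0 keeps everything on the CPU; the weights alone take roughly 26 GB of RAM.
      llm = Llama(
          model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
          n_ctx=4096,
          n_gpu_layers=0,
          n_threads=8,
      )

      out = llm("[INST] Write a haiku about mixture-of-experts. [/INST]", max_tokens=128)
      print(out["choices"][0]["text"])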

  • @mavrick23 · 6 months ago

    Can it work on 8 GB of RAM?

  • @oliviertorres8001 · 6 months ago · +1

    Is there a way to make this model work in the oobabooga Text Generation WebUI running in Google Colab? Thanks.