Llama 3 on Your Local Computer | Free GPT-4 Alternative

  • Published Nov 26, 2024

Comments • 45

  • @nartrab1
    @nartrab1 7 months ago +18

    Thanks for sharing. It is amazing that not only do you create quality videos, but you also reply to so many technical problems. You are a great guy.

    • @martin-thissen
      @martin-thissen 7 months ago +1

      Thanks! That's actually a really nice compliment, really appreciate it! :-)

  • @metanulski
    @metanulski 7 months ago +7

    Very nice. Since not everyone has 40 GB of VRAM, can you be more specific on how to do this with the Llama 3 8B model? (Because you say we may need to change the data type if we use a different model, and I have no clue how I would know the correct data type 😁)

    • @martin-thissen
      @martin-thissen 7 months ago +3

      Great question, thanks for asking! You can see the data type in the config.json inside the Hugging Face repository. Inside the config.json, search for "torch_dtype". Bfloat16 is pretty popular but currently does not work for AWQ-quantized models, which usually use float16. Hope this is helpful! :-)
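
      For reference, a minimal sketch of reading that field programmatically, assuming the standard huggingface_hub helper and the AWQ repo used later in this thread:

      # Download only config.json and print its declared torch_dtype.
      import json
      from huggingface_hub import hf_hub_download

      config_path = hf_hub_download("casperhansen/llama-3-70b-instruct-awq", "config.json")
      with open(config_path) as f:
          config = json.load(f)
      print(config.get("torch_dtype"))  # e.g. "float16" for AWQ models, "bfloat16" for many base models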

  • @shrirambhardwaj8736
    @shrirambhardwaj8736 2 months ago

    Hi, thanks for making super helpful content. I have an RTX 3050, which has 4 GB of VRAM. Is there a way I can use this model locally, and if yes, then how?

  • @firewatermoonsun
    @firewatermoonsun 4 months ago

    I get this error after running pip install autoawq / pip install -r requirements.txt:
    Defaulting to user installation because normal site-packages is not writeable
    ERROR: Could not find a version that satisfies the requirement autoawq (from versions: none)
    ERROR: No matching distribution found for autoawq

  • @frankinher7467
    @frankinher7467 3 months ago

    I came here looking to see nice cute animals. Where are the 3 llamas you advertised I could see on my computer?

  • @EwenMackenzie
    @EwenMackenzie 7 months ago +2

    thanks for sharing! this was super helpful :D

  • @jennilthiyam980
    @jennilthiyam980 6 months ago +1

    Hi. Thank you for your video. I want to know one thing. I have multiple CSV files which I want Llama to know about. I have gone through other videos; there is a guy who does the same task I want, but after incorporating the files, his Llama cannot respond to other general questions correctly and focuses only on the information in the CSV files. Their method first splits the text into chunks and embeds them using separate embedding models. Can you please provide a solution using only Llama and nothing else? What I want is for Llama to know about my files on top of its already existing knowledge.

    • @axelef2344
      @axelef2344 6 months ago +1

      You need to embed the knowledge from your CSV files into a vector database properly, and then when you ask about something related to this knowledge, Llama or any smaller model suited for vector search should find it (the specific chunk, or a couple of them, from the DB) and attach it to your question as context. Otherwise you will have to train your model on this data, which is much more hardware-hungry. AFAIK.
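
      A rough sketch of that retrieval step, assuming sentence-transformers for the embeddings and plain cosine similarity; the file name, embedding model, and example question are illustrative, not from the thread:

      import csv
      import numpy as np
      from sentence_transformers import SentenceTransformer

      embedder = SentenceTransformer("all-MiniLM-L6-v2")

      # Turn each CSV row into a text chunk and embed everything once, up front.
      chunks = []
      with open("my_data.csv", newline="") as f:
          for row in csv.reader(f):
              chunks.append(", ".join(row))
      chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

      def retrieve(question, k=3):
          # Return the k chunks most similar to the question (cosine similarity).
          q_vec = embedder.encode([question], normalize_embeddings=True)[0]
          scores = chunk_vecs @ q_vec
          return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

      # The retrieved chunks are attached to the prompt as context, so the model keeps
      # its general knowledge and only sees your data when it is relevant.
      question = "What were last quarter's sales?"
      context = "\n".join(retrieve(question))
      prompt = f"Use the following context if relevant:\n{context}\n\nQuestion: {question}"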

    • @jennilthiyam1261
      @jennilthiyam1261 6 months ago

      @axelef2344 Hi. Do you have any good video for it? I followed some videos, and yes, my model can answer queries about my specific data, but when I ask other general questions, it fails to reply, and it also does not have memory. I want Llama to have knowledge about my data while still being able to answer other general questions, and to still have memory.

  • @stresstherapist
    @stresstherapist 4 months ago

    Is this on Windows or Linux? I can't seem to install the vLLM library on Windows.

  • @barnouinjoris164
    @barnouinjoris164 4 months ago

    you are so enthusiastic !! : )

    • @martin-thissen
      @martin-thissen 4 months ago

      Thanks, really appreciate it! :-)

  • @codingwithsarah3650
    @codingwithsarah3650 6 months ago

    Hello, just wondering. Can you help with doing this on Google Colab?

  • @TaschenRechner22
    @TaschenRechner22 3 months ago

    When's the next video?

  • @jeffbruno847
    @jeffbruno847 7 months ago

    When I run the pip install command, I get the error "Could not find a version that satisfies the requirement flash-attn==2.5.7".

    • @martin-thissen
      @martin-thissen 7 months ago +1

      Can you maybe check your pip config? On PyPI you can see that 2.5.7 is actually the most recent version of flash-attn: pypi.org/project/flash-attn/

  • @allo.allo.
    @allo.allo. 5 months ago

    How do I run Llama 3 70B on 4x RTX GPUs in Linux?

  • @shotelco
    @shotelco 7 months ago

    Asking an LLM questions is fun and everything, but most will want an LLM to act as an "agent base", utilizing a multi-expert foundation, meaning the LLM is tasked with a coding problem, a finance problem, or rewriting a story. The LLM base is what the agent front-end talks to. How about you front-end something like pythagora;DOT:ai using Llama 3 as a LOCAL backend over an API? And (I know I am asking a lot here) provide a training methodology which ingests something like a company's FAQs, help desk, knowledge base, etc.?
    Otherwise, playing with any AI is more amusement and entertainment than an actual system for productivity.

    • @martin-thissen
      @martin-thissen 7 months ago +1

      Yes, definitely a fair call! I'll keep it in mind for future videos to take a look at more advanced use cases, such as autonomous task solving :)

  • @stefanocianciolo8432
    @stefanocianciolo8432 7 months ago

    If I wanted to only get the text results and not launch the UI, what should I remove? Thanks!

    • @martin-thissen
      @martin-thissen 7 months ago

      Basically the entire UI class. You would only load the vLLM engine and then call the generation function directly:

      # SamplingParams comes from vLLM; StreamingLLM is the wrapper class from the video's code.
      from vllm import SamplingParams

      llm = StreamingLLM(model="casperhansen/llama-3-70b-instruct-awq", quantization="AWQ", dtype="float16")
      tokenizer = llm.llm_engine.tokenizer.tokenizer
      sampling_params = SamplingParams(
          temperature=0.6,
          top_p=0.9,
          max_tokens=4096,
          # "<|eot_id|>" is Llama 3 Instruct's end-of-turn token.
          stop_token_ids=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],
      )
      prompt = tokenizer.apply_chat_template(history_chat_format, tokenize=False)
      for output in llm.generate(prompt, sampling_params):  # StreamingLLM presumably yields outputs as they stream
          print(output)
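
      If you don't need the streaming wrapper from the video, a plain vLLM call works as well; this is a minimal sketch using stock vLLM classes rather than the video's code:

      from vllm import LLM, SamplingParams

      llm = LLM(model="casperhansen/llama-3-70b-instruct-awq", quantization="awq", dtype="float16")
      outputs = llm.generate(["Why is the sky blue?"], SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256))
      print(outputs[0].outputs[0].text)  # just the generated text, no UI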

  • @AnakinSkywalkerrrop
    @AnakinSkywalkerrrop 6 months ago

    Can I use it on an i5 12th gen? No GPU?

  • @Stealthy_Sloth
    @Stealthy_Sloth 7 months ago

    I need to be able to upload files to the model for analysis, like ChatGPT's interface.

    • @martin-thissen
      @martin-thissen 7 months ago +1

      This would be possible if you further customise the UI, but using an existing solution is probably easier. I think PrivateGPT or Chat with RTX could be helpful for your use case. Is that something you would like me to create a video about? :)

  • @mike8289
    @mike8289 7 months ago

    Can I run the 70B with an RTX 3090, which has 24 GB of VRAM, and how would I do it?

    • @martin-thissen
      @martin-thissen 7 months ago

      Great question. Fitting 70B parameters into 24 GB of VRAM, where the default precision per parameter is 16 bits (8 bits = 1 byte), is a challenging task; even if you quantize the model weights to 3-bit precision, fitting the model is difficult. However, there still seems to be hope by offloading some model weights to RAM and dynamically loading the required weights during inference. Of course, this approach is not ideal due to the latency of loading and offloading the weights, but it seems a generation speed of 2 tokens/sec is still possible, which is not too far off from the human reading speed (~7 tokens/sec). The llama.cpp library is probably a good solution here: www.reddit.com/r/LocalLLaMA/comments/17znv35/how_to_run_70b_on_24gb_vram/
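
      A hypothetical sketch of that offloading setup with the llama-cpp-python bindings; the GGUF file name and the number of offloaded layers are illustrative assumptions, not taken from the video or the Reddit thread:

      from llama_cpp import Llama

      llm = Llama(
          model_path="Meta-Llama-3-70B-Instruct.Q3_K_M.gguf",  # a ~3-bit quantized build
          n_gpu_layers=40,  # keep as many layers on the 24 GB GPU as fit; the rest runs from system RAM
          n_ctx=4096,
      )
      result = llm("Explain weight offloading in one sentence.", max_tokens=64)
      print(result["choices"][0]["text"])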

  • @mohsenghafari7652
    @mohsenghafari7652 7 months ago

    Thanks, great job.
    Can I use it with a 3080 GPU?

  • @omarnaser8291
    @omarnaser8291 6 months ago

    Does it do images?

  • @tubebility
    @tubebility 7 months ago +1

    I too have an RTX 6000, but only in my dreams. 🤑

  • @alexarngold4185
    @alexarngold4185 7 months ago

    Support 💙😊

  • @322ss
    @322ss 7 months ago

    Thanks! But lol, Joe Average doesn't have a 10,000+ euro GPU :D

  • @zskater1234
    @zskater1234 7 months ago +1

    Nice

  • @74Gee
    @74Gee 7 months ago +1

    Sweeet!!

  • @shogunX-u9b
    @shogunX-u9b 3 months ago +1

    So basically watching all of this was a waste of time, having a GTX 1650 with 4 GB of VRAM. Got it.

  • @avi7278
    @avi7278 7 months ago

    Let's be real, it's not GPT-4. I don't know why people insist on making this false equivalency. No open-source model has yet come even close to GPT-4. They can release all the benchmarks they want and blah blah blah; using the two models, you immediately see that Llama 3 is still quite a bit weaker than GPT-4. We'll see when the 300B version comes out. I'm not holding my breath though. If 300B still falls short, then it will be at least another year and a half, maybe two, before Llama 4 comes out, which should finally surpass it, but by that time GPT-5 will be out and Llama will again be behind, of course.

    • @lucamatteobarbieri2493
      @lucamatteobarbieri2493 7 months ago +2

      OpenAI, despite the name, has gone down the closed-source route. This makes them dependent on their own software engineers. More open LLMs like Llama 3 have the advantage of a huge community of developers. One will be like Windows and the other like Linux. Which is better? It depends on the use case.

    • @avi7278
      @avi7278 7 months ago

      @lucamatteobarbieri2493 My use case is complex coding tasks. Sure, maybe on some RAG stuff Llama 3 can hang with GPT-4, but the advanced reasoning, context handling, and instruction following are still nowhere near where they need to be for my use case.

    • @martin-thissen
      @martin-thissen 7 months ago +1

      Yes, agreed: it's better than ChatGPT (GPT-3.5) but worse than GPT-4. I think the 400B+ model will achieve GPT-4-level performance. Of course, it would be helpful to know how many tokens it has already been trained on and how many more Meta plans to train it on, but the current benchmarks look very promising!

  • @strategy419
    @strategy419 6 months ago

    Can I get your email for business inquiries?