You always keep up with the updates. Thanks for the video!
You're welcome 😀
For me, I didn't get the same output as you when I debugged the prompt at 07:15.
I did exactly what you did, brother, but the model takes a long time to generate a response. Is there any possible fix?
What are your PC specs?
Thanks for the tutorial! Is this actually running locally? If so, how did it download so quickly?
Yes, the model runs locally and for free. To download the model, you can use Ollama.
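If anyone wants to sanity-check that it really is local, here is a minimal sketch (my own example, not from the video), assuming the Ollama server is running on its default port and you have already pulled the model with `ollama pull llama3`:

```python
import requests

# Ollama's local REST endpoint; nothing here goes over the internet
# once the model weights have been downloaded.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"

payload = {
    "model": "llama3",   # tag previously pulled with `ollama pull llama3`
    "prompt": "Explain in one sentence what Ollama does.",
    "stream": False,     # return a single JSON object instead of a stream
}

resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])
```

If this still returns an answer with your internet connection switched off, the model is definitely running on your machine.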
@TirendazAI Hi, I'm running Llama 3 through Ollama on localhost port 127.0.0.1:11434, but I'm confused: how do I load the model with Transformers so I can follow your steps?
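In case it helps: Transformers doesn't load models from the Ollama server on 127.0.0.1:11434. Ollama serves its own copy of the weights over that port, while Transformers downloads weights from Hugging Face. To follow the video you can keep talking to the Ollama endpoint (as in the sketch above); if you specifically want Transformers, it's a separate download. A rough sketch of that route, assuming you have requested access to the gated meta-llama/Meta-Llama-3-8B-Instruct repo and logged in with the Hugging Face CLI (the model id here is my assumption, not something from the video):

```python
# Sketch of the Transformers route (separate from the Ollama server).
# Requires: transformers, torch, accelerate, and access to the gated repo.
import torch
from transformers import pipeline

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed model id

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,  # smaller footprint than fp32
    device_map="auto",           # place layers on GPU/CPU automatically
)

messages = [{"role": "user", "content": "Say hello in one short sentence."}]
out = generator(messages, max_new_tokens=50)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```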
Thank you for the great video. It was really helpful for getting everything set up. If I may ask: I have a 4090 graphics card and I can see it maxing out my GPU usage, so CUDA should be working correctly. However, my prompts take anywhere between 20 seconds and 2 minutes to return, and after a few questions the chatbot stops responding at all and just stays processing. Is this normal?
Which model are you using? If you're using llama-3:70B, I think that's normal.
It seems the chatbot will not answer before checking the history🤔
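That's essentially how the chat works: the whole history is sent back to the model on every turn as a list of messages, so it always "checks" the previous turns before answering. A small sketch against Ollama's local chat endpoint (same port and model tag as above; the example turns are made up):

```python
import requests

CHAT_URL = "http://127.0.0.1:11434/api/chat"

# The history is just a growing list of messages that gets re-sent
# with every request, so the model sees all previous turns.
history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
    {"role": "user", "content": "What is my name?"},
]

resp = requests.post(
    CHAT_URL,
    json={"model": "llama3", "messages": history, "stream": False},
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])  # answer should use the history
```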
Does this work on a Mac with an M2 too?