A recent update to llamafile allows it to run much faster on the CPU than on the GPU (don't ask me how, I have no idea, but it works).
Across all architectures? I haven't played with it in a couple of months. I have to check that out. Thanks.
@@Hdhead It should be across all architectures, as far as I know.
So, I need an internet connection to get the software in the first place. After I have it installed, can I completely shut down my internet connection to use it?
Yes! That's a great thing about the local models. You don't send your data out. It's all about data privacy, and considering how slow OpenAI has been for me lately, right now this is also about LLM performance.
@@Hdhead If it's all about local data privacy, you shouldn't be using a server with open ports and a web browser. I want LLMs that don't require any network ports or web-UI garbage.
@@RPG_Guy-fx8ns Of course. llamafile also offers that option. You can communicate with it via the OpenAI API and through the command line.
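For anyone wondering what that looks like in practice, here is a minimal sketch of talking to llamafile without any web UI, assuming the llamafile is running in server mode on its default port 8080 and that the `openai` Python package is installed; the model name and the example filename in the comment are placeholders, not something the server checks.

```python
# Hedged sketch: chat with a local llamafile server over its OpenAI-compatible
# API, no browser involved. Assumes a llamafile (e.g. llava-v1.5-7b-q4.llamafile)
# was started in server mode and is listening on the default port 8080.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local server; no data leaves the machine
    api_key="sk-no-key-required",         # the local server does not validate this
)

response = client.chat.completions.create(
    model="LLaMA_CPP",  # placeholder label; the server answers with whatever model it loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain in one sentence what a llamafile is."},
    ],
)
print(response.choices[0].message.content)
```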
That is an absolutely amazing speed! I ran GPT4all locally, but that is much slower.
Is it possible to run Llama 2 34B (Mistral 7B would be an option, too) or a bigger model at that speed on the CPU?
I ran Mistral 7B and Mixtral; I forget the exact size, but it fits in my 12 GB of GPU RAM. They run slower than LLaVA. I don't have the numbers with me, but they still run at a decent speed (respectable for a chat application).
I can run Llama 2 34B, but it won't fit in my GPU RAM, so it would have to use the CPU for inference. I tried a larger model like that and it was okay on the CPU, not amazing.
Cool video, and great intro.
Hello, how do I "configure" llamafile to chat using files (pdf, doc, etc) in my local directory? Thanks!
You will probably want something like Open Interpreter, which supports llamafiles.
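Before reaching for a full tool, a quick-and-dirty alternative is to paste the file's text into the prompt yourself. Below is a hedged sketch along those lines; it assumes a llamafile server already running on the default port 8080 and the `requests` package, the filename is hypothetical, and for PDF or DOC files you would first need a text extractor such as pypdf.

```python
# Hedged sketch: "chat with a local file" by stuffing its text into the prompt.
# Assumes a llamafile server is listening on http://localhost:8080 and the file
# is small enough to fit in the model's context window.
import requests

with open("notes.txt", "r", encoding="utf-8") as f:  # hypothetical local file
    document = f.read()

payload = {
    "model": "LLaMA_CPP",  # placeholder; the server uses whatever model it loaded
    "messages": [
        {"role": "system", "content": "Answer questions using only the provided document."},
        {"role": "user", "content": f"Document:\n{document}\n\nQuestion: What are the key points?"},
    ],
}

r = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=300)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```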
Hi, I have downloaded CUDA and installed it, but while trying to run it on the GPU I get this error: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
I have an RTX 2060 GPU and 32 GB of RAM. I have configured CUDA on my system.
Can you import .pdf files?
How do I add another data feed (or whatever it's called) to update its answers?
I'm not sure what you mean. Could you describe what you are trying to do?
@@Hdhead OK, I'm not well versed in how LLMs work, but I mean: extend its training data, or let it accept new data and analyze it.
Oh, I see what you mean. That's kind of out of scope for llamafile. I'd recommend a place like Hugging Face to start your journey on this.
My problem is that whenever I execute the server file, the command prompt automatically closes and nothing happens.
Create an issue at the GitHub page. Sometimes I do a live screen capture to record the screen in case there's some feedback in the command line that's too quick to see.
Why go through all of this fuss when people can just download LM Studio, which is far easier to use? Why are people still so obsessed with command lines? If you're a nerd... ok... but the average John and Jane don't want to deal with that bullshit. The Windows OS shows that you don't need that nonsense to run some amazing applications or games on your computer. Stop living in the '90s.
brainrot
Llamafile allows you to run models at insane speeds from your CPU; LM Studio doesn't. RAM is a lot cheaper than GPU VRAM. It's insanely fast too, based on some really innovative techniques the developer found for running matrix computations on a CPU. LM Studio doesn't let you do much. It's good for simple chatting, but that's about it. Its web and document RAG setup sucks. Anyone seriously working with local LLMs is doing it from the command prompt or an IDE, running custom LangChain scripts.
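To make the "custom LangChain scripts" remark concrete, here is a hedged sketch that points LangChain's OpenAI chat wrapper at a local llamafile server instead of the OpenAI cloud; it assumes the `langchain-openai` package and a llamafile server on the default port 8080, and the model name is only a placeholder.

```python
# Hedged sketch: a minimal LangChain script driving a local llamafile server
# through its OpenAI-compatible endpoint. Assumes `pip install langchain-openai`
# and a llamafile already running in server mode on port 8080.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="local-llamafile",              # placeholder; the local server ignores it
    base_url="http://localhost:8080/v1",  # llamafile's OpenAI-compatible API
    api_key="sk-no-key-required",         # not validated by the local server
    temperature=0.2,
)

reply = llm.invoke("List three reasons to run an LLM locally.")
print(reply.content)
```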
@@dimeloloco
What are you rambling about? The topic was only about the fact that Llamafile isn't user-friendly enough for the average user.
*_"LMstudio doesn't let you do much. It's good for simple chatting, but that's about it."_*
What nonsense... ~sigh~
A model itself (just the bare-bones model, with nothing added or changed and without any quantization) doesn't become smarter by running it with Llamafile instead of LM Studio. It will just run faster.
Most people don't want to dive into AI like professionals or basement nerds like you do. I have several local AI models installed, but unless I have privacy concerns, I always use one of the three major AI models online because they're way faster and their output is of superior quality. Secondly, they don't need my CPU/GPU/memory resources, which I might need myself for something else.
I don't even see the need to invest more time into diving deeper into AI for local use, for the simple reason that I consider AI rather useless for important things I want to do, where the accuracy of the output matters.
I have fiddled with numeric data using the large AI models online, and they keep messing things up. I could ask the simplest questions about a data set, and they often get it completely wrong, whereas Excel just gets it right, all the time; it just takes more effort.
All those major AI models are language models, and as a result, they hallucinate like crazy, even the best models. I don't see myself using AI for critical applications anytime soon, and a lot of businesses think the same way. Sure, it will change when AI models become more reliable, but I don't see that happening anytime soon, when they fail to come up with something way more sophisticated than the current language models, something that can actually think, help us beat climate change, help us make the correct economic decisions, find cures for common diseases, boost progress in space travel, etc.
Sure, what we have right now looks cool, but the more you work with it, the more you will notice how flawed it is. In 20 years, we will all laugh about it, just like we laugh about Eliza now.
Bye.
Is this true: "In summary, the Llamafile 70B Instruct model requires a significant amount of RAM, likely in the range of 128 GB or more, to run effectively without severe performance degradation. Systems with less RAM, such as 8 GB, will struggle to run this large language model."
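For what it's worth, the rough arithmetic behind claims like this is just parameter count times bytes per weight, plus headroom for the KV cache and runtime; the sketch below uses approximate bytes-per-weight figures for common GGUF quantization formats, not measured numbers.

```python
# Back-of-envelope memory estimate for a 70B-parameter model (weights only).
# Real usage varies with quantization format, context length, and runtime overhead.
params = 70e9

bytes_per_weight = {
    "f16": 2.0,     # unquantized half precision
    "q8_0": 1.06,   # roughly 8.5 bits per weight (approximate)
    "q4_k_m": 0.6,  # roughly 4.85 bits per weight (approximate)
}

for fmt, bpw in bytes_per_weight.items():
    gib = params * bpw / 2**30
    print(f"{fmt:>7}: ~{gib:.0f} GiB for weights alone")

# Roughly: f16 ~130 GiB, 8-bit ~69 GiB, 4-bit ~39 GiB. So 128 GB is in the right
# ballpark for unquantized weights; a 4-bit quant needs far less, though still
# far more than an 8 GB system can hold.
```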
When running on the CPU it gives me correct answers, but when running on the GPU it gives me gibberish.
.\llamafile.exe "C:\Users\PARTH\.ollama\models\blobs\sha256-43f7a214e5329f672bb05404cfba1913cbb70fdaa1a17497224e1925046b0ed5" -ngl 35
I am using an RTX 4090, and this is a Qwen 7B model.