Looking for Deepseek image generation/interpretation? See Deepseek Janus Pro: th-cam.com/video/6axAY9NV1OU/w-d-xo.html (Also free, and local/offline)
Could you please suggest whether I should use Ollama or LM Studio?
I think LM Studio is the easier and better option due to its simple setup and built-in GUI.
Any suggestions between Ollama and LM Studio?
It was a pretty straightforward tutorial, thank you.
Awesome guide, thank you. This post-DeepSeek phase of AI is going to be fun.
The video is on point! Fast and easy to follow. Thank you!
It's time to test the 4090.
5090
It works with the steamdeck. The 4090 will easily do it lol
@@ATH-camRambler what the helllll
Just tried with a GeForce 980 Ti, and at least the first 1.5b model worked OK for me, fast enough.
It's a bit tough on the Steam Deck, but it can use the middle-tier model of the AI as long as you don't have a substantial number of tokens being used each generation. Unfortunately, it's a little too slow for my taste. Each response takes a solid 3-5 minutes.
This is a fabulous to-the-point and beginner-friendly video.
Thanks for mentioning the hardware. A lot of videos don't mention hardware, and they're usually using a Mac.
Easy and straightforward, thanks.
dude, thank u so much! It was interesting
Very, very thorough guide; you went through every single dialog box. Thank you, it is much appreciated. I've got an old water-cooled 2080 Super still chugging along, and the weakest version of the AI works fine. Dare I step it up and make stuff catch on fire? lolol
I've got a 1070 Ti with 8GB VRAM and it runs the 8b model perfectly and fast.
@jnhkx Nice. I have been noticing I can go up to that as well. But actually, I'm not too impressed with this AI, at least the lesser forms. I hear the true non-distilled one is good, but no one has the 150TB of RAM at home to run it, lol. I have since tried Dolphin Llama 3 and it works faster and smarter, and I also tried a roleplaying AI for conversations and it works well.
Thanks a lot ! Very handy !
Great tutorial, thanks!
VERY HELPFUL! THANK YOU SO MUCH!
I have locally deployed DeepSeek R1 (70B parameters). My goal is to develop an autonomous code generation system that integrates code synthesis and automated error correction. This system will iterate until an error-free code is generated. For testing, the system will automatically install the required dependencies and streamline the code execution process. Can you create this project?
You can integrate DeepSeek R1 into VS Code
How? @sainsrikar
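One possible route (a sketch, not necessarily what the commenter meant): extensions such as Continue can be pointed at the local Ollama server, which listens on http://localhost:11434 by default, and under the hood they just call Ollama's REST API. You can verify the endpoint an editor extension would use like this, assuming deepseek-r1:8b has already been pulled:

```sh
# Ask the local Ollama server for a single non-streamed completion.
# Assumes the default port (11434) and that `ollama run deepseek-r1:8b` works.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Write a one-line docstring for a function that reverses a string.",
  "stream": false
}'
```

If that returns JSON with a "response" field, an editor integration pointed at the same URL should work too.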
Thank you! That was to the point.
I have a 4060 with 8 gigs of VRAM; the 8b model works pretty damn quickly, I would say.
Thank you. I have the same GPU. Thinking of trying it.
nvidia privileges
I'm using a 3070, which also has 8GB of VRAM, and the 8b model is running fine. You can append --verbose to the run command like:
`ollama run deepseek-r1:8b --verbose`
to get more information like the eval rate. I am getting a rate of 59.39 tokens per second, which is significantly faster than my reading speed.
Can you make a tutorial to install DeepSeek Janus Pro local ? Thank you.
For people who want a nicer interface, as well as people using AMD GPUs (because Ollama doesn't have as good support for AMD GPUs), use LM Studio instead. You even get the option to expand and collapse the thinking process.
Thank You
I was so stoked about trying this on my 1650Ti laptop until you got to the VRAM requirements. Now I'm wondering if it's even worth trying the 7B model on my rtx 3060 (12GB version) PC 😐
My RTX 3090 can run the 32b model seamlessly, while the 70b model generates slowly due to maxed-out VRAM usage.
Thanks a lot, very helpful!
can we hide the thinking process in the chatbox just like a normal assistant?
Thanks for the thorough explanation. This explains why my RX 6600 XT with a slow 8GB of VRAM is choking on the 32b model 😂
Thank you for the tutorial.
Wonderful mate!
The 8b model ran a bit slower on my machine, a 3050 Ti laptop; it's got only 4GB of VRAM, but it's acceptable.
Great walkthrough thank you! +1 sub
Can you teach us how to download more ram next? Thanks :)
Looks like 16GB of VRAM is not enough for 2025. Lol
Which model should I install? My laptop specs: i9-14900HX, RTX 4060 with 8GB VRAM, 16GB DDR5 RAM. Please help me.
Is it possible to hide the thinking bar? Like, not just clicking it every time, but disabling it? Thanks for the video.
I would say yes, but it could take ages to think and then actually give the response. The thinking text output is quite literally the AI "thinking" and is meant to stay part of the model, as far as I understand.
Use LM Studio if you want to be able to hide the thinking process.
If needed, how do we uninstall the LLMs? Do we just delete the applications? And where do they live on our computers?
ollama rm modelname
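To expand on that reply, a minimal sketch using Ollama's own commands; the deepseek-r1:8b tag below is just an example, and you'd substitute whatever `ollama list` shows on your machine:

```sh
# Show the models Ollama has downloaded (name, ID, size, last modified)
ollama list

# Remove the one you no longer want; this frees the space under the .ollama/models folder
ollama rm deepseek-r1:8b
```

Uninstalling the Ollama application itself is a separate step done through the usual OS uninstaller.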
i am installing deepseek in my server. :D!
And now how to fine-tune model? ;)
But the 32b one is more powerful than 7b and 1.5b, as it answers some questions correctly and is more on par with the 671b-parameter model.
I interrupted the PowerShell installation step by disabling the internet, and now I'm unable to install again. Please help.
Thank you so much
Can I run Janus locally using the same technique?
Hey TroubleChute, I have a CUDA GPU, but with this setup you showed it is still running on my CPU. Am I missing anything?
Thank you! How do I uninstall models?
Good vid!
I have been having weird problems on my laptop ever since I downloaded the interface you recommended. :[ I really hope you didn't give me malware...
are you the voice behind kurzgesagt??
something tells me he's deepening his voice
thanks a lot
Excellent
You were lucky your *1.5b* model responded "16".
The first thing I asked *8b* was "What is 8/2(2+2)" and it said "4" after reasoning.
Then I asked if it was sure, and it changed its answer to "1".
XD
Finetune it with r1
Actually, I use Msty to run DeepSeek by adding the link from Hugging Face. It doesn't require any additional web UI because it has one built in, it's easy to use, and no login is required. Hope you can try it later; it's way easier than Ollama.
Edit: after installing Msty, add your GPU in the Local AI tab in Settings.
thanks for the heads up broskie, im gonna check it out
I have 48GB of RAM but 4GB of VRAM. Which model should I choose?
Display memory: 3964 MB
Shared memory: 24447 MB
Is there a local way to have r1 also search the internet
Can somebody tell me which version I should choose? I'm confused about how he explained VRAM vs RAM. My System: Total memory 42904 MB, Dedicated memory 10240 MB, Shared memory 32664 MB. I have an RTX3080. Do you think I can run 14B since I have over 32GB total RAM, or should I stick to 7B since I only have 10GB of VRAM?
ya ..
which model is able to read images and documents like chat gpt?
What should I do? My Ollama server is not working. After the `ollama serve` command there is an incomprehensible list and the server does not work. Instead, this message appears: Error: could not connect to ollama app, is it running?
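A hedged troubleshooting sketch for that error: `ollama serve` starts its own server, so if the Ollama desktop app is already running (or something else is holding port 11434), the two can conflict. A quick first check is whether the default endpoint answers at all:

```sh
# If a server is up, this returns the plain-text reply "Ollama is running"
curl http://localhost:11434

# If nothing answers: quit the Ollama tray app (or any stray `ollama serve`),
# start exactly one server in its own terminal...
ollama serve
# ...then, in a second terminal, try the model again:
ollama run deepseek-r1:8b
```

The "incomprehensible list" printed by `ollama serve` is normally just the server's startup log, so that part on its own is not an error.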
thanks man
I would like to know how to install DeepSeek on my computer so that it downloads the 400GB version onto a hard drive separate from the operating system.
I am willing to purchase a 4TB internal hard drive for this purpose; however, I don't know how to do it. Thanks for your help.
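A sketch of one way to do this, assuming the Ollama setup from the video: Ollama reads the OLLAMA_MODELS environment variable to decide where it stores model files, so pointing it at a folder on the larger drive before pulling should do it. The D:\ path below is just an example, and deepseek-r1:671b is the full (~400GB) model tag in Ollama's library.

```sh
# On Windows, set the storage location once (PowerShell syntax, shown as a comment):
#   [Environment]::SetEnvironmentVariable("OLLAMA_MODELS", "D:\ollama\models", "User")
# Restart Ollama so it picks up the new location, then pull the full model:
ollama pull deepseek-r1:671b
```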
thx great stuff!
I didn't see DeepSeek in the model box even after I tried restarting, so I tried Firefox and it worked. I guess it doesn't work on Brave browsers. :\
Just disable the adblock for that website, works fine.
@Panicthescaredycat That, and a few refreshes/restarts of Ollama, should help. Double-check the Logs folder by right-clicking the Ollama icon, and open (I think it's) server.txt to see if it complains about anything there.
How do I disable DeepThink?
I have a GTX 1650 4GB, can I run it?
4:50 The display is not weird; it looks like LaTeX, a typesetting system widely used in the scientific community.
how do we change the weights
wsarecv: An existing connection was forcibly closed by the remote host.
and
connectex: No connection could be made because the target machine actively refused it.
any suggestions?
I had the same issue
Is there a command to ensure that the model uses the GPU? Mine seems slow and I want to be sure.
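One way to check, assuming the Ollama setup from the video: while a model is loaded, `ollama ps` reports how it is split between CPU and GPU, and on NVIDIA cards nvidia-smi shows live VRAM usage.

```sh
# In one terminal, load the model and ask it something:
ollama run deepseek-r1:8b "hello"

# In a second terminal, while it's answering:
ollama ps      # the PROCESSOR column shows e.g. "100% GPU" or a CPU/GPU split
nvidia-smi     # NVIDIA only: VRAM usage should jump while the model is loaded
```

If `ollama ps` shows mostly CPU, the model is likely too large for the card's VRAM and is being partially offloaded.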
Is the local DeepSeek limited by ToS too? Like, when you use the online DeepSeek, there are certain prompts that won't go through, as it says my question goes against their ToS and whatnot.
Since this is offline, you can download fine-tuned models that have certain features changed, like the censorship and the rest. These are just the base official releases, but there are more.
You can use the Abliterated version and that one is mostly uncensored.
@@TroubleChute
Is there a way to send the real time responses of the model from the command line to, say, a program?
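Yes, assuming you are running Ollama as shown in the video: the local server streams newline-delimited JSON from its /api/generate endpoint by default, so any program that can read an HTTP stream (or stdin via a pipe) can consume the tokens as they arrive. A minimal sketch with curl:

```sh
# Each output line is a JSON object whose "response" field holds the next chunk of text;
# the final line has "done": true. Pipe this into whatever program should consume it.
curl -s http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Explain what a haiku is in one sentence."
}'
```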
While Chatbox is cool, it is extremely heavy. A more lightweight option would be cool.
I'm on a 3070 Ti with only 8GB VRAM and 16GB RAM. I can still run the 8b model fine! Usage is 7GB/8GB.
04:54 "S R A W B E R R I N G" ??? 😭 it gets the final answer right but why does the word suddenly change
or the lower model knows less than above ones?
Getting a 404 when updating Chatbox. Seems you can't download any version (404 error message) as of this moment (25/02/05).
EDIT: it's back up.
Can you make a video on the Xbox app on PC, Black Ops 6 stuck on the loading screen?
Whoa you're not actually a robot!?? 🤯
I am using the 7b version and I like how it works, but man, it doesn't know the file formats of certain software, even popular ones. It mistakes the software for something else, doesn't recognize it, and doesn't even acknowledge its existence. I asked about Final Draft and the .fdx format it uses, and it didn't know what that meant; it mistook it for Figma, which uses .fig. I asked the free version of ChatGPT and it gave all the right answers. I think I'd have to use a bigger model, but I am limited by the hardware, even with a 3080 Ti and 64GB of RAM. I think this is not meant for regular users unless they have a very high-end setup; it's better to stick with a different model or use the free version. Please let me know your thoughts.
guys does it work with amd
seems some info was omitted and this is more complicated than it should be.
Why do the AI Boost cores on my new Intel CPU do nothing at all when this runs? What are they even for? I've never seen them doing anything. Does anyone know?
Great question!
If the application does not support your CPU's features, then it won't matter. I would look into finding tools that support them.
Is the Uninstaller a simple uninstall
Does the bigger model mean it's smarter, or is it just faster?
Bigger means it has more information to derive answers from, and is thus usually slower.
Bigger = smarter but slow
Small = less smart but quicker
Hope this helps
If it learns throughout its interaction with me, does it mean that it gets smarter even though im still using 1.5B?
@@oden2011 It doesn't learn from you at all. It just remembers your conversations, in order to give you related answers to your previous prompts.
For anyone interested, I am running the distilled Llama 8B on 16GB of RAM with a GTX 1060 with 6GB of VRAM.
So this is fully safe right?
here before they start saying "iT sTeaLs yO dAta",
Their Web interface: yes your data is being sent to their server and collected. The offline model? You could run it on a completely offline computer and prevent it from communicating in any way, if it ever tried
How does Ollama compare to something like LM Studio?
They run the same GPU software backend: llama.cpp. So performance should be the same.
LM Studio also supports DeepSeek, btw.
LM Studio has a built-in UI and much better support for AMD GPUs. It also lets you easily download models from Hugging Face while letting you choose which quantization to download.
@@thetechdog yes I can also recommend it👍
How can I delete a model that I don't want anymore?
I'd like to know too lol
Looking at the comments for answers, but all I get is people asking more questions
Go to the C drive > Users > (the folder name you entered during Windows installation) > .ollama > models, then delete the model file (don't delete the files that are 1-2 KB in size).
How do I uninstall DeepSeek from my PC completely?
I run 7b with 2060 Super 8gb and it runs fast and smoothly
That's what Xi said.
Uh, it seems to me the listed RAM requirements in this video are wrong. As far as I can tell, the models use exactly the amount of RAM equal to the file size of the model, so 14b = 9.7GB of VRAM.
If I load the 14b model, you can see in Task Manager that the model is using exactly 9.7GB and not 32GB. I have a 12GB VRAM GPU and the model does not seem to need any additional memory beyond what I just mentioned.
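That observation roughly lines up with a back-of-envelope estimate, assuming the default Ollama downloads are ~4-bit quantized (about half a byte per parameter for the weights); the KV cache and runtime overhead still add some memory on top, growing with context length, which is likely where the video's more conservative numbers come from.

```sh
# Weights-only estimate for the 14b model at ~4-bit quantization (rough sketch):
# 14e9 parameters * ~0.5 bytes each, printed in GB
awk 'BEGIN { printf "%.1f GB (weights only, before KV cache/overhead)\n", 14e9 * 0.5 / 1e9 }'
```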
Is it normal for the CPU to go above 90 degrees when executing?
Look at Task Manager CPU usage, it may be near 100%. Programs like these can be heavy on the CPU, so it depends on the cooling arrangement you have on your CPU.
Is this actually offline? Like, fully: you can use this model without being connected to the internet and it will give you answers?
yes
I've heard the offline version isn't censored like the online one
I've been testing it out and honestly, unless it's just this version of the AI, it's kinda dumb. It just feels like a very early version of ChatGPT, almost exactly how it used to respond to questions. It even gave an outdated answer to something I asked, which makes the theory that it's just a ChatGPT rip-off, but probably a very old model, make more sense.
I don't think the Haiku was correct 😅
4:54 ah yes strawberry is spelt as "strawberring"
DANG! I gave it my story to analyze, but it messed up everything, even the relationships in the story and their dynamic. I'm having serious second thoughts about this model; it doesn't understand the story, even a short one. People, please try it: write a short story, ask it for suggestions, then give your own suggestions, and it will mess up everything. Is there another, better model I can use to get feedback on a story?
That Haiku was not a Haiku
Sadly, it doesn't support Intel GPUs.
Maybe I'm dumb, but I don't know how the AI can have any knowledge when it's installed locally and not using the internet. No background information = no knowledge????
frist time seeing you making a ai video!
Dude, that's not ultimate at all. When I press "search online" it can't do it. When I give it a PDF file and ask it to print out all the text info, it hallucinates all kinds of stuff. The online version can do it perfectly, btw... I tried it with the 14b model, btw.
This is an offline model; the model does not support online functionality.
@kevinzhu5591 Nah dude, this is a locally installed model that you can run offline, but there are plenty of people who enable this locally installed model to do a web search. Stop spreading false information.
4:50 "SRAWBERRYING"
It also messed up the haiku format.
I'm sure the bigger models are way better but why anybody would use the smallest ones is beyond me.
\boxed{} is LaTeX.
I tried this and asked Deepseek some simple questions and it knows nothing.
Does this resolve the political censorship issue ?
no
Yeah, the ToS and guidelines imposed on it are no longer functioning when it is offline.
And if it is built off skewed data, then it can be fine-tuned or corrected with new data.
I'll train this Chad AI being my AI GF in my local vps
Bruh, the math question at 4:12 isn't 16, it's 1. These things still can't do math.
The answer is 16. You still can't do math...
The convention is that operations of equal precedence are done from left to right.
So for 8/2*(2+2):
- After the parentheses you have 8/2*4,
- then you do 8/2 first, as it is the first operation on the left, and get 4*4,
- and then 4*4 gives 16.
I hope this clarifies it.
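For what it's worth, most programming languages follow the same left-to-right rule for operators of equal precedence, so a quick throwaway check (an awk one-liner, purely for illustration) agrees with the 16 result:

```sh
awk 'BEGIN { print 8/2*(2+2) }'   # prints 16
```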