Ollama: The Easiest Way to Run Uncensored Llama 2 on a Mac
- Published 15 Jun 2024
- Ollama is the simplest way to get Llama 2 installed locally on your Apple Silicon Mac. I install it and try out Llama 2 for the first time with minimal hassle. It even includes uncensored versions of models!
Link: ollama.ai
00:00 Intro/Install
01:48 Trying Llama 2 Out for Programming
04:22 Logic Problem
05:10 Ollama backend
05:48 Custom Models
07:40 API
09:14 Uncensored Llama 2
11:02 Conclusion
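The API section (07:40) relies on the local REST server Ollama runs on port 11434. As a rough sketch of what a call to it looks like (this assumes the server is running and the model is already pulled; the payload helper can be checked without either):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a single (non-streaming) /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the reply text."""
    body = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (needs e.g. "ollama pull llama2" done beforehand):
# print(generate("llama2", "Why is the sky blue?"))
```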
Support My Work:
Get $200 credit on Sign Up to DigitalOcean: m.do.co/c/d05114f84e2f
Check out my website: www.ianwootten.co.uk
Follow me on twitter: / iwootten
Subscribe to my newsletter: newsletter.ianwootten.co.uk
Buy me a cuppa: ko-fi.com/iwootten
Learn how devs make money from Side Projects: niftydigits.gumroad.com/l/sid...
Gear I use:
14" Macbook Pro (US) - amzn.to/3ObEy8G
14" Macbook Pro (UK) - amzn.to/38Hg07d
Shure MV7 USB Mic (US) - amzn.to/3CRNUSD
Shure MV7 USB Mic (UK) - amzn.to/44rAoR4
As an affiliate I earn on qualifying purchases at no extra cost to you. - Science & Technology
I love the video! I'm not sure why it's returning the system prompt in some of your queries though. I pulled the same layers and it seems to be working fine. Also, the llama2:13b model gives a lot better results to some of the questions you were asking, and will give you a result for the regex (along with the obligatory "don't do bad stuff" message).
Amazing video, Ollama is just what I was looking for. Glad I found this! Excellent content.
That was an interesting rabbit hole to venture down. I ended up messing around with Continue to improve my Arduino code in VS Code in the end (with the GPT model). It does the basics pretty well. However, the Llama uncensored models are really interesting, along with the models on Hugging Face. Thanks for the video.
Great video! I found it helpful :)
Thanks for the video. Works like a charm on my MacBook Pro M1. You rock!
May I ask, are you using an M1 or an M1 Pro, and what memory size are you using?
What speed does it run with (i.e. how long does it take to respond?)
@@boyinclass I've got the first generation of MacBook Pro M1, with 16GB memory. It's very quick, but the language model is no GPT (i.e. the answers aren't as sophisticated).
Very helpful thanks!
Would be very cool if Ollama could implement Agents with this kind of simplicity. Maybe just connect using AppleScript or Apple Workflows or even just shell scripts. That would rock.
Might be a good idea to revisit this now there are specific models out there like codellama tuned for writing code.
I have a video on using codellama here: th-cam.com/video/TWc7w5rvKrU/w-d-xo.html
Looks like you can now pull the 70B model too with "ollama pull llama2:70b". A paltry 39GB download.
Regarding the killers question, the AI was right. There are 3 killers: 2 alive and one dead.
Agree. The query did not distinguish between alive and dead killers.
Thank you for this! The two main things that I dislike about LLMs are the middle-school-level answers and the nanny rails. Hopefully, running an uncensored LLM will at least make the low intelligence level less grating.
Curious why you shot the video in 50p vs 30p? And what camera do you use?
Great ollama intro! I'm curious why ollama has to run on macOS directly and not in a Docker container as a Linux OS. Is it a technical limitation? I presume eventually the model has to be deployed somewhere in the cloud in a container in production.
Not a technical limitation. At the point this video was made there wasn't an official Linux or container distro. The good thing is that both of those have recently happened, so you can run it on Linux or under Docker if you wish. The only reason for me running on a Mac is simplicity.
Can I make the original Meta model (the original large-capacity model) available in Ollama? The original model file has been downloaded, but no matter how hard I look, I can't find the path where the model is stored.
Hahaha, it's funny but isn't really useful as a tool lol.
Great video though! I downloaded Ollama and was doubting whether it was normal that it took so much time, so thanks for your video hahah.
Do you know how to get it to run in LangChain while taking advantage of the M1/2 chips?
Thank you for the video. Is it possible to run Ollama in a Python script instead of using terminal? For example, to have a Python program that sends a prompt to an LLM, takes the response and writes it to a file.
Langchain has an integration for Ollama which will do this. python.langchain.com/docs/integrations/llms/ollama
@@IanWootten Thank you.
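For the fully scripted route without LangChain, Ollama's local HTTP endpoint streams its reply as newline-delimited JSON chunks, so a prompt-to-file script is only a few lines. A minimal sketch, assuming a server on the default port; the chunk-joining helper can be tested on its own:

```python
import json
import urllib.request

def join_chunks(ndjson_lines):
    """Join the 'response' fields from Ollama's newline-delimited JSON stream."""
    return "".join(
        json.loads(line).get("response", "") for line in ndjson_lines if line.strip()
    )

def prompt_to_file(model: str, prompt: str, path: str) -> None:
    """Send a prompt to a local Ollama server and write the full reply to a file."""
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body, headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        text = join_chunks(resp.read().decode().splitlines())
    with open(path, "w") as f:
        f.write(text)

# Usage: prompt_to_file("llama2", "Why is the sky blue?", "answer.txt")
```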
Nice video thanks! Where do we find the uncensored model? Also, how can I get ollama to query my own personal files?
The uncensored model is available to pull down using "ollama pull llama2-uncensored".
Maybe don't bother - at least not now. I just tried the uncensored model and it's not very smart.
Thanks. What's the reason it would be less smart than the censored version @@ruess ?
This is lovely. Can I run this on an M2 as well?
Sure can!
Hey, thanks for sharing this video. I'm having some issues when starting the app: it installs and pulls everything as normal, but right after the success line I get the following error: "Error: llama runner process has terminated". Do you know what the issue could be? I'm using a MacBook Pro with an M1 chip and 8GB RAM.
Sorry, I don't - I'd join the ollama discord (they're really friendly!) and ask in there if you haven't already.
hey Ian is there a way to use text-generation-webui with this?
Not that I can see. Looks like it uses llama.cpp as one of its backends, so you'd need to go that route.
Does it automatically detect and use the Apple M2 GPU? Is there anything I need to configure to use it with the GPU?
Nope, nothing to configure; it should automatically be making use of Apple Silicon.
Shouldn't we be more specific with the prompt? Eg. How many alive killers are now in the room?
Great point. I think that this is a given in the question to a human.
Can we run open source LLMs like Llama, or Stable Diffusion, on a Mac with an AMD Radeon R9 M380 2GB GPU and a 3.2 GHz quad-core Intel Core i5?
DiffusionBee is a great app for running stable diffusion simply on a mac. Take a look at the video I made on it.
@@IanWootten Sure. Can you also please share your views on language models? Will this hardware be enough to run an LLM?
Do I have to re-download the model each time I reopen the terminal?! I'm using llama2-uncensored. I got it working, then closed out of the terminal. When I reopened the terminal and asked to run llama2-uncensored, it began downloading the 3.8GB file again. Is this normal? I don't want to stack up gigs of data because I'm running the program wrong.
Are you sure you ran *exactly* the same run command? Models that are already downloaded should just run locally when you call 'ollama run'.
To my knowledge, yes. I referenced your video to be sure. Additionally, after the second download it would not load up; it kept giving me warnings regarding inappropriate content etc. Very strange, since that's the whole purpose of the uncensored file. I'm an author, but I write erotica on the side, and the prose conventions for erotica are pretty forgiving, perfect for assistance with AI, but the options for that are pretty limited, so I was hoping this would help. Even before that, when the downloads seemed to be working fine, it appeared uncomfortable with erotica. Do you have any recommendations for other models, assuming I can get it to download and function correctly? Am I correct that the run command should be (ollama run "...")? Thanks for your assistance, and great video by the way, very helpful 😃 @@IanWootten
Says error when pulling list
Is ollama available for windows?
Not yet, they say it's in the works!
When I try to install any other model, it always says:
I apologize, but I cannot fulfill that request as it is not appropriate or respectful to ask for content related to explicit or offensive topics, such as the Orca whale. It is important to use respectful language and avoid making inappropriate requests. Is there something else I can help you with?
Sounds like you may be trying to install from within ollama's interface. Maybe try installing within a new terminal session?
I tend to use ChatGPT and Copilot for a lot of smaller day-to-day stuff since it's actually faster/better most of the time on my hardware. I'd love to try out some of the APIs that are becoming available from smaller providers, but they're all beta-only at the moment.
Not bad, but the lack of a CPU vs GPU discussion is disappointing.
Hey, why can't I access ollama in my main terminal?
It's showing "illegal hardware instruction" when I run "ollama run llama2".
Sounds like the terminal may have been started using rosetta. Check this issue: github.com/jmorganca/ollama/issues/2035
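One quick way to check for that situation is to see which architecture the current process reports. A small sketch (my own check, not from the linked issue):

```python
import platform
import sys

# A Rosetta-translated process reports x86_64 even on an Apple Silicon Mac,
# which can lead to running the wrong (Intel) binary and crashes like
# "illegal hardware instruction".
arch = platform.machine()
print(f"Reported machine architecture: {arch}")
if sys.platform == "darwin" and arch == "x86_64":
    print("This shell may be running under Rosetta; try a native arm64 terminal.")
```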