Running a Hugging Face LLM on your laptop

Learn Data with Mark

มุมมอง 104 516

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 16 ม.ค. 2025

ความคิดเห็น • 124

@elmino19 ปีที่แล้ว ⁺⁴⁸
You explained completely and perfectly without wasting the audience's time! well done
@learndatawithmark ปีที่แล้ว ⁺¹
Thanks!
@norwegiansmores811 6 หลายเดือนก่อน
except for the part that i have got no clue what the required software to start is. what os to run on nor how to run the code in step 2. the farthest i got was git bash for the transformer and thats about it. i just want to run some local ai, why does it have to be so obtuse??
@learndatawithmark 6 หลายเดือนก่อน ⁺²
@@norwegiansmores811 It's running in a Jupyter notebook - jupyter.org/ but you can run the code anywhere that can run Python code e.g. a Python REPL or a Python script.
@learndatawithmark 6 หลายเดือนก่อน ⁺¹
@norwegianmores811 I think the easiest way to run local AI as of June 2024 is now this - lmstudio.ai/
@TheKaryo หลายเดือนก่อน
Thanks for the quick and consise video, exactly what I wanted as someone that is already somewhat familiar with using models and just new to hugging face
@artistry72 26 วันที่ผ่านมา
Thanks a ton man... outstanding video and tutorials
@alexandrerodtchenko6099 10 หลายเดือนก่อน ⁺¹
Super video!
@user-du8hf3he7r ปีที่แล้ว ⁺¹⁸
An API key is not needed if the model is downloaded and run locally.
@itspaintosee 10 หลายเดือนก่อน
So long as you have a behemoth of a machine. 16GB Ram = 100% memory usage 😭😂
@jayo3074 9 หลายเดือนก่อน
I don't think anyone can afford an expensive laptop lol
@Sovereignl55 9 หลายเดือนก่อน
Do you know how to run it on live servers!! How to get?
@Sendero-yp5gi 6 หลายเดือนก่อน ⁺²
It is needed to download the model!
@mortysmith666 6 หลายเดือนก่อน ⁺⁴
The api key is used to authenticate account for huggingface
@headshorts_YT 10 หลายเดือนก่อน
Awesome! Thanks for this video.
@MitulGarg3 8 หลายเดือนก่อน
Absolutely wonderful video! to the point and well explianed! way to go! thanks a lot!
@learndatawithmark 8 หลายเดือนก่อน
Thanks - very kind of you :D
@flaviocorreia4462 10 หลายเดือนก่อน
Thank you very much, you helped me a lot
@tee_iam78 6 หลายเดือนก่อน
Thank you for the contents.
@NappoAvanti 11 หลายเดือนก่อน
Thanks for this video!!
@shivamroy1775 ปีที่แล้ว
This was an extremely informative video. Really appreciate it.
@learndatawithmark ปีที่แล้ว ⁺¹
Thanks, glad you enjoyed it!
@youssefabbas6349 7 หลายเดือนก่อน
thank you very much for this great explaination
@radoslavkoynov322 11 หลายเดือนก่อน ⁺³
I am getting an error/ info log from transformers (twice) stating "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained." The model then generates only a bunch of whitespace, no matter the input. I have followed through your steps and made sure the files were downloaded at the expected location. The behavior occurrs both with and without setting legacy=False.
@learndatawithmark 11 หลายเดือนก่อน
Does it work after you see that message?
@kyledavelaar455 9 หลายเดือนก่อน
@@learndatawithmark getting the same error and hang when running in colab or locally. seems like the pipeline("my query") never resolves
@engineeringcareer8313 7 หลายเดือนก่อน
Same thing happens to me! Hanging and giving random replies! I am using mac m3
@WindyrootLimesprite หลายเดือนก่อน ⁺¹
what was the interface you were using? Is that on huggingface somewhere?
@colorblindzebra 5 หลายเดือนก่อน
Excellent thanks dude
@darylallen2485 ปีที่แล้ว ⁺²
1:03 - Thanks for this clarification. I'd done quite a bit of Google searching and scouring the Hugging Face website for this information. I found nothing of value. I'm a computer enthusiast / gamer and not a professional machine learning engineer. Since embarking on running an LLM locally on my previous daily use desktop, I've noticed its near impossible to find a model's resource needs. GPT4 says a 7b parameter model would consume about 48 GB memory. I asked it what size model would fit in my 12 GB Nvidia 3060, it said about 3.2 billion. My question for you is, why is it that everyone in this space who seems to offer a model (or talk about them) never includes something like a system requirements descriptor? Is it one of those situations where, if you need to ask, you probably don't have enough resources? Thanks for any insight you can give on this phenomenon.
@learndatawithmark ปีที่แล้ว ⁺⁵
My impression is that most of the models being created are assuming that you have insanely good GPUs to run them on!
Since I created this video, there's been a lot of work done by a guy called TheBloke on Hugging Face to 'quantise' the models, which effectively means that the amount of resources required is reduced, but the quality of the model is slightly reduced too.
I've found those models work a lot better on my laptop.
The Bloke is using a format called GGUF, which is kind of a defact format for LLM models. I made a video showing how to run one of his models on my machine - th-cam.com/video/7BH4C6-HP14/w-d-xo.html. That video uses a tool called Ollama which works on Linux/Mac - th-cam.com/video/NFgEgqua-fg/w-d-xo.html
There is also another library called CTransformers which lets you choose whether to run models on the GPU or CPU. I've found the 7B parameter quantised models work reasonably well even on the CPU. I should probably create a video about that I guess! But in the mean time, this is the link - github.com/marella/ctransformers
@darylallen2485 ปีที่แล้ว
@@learndatawithmark thanks!
@enceladus96 ปีที่แล้ว
this video saved my day
@armantech5926 9 หลายเดือนก่อน
That's Great! Thank you!
@piyushharsh01 ปีที่แล้ว ⁺³
Super helpful and easy to understand!
@learndatawithmark ปีที่แล้ว ⁺¹
Glad it was helpful :)
@ibrahimparkar6900 3 หลายเดือนก่อน
Awesome
@Inderastein 9 หลายเดือนก่อน ⁺¹
hey um, i don't know if you'll read this in time, but I have a problem:
pytorch_model.bin: 0%| | 0.00/13.5G [00:00
@learndatawithmark 9 หลายเดือนก่อน ⁺¹
Hard to know exactly why - maybe connectivity with Hugging Face or maybe your internet or maybe the download tool?! You could try going to Hugging Face directly and click through to files and download them directly to see if it helps.
@stanleyt6003 6 หลายเดือนก่อน
@@learndatawithmark You got to rerun that part couple times and make sure you have a fast connection. For example, pytorch_model.bin is 6.71gb that would take some time to download
@imaginarybuddy 6 หลายเดือนก่อน
hi, thanks for the video. May I ask what's the meaning of legacy=False when using the pretrained model?
@lolla6154 4 หลายเดือนก่อน ⁺¹
it says hf_hub_download isn't defined
@wadejohnson4542 ปีที่แล้ว
What is the configuration of your local environment
@Cynadyde ปีที่แล้ว ⁺²
If you're getting a wacky error trying to perform `AutoTokenizer.from_pretrained(model_id, legacy=False)`, do pip install protobuf==3.20.1 and restart the jupyter kernel
@learndatawithmark ปีที่แล้ว
Good tip! I get that error somewhat randomly but never quite figured out the combination steps that result in it happening!
@ravirajasekharuni 5 หลายเดือนก่อน ⁺¹
Amazing and outstanding. This video and presentation is awesome.
@learndatawithmark 5 หลายเดือนก่อน
Thanks!
@mbikangruth5630 6 หลายเดือนก่อน ⁺¹
I have done as you say, but running the model pipeline is taking forever to work. It still has not worked, please what can I do?
@learndatawithmark 6 หลายเดือนก่อน ⁺¹
If it's running too slowly then maybe it'd make sense to try out some of the quantised models instead. Those ones are smaller and better suited for running on consumer hardware.
I quite like Ollama and I've made a few videos on that. This is probably the best place to start - th-cam.com/video/NFgEgqua-fg/w-d-xo.html
@mikiallen7733 11 หลายเดือนก่อน
thanks sir , however I want to know
1- how one can integrate specific set of models (pre-trained) ones in to Rstudio ? so that one can simply run examples on data "proprietary in my case " locally within R
2- is there a way to ask the inference API for tasks different from the typical sentiment classification of text for example "multi-entity tagging" , "modalities" ....etc
your input is highly appreciated
@knotfoursail6404 ปีที่แล้ว
Super helpful 👍
@knotfoursail6404 ปีที่แล้ว
Random idea, but a video on how to run an embeddings model on a laptop would be really cool 😀 Could even combine embeddings + text2text for more specific answers. Or even t5_3b + selenium to create something similar to bing chat. Anyway, wish you luck on TH-cam 😊
@learndatawithmark ปีที่แล้ว
Sorry, I didn't see this reply! I've got a notebook with that idea sketched out, so I'll create a video for that soon. On holiday at the moment, but will do it when I get back home!
@Sara-po1jd หลายเดือนก่อน ⁺¹
what that tool you are using called? the broswer ?
@enoshpeterponraj2407 2 วันที่ผ่านมา
jupyter notebook
@Xploitacademy 8 หลายเดือนก่อน
What is the editor you are using on localhost ?
@learndatawithmark 8 หลายเดือนก่อน ⁺¹
I'm using a Jupyter notebook in the video
@wasgeht2409 7 หลายเดือนก่อน
thx
@MarxTech_DIY ปีที่แล้ว
Hey, great tutorial! I also found your blog on this and followed that, but I always get this error: Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
This is my first time experimenting with LLMs, so any assistance would be greatly appreciated.
@learndatawithmark ปีที่แล้ว
Oh I'm not sure about that error - I haven't seen that one before. Since I made this video I've been playing around with another tool called Ollama which I found easier to use. It might be worth giving that a try to see if that works for you?
th-cam.com/video/NFgEgqua-fg/w-d-xo.html
@l501l501l ปีที่แล้ว
Hi Mark, great video. May I know your notebook and the configuration? I’m thinking switching to MacOS to play around with Gen AI.
@learndatawithmark ปีที่แล้ว
I'm using the latest version of Jupyter Lab and I have it set to dark mode with pretty much every one of the views hidden so that I can use as much of the screen as I can.
Not sure if that answered your question, so feel free to follow up!
@SudhakarVyas 7 หลายเดือนก่อน
Thanks Mark for this video. A quick question- Is this safe to pass some PII data to one of the open source hugging face models that require the hugging face API token ? If No, how can this be resolved in deployment so that there is no risk of data leakage ? Please guide through this.
@learndatawithmark 7 หลายเดือนก่อน ⁺¹
It depends.
If you are passing your HF API token because you're using the HF inference endpoint then your data is getting sent to the HF API.
If you're passing it because you're downloading a model that requires token auth then your data will only be local to where you run the model that you download.
@riok4523 7 หลายเดือนก่อน
hi Mark - super helpful. can i run all of this in terminal?
@learndatawithmark 7 หลายเดือนก่อน
You can. You can use a Python REPL or even the iPython CLI. Or you could put it all in a Python script and run that.
@user-dkfbbdh ปีที่แล้ว
thanks Mark, very nice video, super clearly put!
could you please suggest, what could be the reason if (when trying to set the wifi off) the output of those lines of code is "ModuleNotFoundError: No module named utils"?
@learndatawithmark ปีที่แล้ว
utils should be referring to this file - github.com/mneedham/LearnDataWithMark/blob/main/llm-own-laptop/notebooks/utils.py - so in theory that's independent of WiFi connectivity. If it can't find that module you could copy/paste those functions into the notebook and use them like that.
@mokh1611 10 หลายเดือนก่อน
I'm probably missing something, but where are you using the downloaded files? you are entering model_id in .from_pretrained(), how is it finding/using the downloaded model?
@learndatawithmark 10 หลายเดือนก่อน
It's reading from the ~/.cache directory. So it constructs a file path based on that directory & the model id
@static_frostBRK 6 หลายเดือนก่อน
Hello there Mark i was wondering if i could use this method to download other ai models for example text to image models?
@learndatawithmark 6 หลายเดือนก่อน
Yes you should be able to use a similar approach. There's a good guide on image to text over here - huggingface.co/tasks/image-text-to-text
@dgl3283 7 หลายเดือนก่อน
I deeply appreciate your video! Although I have a question, does this still works when the model file is a .safetensors or .pth file, not a .bin file? Thank you!
@learndatawithmark 7 หลายเดือนก่อน
Yeh I think it should work with both of those.
@Shivam-bi5uo ปีที่แล้ว
i want to work with a model that is tagged as 'text-generation' how do i run it?
@dimitripetrenko438 ปีที่แล้ว
Hi Mark! This video is very helpful, may I ask do you think fastchat can be used in combination with Qdrant for RAG? Thank you in advance
@learndatawithmark ปีที่แล้ว ⁺¹
Yeh you can could combine it with any database to do RAG.
@Haui1985m ปีที่แล้ว
Hi, wich webinterface you use for python scripts? I want to use it to :)
@learndatawithmark ปีที่แล้ว
This is Jupyter Lab - jupyter.org/
@trealwilliams1563 4 หลายเดือนก่อน
We're going to start by opening??? Start by opening what exactly?
@learndatawithmark 4 หลายเดือนก่อน
Open a Jupyter Notebook - jupyter.org/
Here's a link to the notebook I used in the video - github.com/mneedham/LearnDataWithMark/blob/main/llm-own-laptop/notebooks/LLMOwnLaptop.ipynb
@OmarAli19591 ปีที่แล้ว
sorry i'm just starting with this, the code you're writing in the beginning, what is the website called?
@learndatawithmark ปีที่แล้ว
Do you mean this one? huggingface.co/
@Sendero-yp5gi 6 หลายเดือนก่อน
What is the difference w.r.t to using the classical:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
Thanks in advance!
@learndatawithmark 6 หลายเดือนก่อน
I think it's the same thing under the hood - no need to change from your approach!
@무둥이-f8o ปีที่แล้ว
Thank you! I finally downloaded a big llama model.. lol 😹
@learndatawithmark ปีที่แล้ว ⁺¹
Winning!
@aprilscott2369 2 หลายเดือนก่อน
really tried to follow along but I'm completely lost. I've never used python I'm assuming I put my own info in the parameters? Do I run each cell after filling it in? i just really have no clue what I'm doing but I've wanted to use hugging face for a while an no tutorial really helped me. it seems like every one expects me to know how to set up virtual environments right off the bat and do everything. if someone could just tell me what im actually putting where or what to run and when it would probably work i think
@viniciustsugi8007 ปีที่แล้ว
Awesome content, love your channel! Video is very informative and concise, thanks. As a friendly suggestion, you might want to give a couple of secs at the end for the video for slow people like me to hit that well deserved like button :)
@learndatawithmark ปีที่แล้ว ⁺¹
Thanks for your kind words! Let me see if I can figure out a good way to implement your suggestion 🙂
@engineeringcareer8313 7 หลายเดือนก่อน
Hey, can you tell me about your system info, i am using mac m3 and its not giving any response and running continuously?
@learndatawithmark 7 หลายเดือนก่อน
I use a Mac M1 with 64GB RAM. I think it's a 2021 edition. I've found in general that the quantised models work better on my machine - either using Ollama or llama.cpp.
th-cam.com/video/YDj_ScvBpKU/w-d-xo.html
th-cam.com/video/NFgEgqua-fg/w-d-xo.html
@marufakamallabonno146 ปีที่แล้ว
How can I use this downloaded model next time ?
@learndatawithmark ปีที่แล้ว
It will already be there so if you try to use it again there won't be any need to download it
@rodriguezmj11 ปีที่แล้ว
Has anyone built a GUI for this?
@insideworld4122 10 หลายเดือนก่อน
sir if wifi is on then they model is working properly or not?
@learndatawithmark 10 หลายเดือนก่อน
Yes it should work without wifi - but you will need a connection to the internet to download the model.
@harshans7712 4 หลายเดือนก่อน
Thank you for this tutorial, this tutorial was really useful
@thenotsurechannel7630 2 หลายเดือนก่อน
Could you perhaps make a video with this very same theme... but assume there are those who don't know the first thing about installing this stuff? This is WAY too complex... why wouldn't there just be an installer (.exe) file you can just download and run, and just like any other program... Bob's you're uncle! It's ready to use?
@The_little_black_cat ปีที่แล้ว
Hi, i try to find someone who uses GGUF directly and locally without using a .bin to launch it because I would like to launch it under python, is this possible? Or should I do something else?
@learndatawithmark ปีที่แล้ว
You can do this using CTransformers like I did in this video - th-cam.com/video/S2thmwdrYrI/w-d-xo.html
I think you might even be able to do it with HuggingFace transformers, but I haven't tried it myself.
@The_little_black_cat ปีที่แล้ว
@@learndatawithmark if one day you make a video on this, I would like to see it, in fact what I would have liked was to discuss with the model directly with python without going through any interface and to give it a personality with json like we have could do it with webui (but without webui) I tried various methods and honestly I find so little explanation. I had the idea of making my own bot as I saw in "wifu" mode in the sense that it is totally customizable and we give it a personality with a long term memory. The basic idea was to have a small model just for me. I'm just frustrated to see bots that don't even remember talking to us 2 seconds before. xD
@learndatawithmark ปีที่แล้ว
@@The_little_black_cat it sounds like you want to keep the history of the chat messages between you and the LLM? I showed how to do this in memory on this video using Ollama, but it can be adapted to another approach - th-cam.com/video/MiJQ_zlnBeo/w-d-xo.html. I can across a tool called MemGPT which I think attempts to solve this problem, but I haven't tried it yet - memgpt.ai/
@timjx3675 ปีที่แล้ว
Great vid, however I’m getting a value error, failure to import transformers error even though I used pip to do that, wondering if it’s a python version issue, I’m using 3.10, wonder if anyone has any ideas ? Thx
@learndatawithmark ปีที่แล้ว
Can you share a script with all the code you ran and I'll try to reproduce?
@CGATTMUSIC ปีที่แล้ว
use 3.9 its more stable
@stanleyt6003 6 หลายเดือนก่อน
your python installation is missing the transformer module 'pip install transformers' will do the trick.
@IanTindale 6 หลายเดือนก่อน
I keep following along until about 12 seconds in, where you start typing into something and you say let’s open up age something, and carry on typing into whatever it is you’re typing into - I can’t get that far, I don’t know what to type into
@learndatawithmark 6 หลายเดือนก่อน ⁺¹
I'm using a Jupyter Notebook, but the code would work in any Python environment or script
jupyter.org/
@IanTindale 6 หลายเดือนก่อน
@@learndatawithmark ah thanks, that’s interesting - I’ve never heard of that
@diln5 7 หลายเดือนก่อน
i personally found disabling your wifi from a jupyter notebook to be bad ass
@learndatawithmark 7 หลายเดือนก่อน
Haha, thanks. It took me a little while to figure out how to do it!
@paulohss2 10 หลายเดือนก่อน ⁺¹
So many steps missing in this video...
@P.Raghul 5 หลายเดือนก่อน
Bro I need to make offline chatbot using llm how to implement it please tell
@mohsenghafari7652 11 หลายเดือนก่อน
hi. please help me. how to create custom model from many pdfs in Persian language? tank you.
@artusanctus997 10 หลายเดือนก่อน
Seems unnecessarily complex... isn't there like an online space to use this stuff without having to write a bunch of stuff just to download it?
@learndatawithmark 10 หลายเดือนก่อน
Yeh I think with a bunch of the models you're able to run them on the Hugging Face website on the right hand side of the page. And then in general there are many services that offer APIs that you can call. The approach describe in this video is only for if you don't want to use those services.
@AwkwardTruths 9 หลายเดือนก่อน
Pinned
@TheFrankyguitar ปีที่แล้ว
When I run this: "os.environ.get("HUGGING_FACE_API_KEY")" I get"None". Is it normal?
@TheFrankyguitar ปีที่แล้ว ⁺¹
I guess I need to set the HUGGING_FACE_API_KEY variable to my token beforehand.
@learndatawithmark ปีที่แล้ว ⁺¹
You would need to set that environment variable before running your Python environment otherwise yeh it'll be none

ต่อไป

เล่นอัตโนมัติ

Running Mistral AI on your machine with Ollama