Converting Safetensors to GGUF (for use with Llama.cpp)

  • Published Oct 13, 2024
  • One of the problems when you start using chatbot software is the variety of model file formats. Quite often you find a model you want to use, but it's in the wrong format and you don't know what to do.
    Well, here we go over how to convert a .safetensors file to .gguf. The beautiful part is that the tool is already in your Llama.cpp build! (A sketch of the conversion command follows below.)
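
    A minimal sketch of that conversion step, assuming llama.cpp is cloned at ./llama.cpp with its Python requirements installed and the safetensors checkpoint sits in ./my-model (both paths are illustrative, not from the video):

        # Hedged sketch: call llama.cpp's bundled converter script.
        import subprocess

        subprocess.run(
            [
                "python",
                "llama.cpp/convert_hf_to_gguf.py",  # converter shipped in the llama.cpp repo
                "my-model",                         # folder with *.safetensors + config.json + tokenizer
                "--outfile", "my-model-f16.gguf",   # destination GGUF file
                "--outtype", "f16",                 # keep 16-bit weights; quantize afterwards if desired
            ],
            check=True,
        )

    The resulting f16 GGUF can then be shrunk further with the llama-quantize tool that recent llama.cpp builds ship alongside the converter.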

Comments • 24

  • @zacharycangemi9525 • 14 days ago

    Hey - before I ask a question: I found your videos on YouTube last week. Outstanding content, thank you so much, massive shout-out. You helped me get some local AI up and running and I just got major shout-outs at work!
    How do we convert an already-quantized version of a Llama 8B to a .gguf file? I keep getting a tensor issue!

    • @cognibuild • 4 days ago

      I found this on GitHub, which might help. You'll need to install and run it under Linux or WSL:
      github.com/kevkid/gguf_gui
      Just be sure to run pip install llama-cpp-python, because it isn't included in the installation.
      Let me know if it works for you! (A quick way to see which tensor is the problem is sketched below.)
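
      Not the gguf_gui tool itself, just a hedged way to diagnose the "tensor issue" above: dump every tensor name, shape, and dtype in the checkpoint so the offending one stands out (the shard file name is illustrative; requires pip install safetensors torch).

        # Hypothetical diagnostic, not from the video: list the tensors
        # inside a safetensors shard before handing it to the converter.
        from safetensors import safe_open

        with safe_open("model-00001-of-00002.safetensors", framework="pt") as f:
            for name in f.keys():
                tensor = f.get_tensor(name)
                print(name, tuple(tensor.shape), tensor.dtype)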

  • @gostjoke • months ago +1

    thanks bro you are my god

    • @cognibuild • months ago +1

      @gostjoke I'm not a god. I'm just a dude 😎

    • @gostjoke • months ago

      @cognibuild Actually, I want to ask a question. After my safetensors model became GGUF, I tried some questions, but the model's answers seem a lot worse than when it was in safetensors. Do you know why?

    • @cognibuild • months ago +1

      @gostjoke A lot of the time it has to do with the parameters. Check the ChatML mode (a template-checking sketch follows below).
      Also try using KoboldCPP (kcpp) for your GGUF files to see if they work.
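
      A hedged illustration of that template check (the model name is illustrative, not from this thread): Transformers can print the exact prompt string a chat model expects, which you can compare against the template your GGUF front end applies.

        # Sketch: inspect a model's chat template with Transformers.
        from transformers import AutoTokenizer

        # An example model that uses the ChatML format (illustrative choice).
        tok = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")
        messages = [{"role": "user", "content": "Why is the sky blue?"}]
        print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

      If the runner wraps prompts in a different template (or none at all), answers can degrade badly even though the converted weights are fine.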

    • @gostjoke • months ago

      @cognibuild got it, thanks

  • @heliotek1212 • 4 days ago

    What if I don't have llama.cpp and I want to run my model in Jan?

    • @cognibuild • 4 days ago

      In Jan?

  • @cikokid • 23 days ago +1

    What is your PC setup, bro? Share it.

    • @cognibuild • 23 days ago

      @cikokid ASUS ProArt X670E motherboard, Ryzen 9 7950X, 128 GB DDR5, NVIDIA RTX 4090

  • @robert_nissan • months ago +1

    Excellent, bro

  • @xspydazx • 3 months ago +1

    Yes, this is normal stuff, but you may not realize that you can open GGUF files with the Transformers library!
    That means you can use save_pretrained to dequantize the GGUF file back to safetensors!

    • @cognibuild • 3 months ago

      How would you unquantize something? The numbers are lost.

    • @xspydazx • 3 months ago

      @cognibuild They are not, my friend: I thought that too, but the numbers are not lost; the model is just in permanent 4-bit mode. So:

        # Load a GGUF checkpoint through Transformers, then cast to FP16.
        # model_id is the repo/folder holding the GGUF; filename is the .gguf file inside it.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
        model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

        print('Extract and convert to FP16...')
        model.to(torch.float16)  # cast the loaded weights to 16-bit floats

      This way Transformers loads the model as normal (from the 4-bit pretrained file), and then you can save it like normal (a save sketch follows below).
      I was searching for this in the beginning, but when I could not find it I gave up. Then I fed my model all of the Hugging Face docs, so when I was talking about GGUF it told me it was possible, and I found it in the docs.
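
      A follow-on sketch under the same assumptions (model and tokenizer come from the snippet above; the output directory name is illustrative):

        # Write the FP16 weights back out as a regular safetensors checkpoint.
        model.save_pretrained("dequantized-model")
        tokenizer.save_pretrained("dequantized-model")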

    • @xspydazx • 3 months ago

      Really, I could not believe it, so I tried it! Technically GGUF is just another form of zip, but for tensors: it converts the model into a Llama clone, but it remains Mistral inside. Technically it's only a wrapper for a Q4 quant, etc. Yes, the tensor sizes are changed, but the calculation to compress is the same as to decompress, so it can unzip again.
      When you compress the model, i.e. a 7B, it turns into 3.5B? So did it shrink? But Unsloth uses 4-bit models, so we use quantized LoRAs?
      So there should be no problem once the model is loaded!

    • @cognibuild • 3 months ago +1

      Right, transferring it back to the format makes more sense. Because if you cut off decimals, those decimals are gone. Which is why you're saying it stayed at the quantized size but is now able to run as safetensors. Cool, man!

    • @xspydazx • 3 months ago

      @cognibuild I actually discovered it today, bro, so I thought I would share.
      GGUF locks the model for transport, so you can unlock it again. But as you say, I think there will be some loss on Q4 and the harsher quantizations. I always train in 4-bit to make sure that when I quantize the model afterwards it is basically the same as it was in training. If I were using it for transport, I would probably do a Q8 or even an FP16 GGUF, just to be sure. (This is something quite hidden: you know it can be done, but not the syntax.) As you choose the folder or repo location, you also need to specify the filename (wow), or you can even just specify the full path of the filename with the kwarg handle (wow).
      (It's still better to run them with llama.cpp for its speed: on a laptop or PC, Transformers runs a bit slow, but llama.cpp runs fast. So on a laptop, if you have to use the weights, use pipelines, as they're also much faster for some reason!)
      (Today I actually conquered Stable Audio locally, with minor adjustments to their code to recompile it for local use instead of the repo; quite easy in the end. Now I can do sound generation. I'm still using BLIP-1 for image captioning, etc., to learn the craft. I have been concentrating on getting media IN first; all outputs lead to text, but now sound too, both speech and noises. Really enjoyable stuff, bro... perhaps you should do a few tutorials!)

  • @alwekalanet885 • months ago

    I always wonder why the hell people do coding tutorials in a video?

    • @cognibuild • months ago

      @alwekalanet885 I didn't know... ask everyone else who appreciates it