Thanks for letting us know about this new release. Just tried it on my 6800xt, and it works. FYI, I think the supported list is all Navi 21 cards and all RDNA 3. That's the same list as the HIP SDK supported cards on the AMD ROCm Windows System Requirements page.
How many tokens/s??? Using a 7B model??
And the 7600 XT is not part of the officially supported list.
@@JoseRoberto-wr1bv On the Q8_0 version of Llama 3 I was getting 80 t/s, but for a couple of reasons the quality wasn't so good. I'm using Mixtral Instruct as my daily driver, and getting 14-18 depending on how I balance offload vs context size.
@@chaz-e that and the 7600 are both gfx1102.
TH-cam algorithm is crazy. I tried to do this with my 6800xt when I first got it. After all my research, I found nothing. And five months later, here we are. I can't wait to try this out. Thank you.
Thank you for the video. I can now use 8B LLM models with my AMD RX 7600 (8GB) and it is really fast. I use Arch Linux and it runs without any problems 👍
How did you get it to work on Linux? I've been having issues (and Ollama seems to recommend the proprietary AMD drivers....)
@@puffin11 Don't install the AMD Pro (proprietary) drivers. The open-source amdgpu driver is completely sufficient with ROCm.
I was torn between buying an RTX 3060 and an RX 7600; I thought ROCm was not supported on this card. How are image generation and model training?
@@whale2186 If you work a lot with AI models, projects, an Nvidia RTX graphics card is the best choice. AMD ROCm support is okay but unfortunately not nearly as good as the support from Nvidia CUDA and cuDNN.
@@sebidev Thank you. I think I should go with a 3060 or 4060 with GPU passthrough.
Brilliant! Thanks for letting us know, I am excited to try this.
Will be trying this out later on, thank you my man.
It works great on the 6800xt. Thank you for the guide.
Is it as fast as in the video?
@@agx4035 The video accurately shows expected performance, yes.
Just picked up a 16gb 6800, can't wait to get it installed and see what this baby can do! ;D
@@CapaUno1322 Update?
Do you think it will work well on my standard RX 6800?
I've successfully run 70B models with 4-bit quantization on my 4070 Ti Super. I offload 27 of the 80 layers to the GPU, while the remainder sits in system RAM. It works quite well: not exceedingly fast, but fast enough for comfortable use. A minimum of 64GB of RAM is required. While VRAM matters, in reality you can run 70B networks with even 10GB of VRAM or less. It ultimately comes down to how long you are willing to wait for the model's responses.
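If you want to reproduce this kind of partial offload outside of a GUI, here is a minimal sketch using llama-cpp-python (my assumption, not necessarily what the commenter used); the GGUF filename and layer count are placeholders you would tune to your own VRAM:

```python
# Minimal sketch of partial GPU offload with llama-cpp-python
# (assumed install: pip install llama-cpp-python).
# The model path and layer count are placeholders; lower n_gpu_layers
# until the model fits in your VRAM, the rest stays in system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3-70b-instruct.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=27,   # layers kept in VRAM; the remainder runs from RAM
    n_ctx=4096,        # the context window also consumes VRAM, so keep it modest
)

out = llm("Explain partial GPU offloading in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```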
It would be nice to try this on an AMD equivalent, maybe a 7800 XT or 7900 XT.
Maybe he can share his tok/s. Comparing my 2080 with the video (RX 7600), I get these results: I just tried this on my 2080 (non-Super) and get 62.40 tok/s, which is around 40% faster for a card with around the same gaming performance. The VRAM usage seems a bit lower though (base was at 1.8GB and with the same model loaded it was 7.2GB), so around 5.4GB of VRAM for the model. Hopefully AMD can catch up in the future :(
Nice to know. I thought it only used VRAM or RAM, not both. Good to know it adds up all the available memory.
@@gomesbruno201 7900 XTX is around the same performance as the 4070 Ti Super.
Amazing video, I learnt a lot! I love these videos about commercial GPUs running AI/ML workloads, as I'm into developing AI/ML models.
It's not working for me. I have a 7900 XT installed and attempted the same as you, but I just get an error message with no apparent reason. Drivers are up to date and everything is in order, but nothing.
Thanks, this is the only good video I could find on YT that explained everything easily. Your accent helped me focus. Very useful stuff.
Thank you! Glad to be helpful :D
Thanks, worked for me very well on my 6800xt! The answers are as quick as in the video. But I guess I need to learn how and what to ask, because the answers were always very confident and always completely wrong and made-up. I asked the chat to make a list of French kings who were married off before they were 18 yo, and it invented a bunch of Kings that never lived, and said that Emperor Napoleon Bonaparte and President Macron were both married off at 16, but they were not kings technically, and they were certainly not married at 16, lol.
If you already have this GPU, go ahead and play with LLMs. It's a good place to get started. I started playing with a Vega 56 GPU, which is rock bottom of what ROCm supports for LLMs if I understand things correctly. If LLMs are the focus and you are buying new, Nvidia is still the better option. An RTX 3060 with 12GB of VRAM gives you 20% more tokens/s at 20% less price. I sometimes see used RTX 3080s at the same price point as the RX 7600 XT. You don't need all that VRAM if you don't have the compute power to back it up.
You are aware that you are running a q4 version of the model?
Which explains the low VRAM usage.
Works just fine with an RX 5700 XT; it responds decently fast.
What a fabulous collection of undefined acronyms.
Well, it is not like GPGPU arrived only with LLMs. OpenCL on AMD GPUs in 2013 and before was the most viable option for crypto mining, while Nvidia was too slow at that time due to small cache sizes and poor efficiency. That all changed with the 750 Ti and GTX 9xx generation of cards. The history of GPU programming is even longer than that, as people were trying to bend even fixed-pipeline GPUs into calculating things unrelated to graphics. The GeForce 8 with its early, limited CUDA was of course a game changer, and I have been a big fan of CUDA and OpenCL ever since. Thanks for a great video on the 7600 XT! ❤
Can you add multiple AMD GPUs together to increase the power?
As a total dummy on all things LLM, your video was the catalyst I needed to entertain the idea of learning about all this AI stuff. I'm wondering (and this would be a greatly appreciated video if you make it): is it possible to put this GPU in my streaming PC so it encodes and uploads the stream while at the same time running a local LLM that interacts with the chat on Twitch? How can I integrate these models with my Twitch streams?
Can you talk more about the difference in VRAM usage efficiency between AMD and Nvidia? I would like to learn more about this.
Good vid; however, the AMD ROCm versions of the relevant files are no longer available (the link in the description leads to the generic LM Studio versions)? The later versions don't appear to specifically recognize AMD GPUs?
GPU not detected on RX 6800, Windows 10. Edit: never mind, you must load a model first from the top center.
Good news! ;D
What do you mean by "load a model first from the top center"? I couldn't get ROCm to recognize my GPU either, but that was through WSL 2, not this app.
Seeing as how I spent last night trying to install ROCm without any luck, nor could I find any good tutorials or a single success story, I'll be curious to see how insanely easy this is. Wait, I don't need to install and run ROCm in WSL?
Hey, I've had success with ROCm on 5.7/6.0/6.1.1 on Ubuntu and 5.7 on Windows so let me know if you're still having an issue and I can probably point you in the right direction
Can you do an update when ROCm 6.1 is integrated to LM Studio?
6.1 is not likely to ever be available on Windows. Need to wait for 6.2 at least.
@@bankmanager Ok, thanks for the reply.
07:34 Not sure if this will fix it, but try unchecking the "GPU offload" box before loading the model. Do tell us if it works!
I thought it only ran on Linux. Do you use WSL/Docker?
Hi, does it work on the RX 5500 series?
Alas... since this uses ROCm, and AMD does not list *any* RDNA1 cards, the answer is almost certainly... no. You really wouldn't even want to try it, though, since the RX 5500 XT is a severely gimped card (not to mention the horror of the non-XT OEM variant) - it has only 1408 shader cores, compared to the next step up, the RX 5600 XT's 2304 cores - nearly a 40% cut in compute! And it has a measly 4 GB of VRAM... that's complete murder for LLM usage - everything will be slow as molasses. You'll lose more time and money trying to run the model (even if it were supported) than if you just got an RX 6600 - that card is *still* the best value on this market, so if you want a cheap entry-level card to try this out, I would recommend that.
When you have an LLM on your machine, can it still access the internet for information? Just thinking aloud. Thanks, subbed! ;D
Turn off the internet and see what happens :)
Yes, you can toggle the web search feature for the latest data.
This is cool, but I have to say that I'm running Ollama with OpenWebUi and a 1080Ti and I get similarly quick responses. I would assume a newer card would perform much better, so I'm curious where the performance of the new cards really matters for just chatting, if at all.
If you add voice generation, then it matters a lot. With no voice, anything over 10 tokens/sec is pretty usable.
Hope this works with my RX 6800.
How is it doing with image generation?
Am I required to install the AMD HIP SDK for Windows before I can use LM Studio?
Yes.
Can the RX 570 8GB variant support ROCm?
Are any of these models that we can run locally uncensored/unrestricted?
Officially this is for the 6000 and 7000 series only at the moment on Windows.
Incredible video!
Great video. Worked for me on the first try. Is there a guide somewhere on how to limit/configure a model?
If there's a Windows driver for ROCm, how come PyTorch still only shows ROCm as available for Linux?
Anyway, good to know it works. I'd like to buy a new system dedicated to LLM/diffusion tasks, and yours is the first confirmation that it actually works as intended 😅
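For reference, a minimal sketch of what the Linux-only PyTorch ROCm wheels look like in practice; the rocm6.0 index URL is just one of the versioned indexes PyTorch publishes, so match it to whatever ROCm version you actually have installed:

```python
# Assumed install on Linux (pick the index matching your ROCm version):
#   pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
import torch

# ROCm builds reuse the CUDA API surface, so torch.cuda reports HIP devices.
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))   # e.g. an RDNA2/RDNA3 card
    print("HIP/ROCm version:", torch.version.hip)     # None on CUDA-only builds
```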
Can you do a comparison vs CUDA?
Can we use it to generate images as well (like Midjourney or DALL-E), or does it work only for text?
Yeah, on Linux with Stable Diffusion.
I've been looking to make a dedicated AI machine with an LLM. I have a shelf-bound 6800xt that has heat issues sustaining gaming (I have repasted it; I think it's partially defective). I didn't want to throw it away, and now I know I can repurpose it.
Does anyone know of a way to make an RX 580 run with ROCm on Windows? Yes, it's old, but it would be better than using the processor to play with A.I. and there are plenty of RX580s out there.
Can you teach us how to run LIMs, large image models?
"If you've used an AMD GPU for compute work, you'll know that's not great"
Bruh that Pugetbench score shows the RX 7900 XTX getting 92.6% of the RTX 4090's performance and it has the same amount of VRAM for at least £700 less. 💀💀
I would like to see how it performs with the standard (non-XT) RX 7600.
Today I finally jumped off the AMD struggle bus and installed an NVIDIA GPU that runs AI like a boss. Instead of waiting SECONDS for two AMD GPUs to SHARE 8GB of memory via torch and pyenv and BIFURCATION software…
My RTX 4070 Super just does the damn calculations right THE FIRST TIME!
What about multiple 7600 cards?
I'll try it in a few hours with the 780M iGPU and let you know
Not working!
You couldn't load the 30B-parameter one because in your settings you're trying to offload all layers to your GPU. Play with the settings and try reducing the GPU offload to find your sweet spot.
ZLUDA is available again btw
Wait, 30-billion-parameter models are fine with GGUF and 16GB, even with 12? Is there something I'm missing?
Quantization... which decreases the quality of the responses. Not really worth it, in my opinion.
@sean58271 I don't even know why I said that; a 30B model can't really fit in 16GB of VRAM, and surely not in 12. Also, quantization is fine: going from an 8B model at FP16 down to 8-bit, basically nothing changes; from 4-bit there is a loss, but the bigger the model, the less noticeable it is.
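A rough back-of-the-envelope way to sanity-check those VRAM claims; the bits-per-weight figures below are approximations of common GGUF quant levels, and runtime overhead (KV cache, buffers) is ignored:

```python
# Rough rule of thumb for GGUF weight size: parameters * bits-per-weight / 8.
# The bits-per-weight values are approximate (real GGUF quants mix block scales
# and per-tensor precisions), and KV cache / runtime overhead is not counted.
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (8, 30, 70):
    for label, bits in (("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.8)):
        print(f"{params}B {label}: ~{approx_weight_gb(params, bits):.1f} GB")

# A 30B model at ~4.8 bits/weight is roughly 18 GB of weights alone,
# which is why it won't fit in 16 GB of VRAM without offloading layers to RAM.
```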
How does it work with laptops? We have two GPUs, a small one and a large one, and LM Studio uses the small GPU :(
Nice mini-LED!
Can I run KoboldAI on an RX 7800 XT? A 13B model with 4-bit quantization? Currently using a 12GB 3060 and it has been a great card overall. But Nvidia being the as*es they are, they won't increase the VRAM size even if they double the price of the same series of cards; if anything, they lower it. So I'm planning on switching sides.
Please, someone tell me how to make this 7600 XT work normally with Stable Diffusion.
I have a 6800XT, 6900XT and a 7900XT. I will attempt this on each.
Would this work with Ollama?
Yes
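Assuming Ollama is installed with ROCm support and a model has already been pulled (the model name below is just an example, e.g. via `ollama pull llama3`), a minimal sketch of querying its local API looks like this:

```python
# Minimal sketch of calling a local Ollama server (default port 11434).
import json
import urllib.request

payload = {
    "model": "llama3",                      # example model; use whatever you pulled
    "prompt": "Say hello in five words.",
    "stream": False,                        # one JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```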
I asked if it can generate a QR code for me and it failed.
It says the runtime isn't supported.
Can I do anything useful on the Phoenix NPU? Just bought a Phoenix laptop.
Can you try Ollama with this ROCm thing? I've been splitting my head trying to get it to work with a 6800xt.
Ollama doesn’t work with ROCm. It is for nvidia and Apple silicon only.
@@ManjaroBlack Talking out your hairy buttocks.
@@ManjaroBlack ROCm is an AMD project, like AMD CPUs and GPUs. Are you high?
Mine isn't using the GPU; it still uses the CPU. 6950 XT.
WHAT IF I'M ON WINDOWS?
It would be good if you also created a video for Open WebUI + AMD.
How tf do you download Llama? It's so weird.
So I asked the AI what it recommends if I want to upgrade my PC, and it recommended an RX 8000 XT 💀
Is there RX 580 support? Who knows for sure? (It's not on the ROCm list; that's why I'm asking.) Or at least, does it work with the RX6600M? Because I only see the RX6600XT in the compatibility list.
The RX6600M is the same chip as the RX6600 (Navi23), just with a different vBIOS - and since Navi23 XT (RX6600XT-6650XT) is simply the full die, without cutting, it should work on the RX6600M too - same chip, just a bit cut down.
(Not a bad bin, though - it's a good bin, with an even higher base clock than the desktop RX6600, but with shaders cut on purpose to improve efficiency. I.e., desktop RX6600s are failed bins of RX6600XTs which are then cut down to justify their existence, while laptop RX6600Ms are some of the best 6600XT dies, cut on purpose to save power.)
I'd like you to try out an 8700G with fast RAM to run LLMs. Also, please run Linux.
It would be interesting to see if the NPU in those CPUs could be usable.
I just tried this vs my 2080 (non-Super) and I get 62.40 tok/s, which is around 40% faster for a card with around the same gaming performance. The VRAM usage seems a bit lower though (base was at 1.8GB and with the same model loaded it was 7.2GB), so around 5.4GB of VRAM for the model. Hopefully AMD can catch up in the future :(
Amazing. The 7600 (XT) is not even officially supported in AMD's ROCm software.
Calling a 350-370€ graphics card "budget" is kinda weird, ngl.
Nvidia said so, pal.
Shouldn't you be at the Olympics? Maybe you are! 😅
How on earth can these cards be cheaper than NVIDIA's? I think I'll never buy NVIDIA again...
It's working on my 6700xt, thanks.
Hey man, did you find the download link for that AMD ROCm version? Because it's just giving me the normal one.
ChatGPT 3.5 has about 170B parameters, and I heard that ChatGPT 4 is an MoE with 8 × 120B parameters, so effectively 960B parameters that you would have to load into VRAM.
Let me know when AMD can run diffusion models quicker than CPUs 😢
You should mention that ROCm only supports... three... AMD GPUs.
More than 3
@@user-hq9fp8sm8f Source?
@@user-hq9fp8sm8f does it support RX 5600 XT?
@@arg0x- no
@dead_protagonist ... you should mention that you don't know what you are talking about and/or didn't read the supported/unsupported GPU compatibility list ...
or ... maybe you just can't count ¯\_(ツ)_/¯
Hintz Summit
Nice video. I just wish they didn't change the UI so much in the space of 8 months that your instructions and information are completely useless :( Literally nothing in their UI is the same now.
Buy an NVIDIA card and be happy.
I can't set my 7900 XTX to ROCm. The only option offered is Vulkan.
Clickbait title, I suppose? Just because you can run local LLMs doesn't mean the GPU plays in the same league as Nvidia's consumer GPUs (4090).
ROCm still very much sucks today.
ZLUDA :)
NEVER BUY AMD
How do you get the AMD out of your throat? Just wondering since I’ve never seen anyone gobble so hard…
Turn off your screen
Sad that there is only advertising here; an AMD GPU is bad. Where is the video about the problems of an AMD GPU?
I have had AMD GPUs for the past 14 years, never a problem. I'm on the 7900 XTX now and it works great for what I do.
AMD is improving its software at lightning speed. So what are you smoking? Why can't an AMD GPU do GPGPU with good software?
Not everyone can afford a 4090 GPU. AMD seems like a better value, at the cost of a little extra effort.
Got anything as good for image generation?