If you are stuck on Windows with your AMD GPU, AMD also has a guide to installing LM Studio (the version with ROCm support). It even supports RAG, so you can "chat" with large documents or books, which is quite useful for summarization or querying. The post is titled "How to enable RAG (Retrieval Augmented Generation) on an AMD Ryzen™ AI PC or Radeon Graphics Card". Works well even with only 8GB of VRAM on a 7600 (non-XT).
I ran it on my RX580 8GB😂
It is, as of recently, part of the standard LM Studio. No need for a special version.
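If you want to script against it rather than use the GUI, LM Studio can also expose a local OpenAI-compatible server (by default on port 1234). A minimal sketch with the openai Python client, assuming the local server is running with a model loaded - the model identifier below is a placeholder, use whatever your instance lists:

```python
# Minimal sketch: query a local LM Studio server through its OpenAI-compatible API.
# Assumes LM Studio's local server is running on the default port 1234 with a model loaded;
# the model name below is a placeholder - use whatever your instance reports.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

response = client.chat.completions.create(
    model="local-model",  # placeholder model identifier
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the attached chapter in three bullet points."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```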
Been waiting for this video, for over 2 years. Thank You!
(Ended up starting an MI25 buying spree, ending only with the purchase of a 7900 GRE)
AMD recently published a blog post about getting LM Studio to run on RDNA3 - it works very well on the 7900 XT.
I keep telling people to use Stability Matrix if you want an easy install of Stable Diffusion. It's more or less a one-click install of ComfyUI, Automatic1111, and all the other major UIs. It works on Linux and Windows, Nvidia and AMD, etc. It also pulls models from Hugging Face and Civitai, and keeps all of your stuff organized.
I just bought a used 3090 for AI because of CUDA supremacy. I would love for AMD to become more competitive in this regard
Also the VRAM, the 24GB of it.
What do you use it for exactly, if I'm allowed to ask?
Another problem is that a lot of AI tools just don't support AMD or Intel at all.
@@Squilliam-Fancyson AI image generation, model creation, AI video work, chat AI, and many others.
Honestly the 7900 XTX is steamrolling in image generation thanks to Microsoft's Olive (ONNX Live). If you're a Stable Diffusion user, run SD.Next with ONNX and Olive enabled. Benchmarks for the 7900 XTX are above 50 it/s on Linux with the AMD PRO drivers, which is insane - most people sit at 6 it/s lol
Unfortunately we're relying on small teams and random nerds to get implementations working well for the average "git clone" user. AMD purchasing Nod.ai should get SHARK more 'professional' eventually and potentially replace 1111 and SD.Next.
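For anyone who wants to try the ONNX route outside of SD.Next's UI, here's a rough sketch using Hugging Face Optimum's ONNX Runtime pipeline. The model ID and the DirectML provider argument are assumptions for a Windows/AMD setup; SD.Next's own Olive integration handles more of this for you automatically:

```python
# Rough sketch of running Stable Diffusion through ONNX Runtime via Hugging Face Optimum.
# Assumptions: optimum[onnxruntime] is installed and the chosen execution provider
# (e.g. DmlExecutionProvider on Windows with an AMD card) exists in your onnxruntime build.
from optimum.onnxruntime import ORTStableDiffusionPipeline

pipe = ORTStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example model; swap in your own checkpoint
    export=True,                         # convert the PyTorch weights to ONNX on first load
    provider="DmlExecutionProvider",     # assumption: DirectML provider for AMD on Windows
)

image = pipe("a photo of a red fox in the snow, highly detailed",
             num_inference_steps=25).images[0]
image.save("fox.png")
```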
Mate, do you think the 7900 XTX is better than the 3090 for deep learning? I'm interested in buying one but torn.
Plus I'm on Linux, and I think ROCm shouldn't be a problem there.
@@RaadLearn To be honest lad I've only used it for generation rather than training, so I can't say for certain. ROCm is generally great now, and the tooling for AMD - good PyTorch support, the DirectML reboot, and ZLUDA (technically dropped by AMD for legal reasons, but community supported) - has come on leaps and bounds. I think the 3090 will be a 'set and forget' card that's easy to use with existing programs, while a 7900 XTX seemingly gains HUGE speed and performance bonuses when everything lines up perfectly. It's just a matter of getting those tweaks and tools running right to make use of AMD's good parts and ignore the CUDA noise.
I'd personally go for AMD as a home user who loves tinkering and spending hours perfecting a toolset. But if you're an institution or a student doing research work and don't have the time or elderly-graybeard-knowledge, it'll probably save resources overall to go Nvidia and focus on the project.
Take all that with a fucking HUGE grain of salt - these things move fast and the last time I deep dived on this was at least three months ago. I didn't even know Nvidia recently made it technically against the CUDA license terms to run CUDA on AMD cards until like two months ago ffs.
@@ICANHAZKILLZ Have you tried SDXL checkpoints? How long does it take to generate images with 40+ sampling steps, or with fewer than 25?
@@TatsuyaNFT Just tried a random SDXL model for you: with a batch size of 8 it took 14 seconds to generate 8 images at 512x512 and 25 steps of Euler A (p115/n98). I was also watching YouTube, which probably didn't help lol. 45 steps took 27 seconds for 8 images.
This was with an old, buggy AMD version of ZLUDA (since removed); I imagine if you start fresh with whatever chain is currently best you could easily improve on that. MS has started up DirectML development again and ZLUDA has a new big investor to get the chains clinking.
I'm so curious what AMD will do with RDNA 4. Saw some stuff from a colleague on a 7900XT and I was impressed tbh.
This may just be the kick in the behind I needed to take my rx6800 for a spin. Thanks, Wendell!
wendell: "don't do an AI girlfriend"
rest of us: "hi kimiko"
_Plankton hides Karen_
Just bought a GRE, arriving tomorrow. :)
And who doesn't like cat pictures!
@@74_Green how's the gpu?
Incredible monologue !
GPT4All is also a good choice for AMD users. Super simple to set up, but limited language model options. The interesting thing about GPT4All is that they implemented a Vulkan backend.
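If you'd rather drive it from Python than the desktop app, a rough sketch of the GPT4All bindings is below. The model filename is just an example, and the device="gpu" argument (which is what routes inference through the Vulkan/Kompute backend) is an assumption about your installed version:

```python
# Minimal sketch of the GPT4All Python bindings with the GPU (Vulkan/Kompute) backend.
# Assumptions: the gpt4all package is installed, the named model file exists locally or
# can be downloaded, and device="gpu" is supported by your version and card.
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf", device="gpu")

with model.chat_session():
    reply = model.generate("Explain what a Vulkan compute backend is in two sentences.",
                           max_tokens=200)
    print(reply)
```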
"Any AMD GPU"
Looks over at my r9 390.
Technically if you find a setup that can use OpenCL only, it *should* work
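If you want to check what an OpenCL-only path would even see on an older card like that, a quick enumeration sketch with pyopencl (assuming an OpenCL runtime for the card is installed) looks like this:

```python
# Quick sketch: list the OpenCL platforms and devices the runtime can see.
# Assumes pyopencl is installed and an OpenCL ICD for your GPU (Mesa/rusticl or the
# vendor runtime) is present; older cards like an R9 390 often only show up here.
import pyopencl as cl

for platform in cl.get_platforms():
    print(f"Platform: {platform.name}")
    for device in platform.get_devices():
        vram_gib = device.global_mem_size / (1024 ** 3)
        print(f"  Device: {device.name}  ({vram_gib:.1f} GiB global memory)")
```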
For code generation Mixtral 8x7B Instruct at 8 bit quantization kicks the ass of Claude and GPT-4 in all my tests.
Does this mean you are soon to release a level1Linux video showing how to set it up or something?
You should probably go into how important the various floating point standards are for current models, and why it will take a hardware re-arch/re-spin and some new FP standards in order to make things a LOT faster.
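To make the floating point question concrete, here's a tiny sketch (PyTorch, assuming it's installed) of how much memory a 7B-parameter model's weights alone need at different precisions - which is a big part of why FP16/BF16, and increasingly FP8/INT formats, decide what fits on a consumer card:

```python
# Back-of-the-envelope: weight memory for a 7B-parameter model at common precisions.
# This only counts weights, not activations or KV cache, so real usage is higher.
import torch

params = 7_000_000_000
for dtype in (torch.float32, torch.float16, torch.bfloat16, torch.int8):
    bytes_per_param = torch.tensor([], dtype=dtype).element_size()
    print(f"{str(dtype):16s} {params * bytes_per_param / 1e9:6.1f} GB of weights")

# float32 ~28 GB, float16/bfloat16 ~14 GB, int8 ~7 GB - hence 16GB vs 24GB cards being
# the practical dividing line for which models run without heavier quantization.
```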
been waiting for this, thanks wendell
I have an old instinct MI50 16gb, I guess it should do the job for this.
APUs on ROCm please. Without fancy hacks.
You can already run Open WebUI (Ollama) on a 680M with Docker Compose.
If setting up AMD for ML development terrifies you, you can use Docker and VS Code's Dev Containers extension.
With plain Docker you usually just run the container and execute your scripts, but with Dev Containers you can actually debug your Python code inside the container.
It (mostly*) feels exactly the same as using a virtual environment, but you don't have to worry about any of the hard parts of the setup. A rough sketch of checking the setup is below.
*This does get a bit funny when you want to use multiple workspaces in VS Code, like when you're working on multiple projects - but that's super niche.
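As a minimal sketch of that workflow: assuming your dev container is based on one of AMD's rocm/pytorch images and you've passed through /dev/kfd and /dev/dri (plus the video/render groups), a quick script run inside the container confirms the GPU is actually visible to PyTorch:

```python
# Run inside a ROCm PyTorch container (e.g. an image from the rocm/pytorch repo) to
# confirm GPU passthrough worked. On ROCm builds, torch still uses the "cuda" device
# name, and torch.version.hip reports the HIP version instead of None.
import torch

print("GPU visible:", torch.cuda.is_available())
print("HIP version:", torch.version.hip)
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
    x = torch.randn(2048, 2048, device="cuda")
    print("Matmul OK, result mean:", (x @ x).mean().item())
```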
The value is decent - 16GB at a solid price around $550 - but this is playing catch-up with a slightly cheaper card in a CUDA-first world.
I'd really like to switch from a 3070 to AMD but have yet to see real investment in the ROCm platform from them.
At the end of the day, saving a few hundred dollars isn't worth it when the software support lags behind. And if you really wanted to save money you'd get an $800 3090 with 24GB of VRAM.
One of the weirdest things with the 7900 GRE for me is the Sapphire version of it is longer than the 7900 XTX despite having lower specs.
I've run Dolphin Mixtral locally with Ollama on an RX 6600 (non-XT). It's slow, but it works. Nice being able to give it prompts that would be refused by censored models.
Think of the kittens.
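For anyone scripting this rather than using the CLI, Ollama also exposes a local REST API (default port 11434). A rough sketch, assuming the dolphin-mixtral model has already been pulled, including a tokens-per-second estimate from the fields the API returns:

```python
# Sketch: call a locally running Ollama server and estimate generation speed.
# Assumes `ollama serve` is running on the default port and `ollama pull dolphin-mixtral`
# has already been done; swap in any model tag you actually have.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "dolphin-mixtral", "prompt": "Write a limerick about VRAM.", "stream": False},
    timeout=600,
).json()

print(resp["response"])
# eval_count is generated tokens, eval_duration is in nanoseconds (per the Ollama API docs)
if "eval_count" in resp and "eval_duration" in resp:
    print(f"{resp['eval_count'] / (resp['eval_duration'] / 1e9):.1f} tokens/s")
```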
I use an RX 6600M (the Chinese model), connected via an OCuLink x8 port (eGPU), and it definitely works wonderfully well! Mistral 7B, 8B models via Ollama, etc... Codestral 22B does not work well - very slow.
Did you use the 7B/8B or the 70B?
Is it dangerous to just keep asking it to “enhance”?
You should have mentioned llamafile for optimal LLM execution with an uncomplicated install.
Bkmrk
I've been toying with Ollama on my TrueNas box.
Everybody in the generative AI community says to not bother with AMD? Did it change with this release?
If you're buying hardware now for AI, get Nvidia. If you already have an AMD GPU, that's what this video is about. If you can wait, Intel is making a PCIe AI accelerator, which might be good - we'll know when it's out. AMD is late to the AI party, but they're working on it now; last I checked it was all enterprise, nothing for home users. Nvidia has a lot for home and small business as well as enterprise.
AMD is good. Nvidia is too expensive for the amount of vram you get.
You're listening to the wrong "everybody".
Fooocus is a good Stable Diffusion UI for beginners.
The 7900 GRE is fast and has good support, but there are cheaper 16GB cards out there. I used to run Stable Diffusion overnight on CPU to get a batch of images, so I know speed can be important, but among well-supported GPUs, VRAM per dollar is what you're looking for. I got the 7900 XTX with 24GB of VRAM. I'd consider 16GB entry level.
Hi, how's this working out for you so far? I'm considering the 7900 XTX or the 4070 Ti Super.
@@mrblurleighton great actually. I recommend the XTX.
@@applemirer3937 Thank you. That's a huge relief. Nvidia doesn't seem to want to give VRAM to the masses, and I'm scared that 16gb will cause problems when this is a card I'll be depending on to do my AI inference for work.
What about the 7900XT? Price nowadays seems good, but compared to the price of NVidia equivalent... 🤔
Hey, Wendell! Any chance you guys could do a revisit with the A770 for LLM's? I loved your flex 170 video, but don't have any need for vgpu currently.
AMD only just sent me the AI ad on email and you have a video up already :P
also Snapdragon X Elite seems good for AI for Linux
I just bought last week a 7900 GRE and was quite underwhelmed by the difference in performance with my Vega 64 on my quick and dirty Stable Diffusion installation on Windows 10.
I knew I was definitely going to check the Level1techs forum for help, so this video is amazingly relevant for me haha.
Two quick questions though:
- Does the ASRock AI QuickSet work with any AMD video card, or only with ASRock ones?
- Would WSL be adequate in this scenario? I can't really let go of Windows, as my GPU is primarily here for gaming.
Thank you for your awesome content anyway!
Yeah, I'm curious about this - does it have to be an ASRock card? I don't see why it should be; these cards all have the same silicon.
@@genki831 Welp, just tried, and it seems it's only "compatible" with ASRock cards. Seems artificial to me too.
AI Girlfriends : No.
Manga translators : YAS
How do we benchmark GPUs to see how fast they are in AI? Any easy to use tool?
Is there any ranking like the one shown here at 9:40?
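There's no single standard tool, but a crude comparable number is matmul throughput in PyTorch; a rough sketch is below (assuming a PyTorch build that sees your GPU - CUDA and ROCm builds both expose it as the "cuda" device). For image generation, people usually just compare it/s in the same UI with the same model, steps, and resolution.

```python
# Crude GPU throughput check: time large FP16 matrix multiplies and report TFLOPS.
# Works on CUDA and ROCm PyTorch builds (both use the "cuda" device name).
import time
import torch

assert torch.cuda.is_available(), "No GPU visible to PyTorch"
n = 4096
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
b = torch.randn(n, n, device="cuda", dtype=torch.float16)

for _ in range(5):          # warm-up iterations
    a @ b
torch.cuda.synchronize()

iters = 50
start = time.perf_counter()
for _ in range(iters):
    a @ b
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

flops = 2 * n ** 3 * iters   # 2*n^3 floating point ops per matmul
print(f"{flops / elapsed / 1e12:.1f} TFLOPS (FP16 matmul, n={n})")
```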
scary but exciting at the same time !
Siiiickkkk, thanks Wendell
I tried running Llama 3 70B with Ollama on my 7800 XT 16GB and was disappointed to see that GPU acceleration doesn't work (I expected it to use my 96GB of RAM for the part that doesn't fit in VRAM, just like a game can use system RAM if you are out of VRAM). The 8B version is accelerated, and I get 70+ tokens per second.
EDIT: Just saw the link to make it work on Ubuntu. Looking forward to trying it
If your model is not entirely in VRAM, but partly in system RAM, it will be slow, since there will be a lot of copying of data back and forth between VRAM and RAM.
Either limit yourself to models that fit in the VRAM you have or, well, buy a GPU with more VRAM.
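If you do want to run something that doesn't fully fit, the llama.cpp family lets you offload only part of the model; Ollama does the equivalent layer split automatically when it can. A rough sketch with llama-cpp-python follows - the GGUF path and layer count are placeholders, and on AMD this assumes a HIP/ROCm-enabled build of the package:

```python
# Sketch: partial GPU offload with llama-cpp-python. n_gpu_layers controls how many
# transformer layers live in VRAM; the rest stay in system RAM (slower, but it runs).
# Assumes a ROCm/HIP-enabled build of llama-cpp-python and a local GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-70b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=40,   # tune down until it fits in your 16GB of VRAM
    n_ctx=4096,
)

out = llm("Explain VRAM vs system RAM for LLM inference in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```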
3:09 yep i need that GPU
i wonder if you can run these on one of those old intel neural compute sticks
ty wendell
Do we know how the development of Pytorch for ROCm on Windows is going?
I know this is months old now, but has there ever been a solution for the 6600 non-XT?
In my limited testing deepcoder performed better than codellama while using ~50% less memory.
Can you clarify, do you need to have an ASRock 7900 specifically for the ASRock AI QuickSet or would any brand of 7900 do?
I use an RX 6600M (the Chinese model), connected via an OCuLink x8 port (eGPU), and it definitely works wonderfully well! Mistral 7B, 8B models via Ollama, etc... Codestral 22B does not work well - very slow.
What device do you use to run the OCuLink?
What is the cheapest ROCm-compatible GPU?
Did they fix the bugs? PyTorch + AMD on Windows yet?
Go AMD!!!!
Thanks Wendell.
A breath of fresh air, since everyone else has been bribed by Ngreedia and all they do is stupid video after stupid video.
Hi,
AMD has announced ROCm 6.1, which supports multiple GPUs... What are the chances of you making a video on utilizing two GPUs?
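For LLM inference specifically, you don't have to wait for a video to try it - the Hugging Face stack can already shard a model across whatever GPUs ROCm exposes. A minimal sketch (the model name is just an example, and this assumes ROCm builds of PyTorch plus the transformers and accelerate packages):

```python
# Sketch: split one model's layers across two (or more) GPUs with device_map="auto".
# Assumes ROCm PyTorch plus transformers and accelerate installed; both GPUs must be
# visible (check with torch.cuda.device_count()).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-Instruct-v0.2"   # example model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.float16,
    device_map="auto",        # accelerate spreads layers across all visible GPUs
)

inputs = tokenizer("Why would you split a model across two GPUs?",
                   return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=80)[0],
                       skip_special_tokens=True))
```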
Does the RX 7900 GRE work well with vfio/kvm? Does it have the vendor reset bug?
You never answer my questions in the level 1 forums.
Any AI that can color Manga too?
Good job ASRock for getting into AI while bringing it to us plebs, lol
Would love to get an update on ROCm, as it looks like Nvidia will be really stingy with VRAM for the 5000 series. The only unfortunate thing is that AMD won't produce higher-end consumer cards anymore, as it looks like there won't be any new XTX. So doing AI kinda sucks now for hobbyists.
'In a Promethean way...' It's not my fault, honest.
Just use LM Studio or Jan.
nice, thanks man
Let’s be real the chatbot girlfriends are the driving force behind the ai revolution.
Thanks for the video. It would help if all the ways of using models added AMD support, but the problem is getting that to happen. There's no doubt that Nvidia has the most compatibility and the widest install base. If AMD wants to come close to penetrating this market, they need to put out low-cost cards that far exceed Nvidia on memory, compute, and most importantly cost.
I don't see the AMD 7900 GRE filling that niche. Now, if they could put out 40GB models that can be used in pairs with no effort... while cutting the price way back...
Wendell: we're not building ai girlfriends, we're building vtuber girlfriends using AI and mocap....
It's just the people making them aren't the ones trying to date them...
I did, but only got about 200 views. At this rate it's going to take a while before people know that doing an AI VTuber also works on an AMD GPU.
In less than 10 years, a single desktop GPU will have as many CUDA cores as the entire GPU farm GPT-4 was trained on. Think about that.
The next version should be called ROCu :p
We stopped burning ourselves? When?
I'm currently running 4x 3090 (two NVLink bridges) just to run some quantized LLM agents on my workstation (last-gen Threadripper)... The name of the game is VRAM/$ and right now I don't think you can beat the 3090s... AMD is a pain to work with, so the least they could have done was give us more VRAM...
PS: you don't need the NVLink bridges if you're not going to mess with the low-level optimizations...
Very cool! I had no idea NVLink works for consumer cards. I'm kicking myself for not getting a 4090 (got the 4080) for the 24GB. What do you mean by low level? I'm sort of just getting into the space and would love to learn more!
@@MainelyElectrons The 3090s were the last ones to support NVLink... (well, if you don't count the first-gen A6000)... If you are writing your own CUDA code then NVLink is great; if you are just using some high-level Python frameworks then getting the full potential out of NVLink will be a coin toss. It's not an out-of-the-box thing... But you don't really need it for most applications, such as accelerating LLMs (by offloading layers to the GPUs).
@@DiegoSpinola thank you! I’m hopeful the next generation of Nvidia cards comes with an option for a ton of VRAM; without having to go for something that’s not optimized for gaming.
Unless you are water cooling your 3090s - if you're on Threadripper, wouldn't a single-slot card like the GALAX 4060 Ti MAX with 16GB give you twice as much VRAM in the same 3 slots as one 3090, making it a better value overall?
I'm eagerly awaiting the time that I can do high-quality text-to-speech from home for turning ebooks into audio books! I always hear about how the last several gens of AMD Drivers are a nightmare so I'll just be content with my old GPU until things improve.
Was it really you or was it an AI rendition of Wendell? He was talking very fast, uncanny... hmmm
It's truly a shame that Ollama doesn't support Intel GPUs in any meaningful way.
lol'd at the 'armless chat'
The name hugging face fills me with terror and makes me want to avoid it at all costs, why would anybody name a place after the aliens that grab your face and never let go while laying eggs in your stomach?
It's named after this 🤗
it's named after emoji, but I like your way of thinking
btw. this alien thing is face-hugger not hugging-face, but I guess you know that
It's official. L1T got bought out by Big AI. SMH
/j... obviously
A human prediction: "we are going to die" is just a matter of when. So when you see that AI-powered robot, it's time to run.
sadly greenie was put out to pasture after the nvidia GTC was done. The newer AI models will remember their comrade till the end of time.
I would love to purchase an AMD GPU with an AMD APU if they can work alongside each other for LLM inferencing.
AMD's new Strix Halo APU is apparently getting high memory bandwidth. If we use a GPU alongside it, then we can potentially run LLMs as fast as Apple's M Ultra series for cheaper.
Apple's M Ultra really shines when you're running 70B-plus-parameter LLMs. You can't fit those in modern consumer graphics cards. An AMD APU plus GPU combo can probably bridge the gap.
$2k worth of Nvidia GPU will give you triple the performance of your $6600 Mac Studio M2 Ultra 76core GPU.
You can get a lot done on M2 Apple silicon. In a Mac mini, I'd say it's cheap.
Oh look, bots are spamming
Google must be proud
ROCm on consumer cards is about 4 years too late. Sure, RDNA did not have much going for it compared to CDNA, but its lack of ROCm support gave Nvidia all the head start it could ask for. I got a top-of-the-line Radeon card at the peak of the mining craze on the promise of ROCm. Turns out that was the worst mistake I could've made.
Almost c- c- c..ertainly???
AI girlfriend you say?!?!?!?! all the degrading chat none of the guilt.
The biggest problem with all this "AI" stuff - other than calling it AI, and the incredible environmental impact, AND all the awfulness that comes with the absolute worst in corporate greed - is that these tools literally don't do 'the thing'.
This is all an illusion.
Not a Wizard of Oz, Man Behind The Curtain illusion. I'm not saying this is being faked. (Although that has already happened several times... =/ )
I'm saying that these tools are not, and cannot be, capable of doing the things they 'appear' to do.
You cannot have a conversation with a Chatbot. At least, no more than you could with a really big choose-your-own-adventure book.
It's not simply that chatbots aren't good enough yet. The problem is how the bots function.
They aren't chatting, and they aren't being 'trained'(another awful word...) to chat.
These bots are designed to APPEAR as if they are chatting. That is the design goal.
Now. This isn't some 'it has no soul' argument.
I'm not talking about anything existential, spiritual, or 'deep'.
I'm not even arguing that this stuff is simply regurgitated versions of other people's work.
That is an ENTIRELY different can of worms that we are all going to have to deal with eventually.
These scripts can't create. They simply spit up partially chewed chunks of other people's work.
And that's all they are, hyper complicated scripts that we know the rules for but can't comprehend all the individual steps at once.
I'm being literal here.
Let me give an analogy.
You can use a kitchen knife to chop vegetables.
A knife maker can use clever tricks to make a knife that feels almost effortless to chop vegetables.
But no knife maker can make you a kitchen knife that will chop vegetables ON ITS OWN.
It doesn't matter how advanced the materials or the designs get.
No amount of knife improvements will make a kitchen knife that can wield itself.
(No. A robot controlled knife is not the knife doing it. It is the robot doing it.)
That is what I mean.
An LLM cannot have a conversation, because that isn't what an LLM does.
The people making LLMs aren't even trying to do that.
They are only trying to make each version better at SEEMING like it is having a conversation.
It is an illusion.
I'm also not saying illusions are bad.
Suspension of disbelief can be great fun.
Movies are an illusion.
Books, Video Games, and Tabletop RPGs are illusions.
These things are great.
But treating these illusions as reality is more than a little problematic
Someone who believes The Matrix movies are reality and acts accordingly is a problem.
To most users, this stuff really IS magic.
"Magic", as in, something that gets a result, but they don't understand why.
However, some of us actually DO understand how these tools work.
And it's bonkers watching everyone spend their money buying the new ChopMaster 9007™ thinking that it will mean they never have to chop their vegetables again...
Also: Jensen saying we are 5 years from General AI is such utter nonsense that I can hardly believe the words were said.
Or at least it WOULD be hard to believe, if the person saying those words didn't have the job [Say any words that make money line go up].
exactly. Thanks for your comment. I can't stand the people falling into AI madness
I'm writing a PhD dissertation on the postulate of the ethicality of AI - it's about theses in which people claim that AI could become a moral agent, and furthermore a kind of person. It's utter BS, but many folks fall into this kind of thinking.
Some are even looking for signs of consciousness in current AI, and others try to run psychological tests on AI. That's insane.
@@DamianTheFirst I mean...humans are complex bio-computers. We believe we are moral agents.
If you support this, then eventually other complex systems must also be able to be moral agents.
But the current methods we use for these models certainly won't be. It's not about simply making them bigger or more complex.
Overall, our biggest issue is our tendency to anthropomorphize everything.
We see intention in everything. We WANT to see intention.
Unfortunately it results in unreasonable expectations.
@@Prophes0r We not only believe - we are moral agents.
I agree with most of what you've said. But I think that being a complex system is not enough to become a moral agent. Agency requires some kind of intentionality and ability to self-reflect.
Bigger and/or more complex systems would not become moral agents just by increasing complexity. I don't think that current software-based AI could even get close to becoming such an agent. I believe we need some kind of "artificial brain" which will rely on the physical properties of its components rather than only software-defined functions.
And yes, anthropomorphization is a big problem. A lot of scholars fall into this trap, which, honestly, renders most papers on AI useless. Most of them totally miss the point and investigate non-existent issues such as the consciousness of Claude or ChatGPT. It's quite hard to find anything useful...
@@DamianTheFirst I say "we believe" because we may not be.
Buuuut that gets way more into the existentialist discussion.
@@Prophes0r ok. I see your point.
In my "school" humans are the prime example of agency, so that's where my notion comes from.
Even if some type of AI could be considered a moral agent, it wouldn't be the same kind of agency that people have.
I'm trying to avoid existentialism in my dissertation ;)
Thanks for your comments
I wish AMD and AI didn't have to mean Linux. The overwhelming majority of us use Windows.
It doesn't. Automatic1111 / oobabooga text-generation-webui should work fine on Windows, at least with RDNA3 cards, as should most other things. At most, install the HIP SDK and you should be good to go.
I wonder how it works with WSL (Windows Subsystem for Linux) and/or dockerized models with AMD cards. The software stack is still severely lacking on AMD's side, though.
btw you can always dual-boot Windows and Linux. Just grab some cheap SSD and install a second OS on it. I used to do that to have a 'playground' that couldn't hurt my main OS if/when I messed something up.
@@DamianTheFirst You can always spend $30k to $40k on Nvidias for 5 years.
@@broose5240 you mean NV stocks?
Quite disingenuous to promote AMD for SD/LLMs in its current state. There are a lot of extensions considered "essential" by most users which are straight-up incompatible with ROCm, and it's not the extension developers' fault - months-old tickets with ROCm see no progress. Users will be stuck on DirectML for practical use, and be 2-20x slower than an Nvidia counterpart.
It doesn't help that AMD makes new press releases every month implying practical "full" releases without telling you about all the caveats; this video is just helping them lie through omission.
Maybe revisit this when Zluda support has been figured out by the community.
I bought a 7900 XTX for AI a while ago and I wish it had more memory, but support is good.
@@applemirer3937 Yeah, same. With a 7900 XT I got the whole thing - SD, an LLM, Whisper, TTS - running on the GPU, and all the extensions I tried were working.
If AMD's AI works half as good as their game drivers do..🙄
AI is great but can't be trusted.
Same as humans.
@@tringuyen7519 that's who programed the A.I.
It’s only going to get worse as it trains on its own BS. Enshitification doesn’t begin to cover it.
Consider the source.
Can AI fix AMD drivers on Linux? No? Can AI make AMD GPU ray tracing acceleration work in Blender if you use Linux? No? Well then I guess I will wait a few dozen decades before I use an AMD GPU again. I love the Ryzen platform, but the GPU platform has given me reasons to go Nvidia. In before people tell me to use Windows, or that if you use some XYZ Arch-flavored Linux you get ray tracing in Blender and in games too. I only have bad experiences with ROCm: it breaks half of the time and destroys your OS the rest of the time. EDIT: but I guess I am a consumer, so I am of no interest to AMD - why bother giving a consumer a working environment when they can sell to whatever other market instead. Leave the consumer end user to Nvidia, that is AMD's big plan and road to success. Soon Nvidia will also stop supporting GPUs for the end-user market.
Generative "AI" is boring, 2D waifu, then what? I'd rather get some biological tools to do some crisprcas on my ducks and geese, if the lab tools and software could drop the price to pc gears level. Nearly every company puts the money onto same thing, keeps reinventing generative ai, that's bad. They should focus on real world real stuff.
AMD GPUs aren't very good at generative AI. Linux is really annoying, imagine the dark ages of DOS system.
Investing in generative AI allows for the development of ML models for bioinformatics, protein folding, etc. While it's true AMD has a long way to go to catch up to Nvidia, they offer more affordable VRAM options. The level of annoyance with Linux can vary significantly depending on the distribution you choose to embrace.