March 2024 - Stable Diffusion with AMD on Windows -- use ZLUDA ;)
- Published Oct 2, 2024
- SD is so much better now using Zluda!
Here is how to run Automatic1111 with ZLUDA on Windows, and get all the features you were missing before!
** Only GPUs that are fully or partially supported by ROCm can run this -- check whether yours is fully or partially supported before starting! **
Check if your GPU is fully supported on Windows here:
rocm.docs.amd....
Links to files and things:
Git for windows: gitforwindows....
Python: www.python.org...
Zluda: github.com/lsh...
AMD HIP SDK: rocm.docs.amd....
Add PATH entries for the HIP SDK and wherever you copied the ZLUDA files to
%HIP_PATH%bin
C:\path\to\zluda\folder
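As a quick sanity check that both directories actually made it onto PATH, a small helper like this can be run from the same terminal (the two example paths below are placeholders; substitute your real HIP SDK bin and ZLUDA locations):

```python
import os

def missing_from_path(required_dirs, path_env=None):
    """Return the entries of required_dirs that are absent from PATH."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    present = path_env.split(os.pathsep)
    return [d for d in required_dirs if d not in present]

# Hypothetical locations -- substitute your HIP SDK bin and ZLUDA dirs:
# print(missing_from_path([
#     r"C:\Program Files\AMD\ROCm\5.7\bin",
#     r"C:\path\to\zluda\folder",
# ]))
```

An empty list means both entries are visible; remember that a new terminal (or a reboot, as noted in the comments below) may be needed before PATH changes show up.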
Start Automatic 1111 webui
webui.bat
Copy ZLUDA's cublas and cusparse DLLs to
...\stable-diffusion-webui-directml\venv\Lib\site-packages\torch\lib
Delete cublas64_11.dll and cusparse64_11.dll
Rename the ZLUDA files:
cublas.dll to cublas64_11.dll
cusparse.dll to cusparse64_11.dll
Back in the terminal, run the webui:
webui.bat --use-zluda
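The copy/delete/rename step above can be sketched as a small helper script. The directory arguments are placeholders for your actual ZLUDA folder and the torch lib folder inside the webui venv:

```python
import shutil
from pathlib import Path

def patch_torch_with_zluda(zluda_dir: str, torch_lib_dir: str) -> None:
    """Swap torch's CUDA 11 BLAS DLLs for ZLUDA's replacements."""
    zluda, lib = Path(zluda_dir), Path(torch_lib_dir)
    # Delete the originals so the renamed ZLUDA copies take their place.
    for name in ("cublas64_11.dll", "cusparse64_11.dll"):
        (lib / name).unlink(missing_ok=True)
    # Copy ZLUDA's DLLs in under the names torch will try to load.
    shutil.copy(zluda / "cublas.dll", lib / "cublas64_11.dll")
    shutil.copy(zluda / "cusparse.dll", lib / "cusparse64_11.dll")

# Example (paths are placeholders -- adjust to your install):
# patch_torch_with_zluda(
#     r"C:\path\to\zluda\folder",
#     r"C:\path\to\stable-diffusion-webui-directml\venv\Lib\site-packages\torch\lib",
# )
```

Run it once after the first webui launch has created the venv, then start the webui with --use-zluda.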
If you have issues with cuDNN:
...\stable-diffusion-webui-directml\modules\shared_init.py
Add this inside def initialize:
torch.backends.cudnn.enabled = False
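In context, the patched function might look like this sketch (a fragment only; shared_init.py already imports torch, and the rest of the function body is abbreviated):

```python
# modules/shared_init.py (fragment -- surrounding code abbreviated)

def initialize():
    # ZLUDA has no cuDNN implementation, so disable it up front;
    # torch then falls back to its own convolution kernels.
    torch.backends.cudnn.enabled = False
    ...  # rest of the original initialize() body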
If you have a GPU that is not fully supported by the HIP SDK, follow these instructions:
github.com/vla...
Performance better, worse or equal to Linux ROCm?
About 20%-25% worse than ROCm on Linux, I would say... but it has all the normal features of Automatic1111 without any of the ONNX or Olive stuff that was very irritating.
@@FE-Engineer I guess it's better than having to do all the optimization stuff over and over again; also likely a lot less space is consumed compared to dual booting Linux
Yes. If you hate the idea of dual booting Linux, or have other reasons why Linux ROCm is not an option, this is a reasonable workaround.
@@FE-Engineer I am only getting 1-2 it/s on a 6900 XT with ZLUDA. What is wrong?
@@FE-Engineer I used Ventoy to run Linux, and I don't see how people can say it's easier to install and use; I had a hard time trying to get SD to even run.
How do I uninstall torch-2.2.0+cu121 and install torch-2.2.0+cu112? I think that is my problem, because when I run SD it runs on the CPU
[WinError 126] The specified module could not be found. Error loading "C:\Users\___\ZLUDA\stable-diffusion-webui-directml\venv\lib\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies. please help
Just FYI, you can use a Windows variable, %userprofile%, to provide an actual full path that can be used in Windows without exposing your username.
Make sure u copied the 2 files he mentioned and renamed them correctly.
@@silvermoonk9121 I worked it out, all good 😊
same error here, did anyone solve it?
I just bought a new card and once again I am back to your tutorials. Your videos helped me before, your tutorials are extremely up to date and easy to follow. Thanks man, you're doing a great job here!
Whahoo! Glad it worked and went smoothly! :). Thanks for watching!
I just got a bargain RX 6800 as I heard that you can do the AI stuff without mortgaging your house to Nvidia; an RX 6800 is only 20% slower than an RTX 3090, and a new one is half the price of a used 3090, so here I am trying to get it to work... thanks for your videos... good work! ;D
Did you get this to work with the latest SD Direct ML? This tutorial as is is too old and I am getting errors.
Very nice tutorial. I've moved onto the NVidia side, but your tutorials were extremely helpful with setting up SD with Olive when I was still using my RX 7800XT.
Thank you for making it easier for AMD users stuck in windows who are curious about trying SD.
Thank you :)
are you seeing better performance when considering price? This is skewed by location of course and the used gpu market. I guess windows is easier. But wont you get better performance with linux?
Very nice, because it doesn’t work?
This goes back and forth. About a year ago price / performance was on the side of amd mostly but due to continued improvements now nvidia likely has an edge if you can get a good price for like a 3080 or even maybe a 4070 super.
With AMD, yes. Linux will give you better performance 99% of the time because of full ROCm.
THANKS A LOT MATE! This is so awesome. I have played with directML and its settings before like hell. My command webui-user.bat argument lines were almost one A4 page.
I noticed that you need to restart your PC to get new PATH entries to work on Win 11. Without a restart you end up getting "failed to load zluda path automatically" and "use skip-cuda-torch-test" info. Also, the first install will download cublas64_12 and cusparse64_12 instead of the 64_11 versions if you don't use the --use-zluda argument with user.bat. I don't know why.
My 6750XT results:
1. SD 1.5 models: txt2img 1024x1024: 3.75 s/it average and 1:05 min generation time.
SDXL models: txt2img 1024x1024: 3.50 s/it average and 1:10 min.
NOTE: without ZLUDA this was an impossible task because of instant memory errors, and SDXL models took over 2 minutes at 512x512 resolution.
2. Memory usage is now calibrated. With ZLUDA, SD uses only 10.2 GB of 12 GB and frees up memory after generation. A 15 min 1024x1024 -> 2048 upscale did not encounter a memory error. With DirectML you can't use more than a 1.5x upscale together with ControlNet; now you don't have to give up ControlNet with ZLUDA. This is awesome.
3. ControlNet works just fine
4. Ultimate Upscaler works normally
5. Inpaint works normally
AMD Pro drivers are slightly faster than the Adrenalin version. There is a slight 5-15 s delay with Adrenalin when pressing "generate" and no delay with the Pro drivers. I don't know what causes this.
How did it work for you? I've got:
Cannot read C:\Program Files\AMD\ROCm\6.1\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1031
RX 6700 XT, with no working workaround for that
@@SanyaWoFloride-k5u you need ROCm 5.7.1 and change files in \ROCM\5.7\
i went from 14 seconds per iteration to 3 iterations per second on my 7800xt lol thanks
I “sidegraded” from an RTX 3070 to an RX 6800. Mainly did it because I wanted that extra VRAM and I found a really good deal. Thank you for this tutorial! Very well put together
Me too, just found a bargain RX 6800; this is my best ever card, and apart from the bells and whistles this card punches well above its weight...
great stuff! Great tutorial as always, thank you.
Thank you so much for watching :)
You're a blessing upon this world.
hey man just installing this to test with my 6800xt, is this still usable?
Works beautifully, thanks man!
I appreciate your content. It's so frustrating that it can't just work for AMD on Windows like it does for Nvidia cards. I'm hoping that'll change in the near future, but for now I use my 3070 for SD and my 7800 XT for gaming, and I'm good with that
Do you know how many it/s I should get with an RX 7800 XT if I've done it all right?
works perfectly on 6800xt thx
using amd for stable diffusion is pain in the ass
Thanks, at last I've found a functional tutorial; it's working with an RX 6650 XT. Greetings in Spanish; I understand English but I don't have good diction. Thanks
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check.
Any clues? Do I need to install another version of pytorch? If so, how? Thanks in advance!
Did you run "./webui.bat --use-zluda" in the terminal
Thank you very much for the detailed tutorial❤, but I have a little problem which is that the Karras type samplers do not appear. Any solution? 😢
Hi, thanks for the tutorial! I did everything as you said but I'm getting an error "launch.py: error: unrecognized arguments: --use-zluda". My GPU is RX 7800 XT
Did you add the correct path?
@@kobusdowney5291 Yes. BTW I installed SD.Next and ZLUDA works fine, but in A1111 it doesn't for some reason.
What version of PyTorch are you using? I saw 2.2.0 on the screen in passing, but is +cu also included? The reason I'm asking is that I'm getting SD to run fine, with gpu recognized, but when I attempt to load a model I get an error:
20:14:51-079163 ERROR Diffusers failed loading:
model=D:\stablediffusion\SDNext\automatic\models\Stable-diffusion\dreamshaper_8.safetensors
pipeline=Autodetect/NoneType Building PyTorch extensions using ROCm and Windows is not
supported.
20:14:51-083150 ERROR loading
model=D:\stablediffusion\SDNext\automatic\models\Stable-diffusion\dreamshaper_8.safetensors
pipeline=Autodetect/NoneType: OSError
┌───────────────────────────────────────── Traceback (most recent call last)
I'm currently using PyTorch 2.3.0+cu118 (I'm on the vladmandic fork, but this also occurs on my lshqqytiger fork as well.)
hi, I have this error after launching webui.bat to install everything:
rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\5.7\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1036
rocBLAS error: Could not initialize Tensile host:
regex_error(error_backref): The expression contained an invalid back reference.
Press any key to continue . . .
Any idea what to do? Thanks for your help
same error
hey I just fixed it. disable your integrated gpu in device manager and wait a while as it loads and eventually downloads
@@banned-user thank you, I will try it later. I'm not too sure though how to disable the integrated graphics
You can do it from the BIOS, for one.
But you can also set an environment variable to tell ROCm to ignore the iGPU.
I can tell you something is wrong. See how slashes go from back slashes to forward slashes? And at one spot there is a backslash next to a forward slash? Look at your env variables and check to see if something is weird.
When installing by running the user.bat file, it says "error 1/2 no space left on device". I have 437 GB of free space.
RX 7900 XTX
I followed step by step, but getting this error:
“rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\5.7\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1036”
…it's seeing your integrated GPU…
Either disable it, or set HIP_VISIBLE_DEVICES=1
@@FE-Engineer Hah, I guessed it before seeing the actual answer (gfx1036 is not 7000 series), and it works now. But thank you anyway :)
@@FE-Engineer Where do I put that?
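The "hip visible devices" suggestion above refers to the HIP runtime's HIP_VISIBLE_DEVICES environment variable. You can set it as a user environment variable in Windows, or from Python before torch initializes; note the device index 1 assumes the discrete card enumerates after the iGPU, which may differ on your system:

```python
import os

# Hide the integrated GPU (often device 0) from the HIP runtime so
# rocBLAS never tries to load kernels for an unsupported gfx target.
# This must run before torch/ROCm initializes.
os.environ["HIP_VISIBLE_DEVICES"] = "1"
```

Setting it system-wide (System Properties > Environment Variables) has the same effect without editing any code.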
OMG! I can't believe this worked! I'm running this on a 7800XT with no issues.
One thing to note though, this only worked with Python Version 3.10.6.
And also, for anyone not following FE-Engineers file location and structure, you can run CMD from the address of the file explorer window, just navigate there and type in "cmd" in the address bar and command prompt will open at that directory, made things a bit easier for me.
You are the only person who has a workable SDXL AMD guide; also all the other stuff like torch, torch+cu, and tensor works well, and that's rare
Thank you for the video. It took me a while to figure it out, but I finally managed to get a decent generation improvement on my setup: about 11 it/s in SD1.5 on a 7900 XTX. If others read this, try the "--use-zluda" flag; stable-diffusion-webui-directml and SD.Next do the patching for you and install the correct torch version, which is much easier.
how does that figure translate to time? I am guessing around an image every 5 or 10 seconds at lowish resolution?
@@matthewfuller9760 you divide the iteration count by the it/s. That gives about 2 s for 20 iterations of SD1.5 at 512x512, or 12 s for SDXL base at 25 iterations, 1024x1024. More if you swap models, i.e. if you run an SDXL refiner, but AFAIK that mostly depends on your SSD speed.
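The conversion above is just steps divided by rate; a quick sketch using the numbers quoted in this thread:

```python
def sampling_time_s(steps: int, it_per_s: float) -> float:
    """Wall time for the sampling loop alone (ignores model load and VAE)."""
    return steps / it_per_s

# ~11 it/s on a 7900 XTX for SD1.5 512x512, 20 steps:
print(round(sampling_time_s(20, 11), 1))  # 1.8 -- i.e. roughly "2 s for 20 it"
```

The same formula with ~2 it/s and 25 steps gives the ~12 s SDXL figure mentioned above.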
how do i install the correct torch version and get it installed into the right folder? complete newbie here and having issues
In SDXL models, can 7900 XTX match performance of nVidia 3090 in a best case scenario? linux etc
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\%userprofile%\Stable Diffusion\zluda\stable-diffusion-webui-directml\venv\lib\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies.
Excuse my language... HOLY SHIT, this is good. I gave up on Windows and have been on Linux for a while, but now, after testing this on Windows... oooh I love you. I can finally utilize my 7900 XT to its potential. Thank you for the easy tutorial
I know, right? It's sooooo good! While it isn't perfect, and I still want full ROCm on Windows, this is in my opinion finally a very reasonable not-quite-full-ROCm alternative!
Having to juggle between windows for gaming and Linux for AI was frustrating, but this just so fast, even more than when I was on Linux. Thanx for the work, as I'm sure I'm saying on behalf of the whole AMD community :)
Hey thanks for the video. Managed to use Zluda on 6900XT. However I am randomly getting this error:
"NansException: A tensor with all NaNs was produced in Unet. Use --disable-nan-check commandline argument to disable this check."
I have tried:
1) --no-half and --no-half-vae
2) --med-vram
3) Enabling "Upcast cross attention layer to float32"
4) --disable-nan-check. Ignores errors, but instead produces black images.
5) Switching between different models, including SDXL.
6) Disabling GPU overclock
Does anyone have similar issues?
Have you done
"Enable upcast cross attention layer to float32"
and also
--no-half-vae
together?
Hi, I have an AMD 6800 XT, which is supported on the list, but when I typed "webui" in cmd to download some stuff, I got this error at the end: "ImportError: DLL load failed while importing onnxruntime_pybind11_state: A dynamic link library (DLL) initialization routine failed." Does anyone know what I can do to fix this?
i have this same error with 6800
OMG! Thank you so much for this one! I tried for so long to get this running... All the text tutorials were just too complicated.
You are welcome! I’m glad it helped. Thanks for watching!
Can you help???
What is the problem? I have an RX 6750 XT, installed the libraries, tried different ways, and the error does not go away. Either Stable Diffusion identifies the graphics card as the gfx90c architecture, or I get:
"RuntimeError: invalid argument to reset_peak_memory_stats"
getting same error - did you ever find a solution?
will this hip sdk fuck with my adrenalin driver for gaming ?
Any ideas how to fix the "Failed to create model quickly; will retry using slow method" ?
the last time i followed your comfyui + windows with directml guide, it worked like a charm for my rx6600 for sd15. wondered if this is any faster. got myself a 4070s now tho 😁
I believe this should be a decent bit faster than just directml -- if I am remembering correctly, this might be about double the performance of directml alone.
Thanks for the video. How do you start over if you mess up the steps? Is there a way to uninstall every thing and start over?
Can you give a link to your SD folder? It doesn't work for me, and I'm sick of this AMD nonsense
Thaaanks a lot for your video! After I spent about 24 h bricking everything, I finally stumbled across your channel! You helped me get my SD to run so much better than before! I'm looking forward to your next video with some more SD optimizations for Windows users :)
On that note: is there a PayPal or something where I can buy you a coffee? You saved me from insanity!
You're the best. Hope your family is all good
Thank you so much! Family is getting there. My son has a lot of medical issues. So long road there. But thank you for asking! :)
i love how he just forgets what hes doing in the middle of the video
Would this work with ComfyUi as well?
I don't know on that one. ZLUDA is not complicated to integrate, but it is likely not zero work either, and I have not seen whether they support it or not.
@@FE-Engineer Thanks. If it is/will be possible I hope you made a video about it :-)
I can't seem to get more than 1-2 it/s. I have a 7900 XT and Ryzen 9 5900X, M.2, 64 GB RAM.
AMD-Software-PRO-Edition-23.Q4-Win10-Win11-For-HIP,
Python 3.10.11, Git 2.43.0.windows.1, added all the paths as instructed.
Is it possible that the difference between 7900XTX and 7900XT is 10x?
Can't be true; I've also noticed almost a 3x slowdown compared to running it in Linux on my 6700 XT.
@@Eminic112 Everything is installed, but I get 10x fewer iterations per second. I don't know what I am doing wrong.
@@bojanrajic It seems to be an issue others are having as well, me included. I honestly couldn't tell you the reason; I've tried so many things and my performance isn't anywhere near where it should be. We might just have to wait for an update.
I thought Python 3.10.6 was the newest we could use? Newer breaks Torch.
@@_TrueDesire_ I'm using 3.10.6 exactly, and i'm having the exact same issue, so i don't think that has anything to do with it.
Hi, thanks for the tutorial video! Can you make a tutorial for installing reForge + ReActor, or Flux, using ZLUDA?
Will take a look. I have been moving across the country and dealing with some family issues but I am looking for some new things to do so I will put it on my list.
Excellent work. Thanks!!!
When trying to launch webui.bat --use-zluda I get this error message:
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.10.1-amd-10-g2872b02d
Commit hash: 2872b02d3b935665c1b52a32e8bc53b07ec5d540
Failed to load ZLUDA: Could not find module 'C:\ [...] \Stable Diffusion\Files\stable-diffusion-webui-directml\.zluda
vrtc64_112_0.dll' (or one of its dependencies). Try using the full path with constructor
any idea on how to solve this?
I just want to say thanks, it seems to be working on my 7900 XTX, I'm just wondering do you think we can use this in InvokeAI, I kind of like the layout of it and would love to use it on my AMD GPU. When you get the chance let me know if you think its possible.
I can pretty definitively say, for right now, on Windows: I doubt you will get it to run with ZLUDA.
I spent multiple hours. cuDNN is heavily used in there, and while it may be entirely possible, I have not figured out a good way to disable it completely and get it running. It is close; I just cannot get cuDNN entirely disabled, and it seems to be woven very deeply into this program overall.
@@FE-Engineer I guess we'll either have to wait for ZLUDA support or full ROCm support on Windows, correct?
That or if the devs decide to allow it and make a flag that disables cudnn.
I was really getting frustrated with all that shit.. Thank you so much for this video! Finally I can use SD properly again 🙏
Not working at all!
Cannot read C:\Program Files\AMD\ROCm\6.1\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1031
Help!
Code gets updated. It’s possible it doesn’t work anymore. Although I imagine it’s less to do with it not working and more likely putting in a bad path to rocm…
The path you have shown has both forward and back slashes \ / and it should not…
@@FE-Engineer it was a compatibility problem.. Worked after compatibility checking for ZLUDA version, HIP SDK version and Torch version. Thanks!
@@SanyaWoFloride-k5u Can you please go into more detail? i am experiencing the same exact error.
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Documents\Stable Diffusion\stable-diffusion-webui-directml\venv\lib\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies.
I have the same error
I had the same error and managed to figure it out like this:
Step 1: Make sure you have 3.10.11 installed
Step 2: Do the pip cache purge
Step 3: Delete the venv folder
Step 4: Open webui and let it download and install everything.
@@ВалерийЯкимчук-р9о I had the same error and managed to figure it out like this:
Step 1: Make sure you have 3.10.11 installed
Step 2: Do the pip cache purge
Step 3: Delete the venv folder
Step 4: Open webui and let it download and install everything.
Followed your instructions - it didn't work @@xedor993
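The recovery steps in the comment above (purge pip's cache, delete the venv, relaunch) could be scripted roughly like this; the command runner is injectable, so the sketch isn't tied to any particular pip install, and the webui path is whatever your checkout is:

```python
import shutil
import subprocess
from pathlib import Path

def reset_webui_env(webui_dir: str, run=subprocess.run) -> None:
    """Clear pip's download cache and remove the venv so webui.bat rebuilds it."""
    run(["pip", "cache", "purge"], check=True)                    # step 2
    shutil.rmtree(Path(webui_dir) / "venv", ignore_errors=True)   # step 3
    # Step 4: relaunch webui.bat and let it reinstall everything.
```

After running it, launching webui.bat recreates the venv from scratch with the correct Python (3.10.11, per the comment).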
Thanks a lot for the tutorial. I could not for the life of me get it to work on Fedora, and finally this works really well. I moved from an RTX 2060 to a new 7900 XT recently; I was getting 1.5x to 2x performance in ComfyUI, but with this I get 5x to 6x speed at last when generating with XL models.
Hi there, I'm looking at an RX 6800, so just to ask: you're quite satisfied with the performance and capabilities of your 7900 XT as opposed to the 2060? I have an RX 5700 which I am really happy with, though for the AI I need more VRAM...
@@CapaUno1322 Yes, definitely. With the 20 GB of VRAM I can run 7B-param local AI models in VRAM with LM Studio, and for ComfyUI it's night and day. But moving from a 5700 XT to a 6800 XT, I'm not sure the difference will be as big as the gap between a 2060 and a 7900 XT; that's a 2 or 3 generation gap for me.
Thank you so much.
10 images at 1024x1536 (Hires fix from 512x768) 7900XT
With previous directml: 16min
Now with Zluda: 5min 30s
Whoah. That’s way better! Nice!
If I try to open it, it just says "models failed to load",
and when I try to run it as admin, I get the error: "C:\User\Myname\Appdata\Local\Programs\Python\Python310\python.exe can't open file 'C:\Windows\System32\launch.py': [Errno 2] no such file or directory"
What do I do, guys? I tried running it this way as well:
@echo off
set PYTHON=C:\Users\Myname\AppData\Local\Programs\Python\Python310\python.exe
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--disable-safe-unpickle
Thank you, works perfectly on my RX 6800, so fast. Amazing.
Fantastic! I’m glad to hear that. Thank you for watching :)
Thanks for your tutorials, they are really well explained.
For others like me who have an old config:
I tried, even though I knew very well that my GPU wasn't on the list. If you get this message: "rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\5.7\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU", it's dead!
What exactly do you mean by "it's dead"? I also get this error, even though I have an RX 7900 XTX, which is most definitely completely supported
I knew my RX580 wasn't anywhere on the list, but it's 8GB VRAM, so I tried it anyway, and it works! Had to replace those library files (third option), put in a couple of ARGS in user.bat (--use-zluda and --no-half), but that got it working. Only issue is how long the image generation takes, which is like 10-15 minutes. I know it's running on the GPU instead of the CPU, because I can hear the GPU's fans working harder, but is there a good way to speed it up, without breaking it?
Can you use FaceFusion with AMD card?
It works using an RX 7600 XT; thanks for this amazing tutorial, the only one that really worked for me. Like and sub.
You are very welcome! Thank you for watching!
Got it running at last, all thanks to you!!
Successfully installed and started SD, but it failed to load the model. My Python version is 3.10.11, my ROCm version is 5.7.1, my graphics board is a 7900 XTX
hard to say, if you can start and run SD, and you did things in the order that I did, I don't know what the problem would be especially loading the model...might turn the machine off and on again, and retry?
@@FE-Engineer Thanks, it's already working and generating images successfully, I had skipped your instructions about it taking 10-20 minutes to generate the first time, so I mistakenly thought it failed!
on my side it stuck also here: 0%| | 0/20 [00:00
can you please provide exact version numbers for both zluda and stable-diffusion-webui-directml? Newer versions of both have been released and I'm getting errors when I try to run webui.bat at the end of the installation process. I assume this is because I'm using incompatible versions of different packages? Thank you!
These steps -- copy ZLUDA's cublas and cusparse to
...\stable-diffusion-webui-directml\venv\Lib\site-packages\torch\lib
delete cublas64_11.dll and cusparse64_11.dll
rename the ZLUDA files
cublas.dll to cublas64_11.dll
cusparse.dll to cusparse64_11.dll
back in the terminal run the webui
webui.bat --use-zluda -- are no longer needed. The latest lshqqytiger Automatic1111 version auto-patches the ZLUDA files, renames them, and runs without the --use-zluda arg just fine.
Oh that’s neat!
I am getting this runtime error: "return torch._C._cuda_memoryStats ... RuntimeError: invalid argument to memory_allocated". I've left it to render and "nothing is happening", as you initially said, so maybe it will work.
How do I downgrade to torch 118?
Not sure. I haven’t seen folks in comments getting that. What’s your GPU?
@@FE-Engineer Hey, great video. I have 6900XT and everything works, but I'm also randomly getting "NansException: A tensor with all NaNs was produced in Unet" error. --no-half, --medvram don't seem to help.
@@FE-Engineer Im getting this too, once i finally managed to navigate through all the steps that you skipped over in the video.
I've been waiting a long time for such a tutorial. Thank you so much for your service to the AMD community, which is so hated by the AI community
You are welcome. I'm glad that on Windows there is finally something with relatively decent performance that doesn't seem to be seriously lacking in anything.
If I had to choose between ONNX and ZLUDA... ZLUDA by a landslide. Dealing with all that ONNX nonsense (or, what was worse, failing to) is hopefully a thing of the past!
I can't even imagine how tough it was to work that out. Thanks!
Very nice work, thanks a lot!
You are welcome! Thanks for watching!
Well, I just got this to work, but it seems the original source has advanced so much that this video is a bit outdated. For example, I didn't need to replace and rename files, and got no errors with my 7800 XT.
Speed is great; I bet it's almost as fast as the 3070 Ti I was using before, but honestly I may be confusing things since I used that card like 8 months ago.
Anyway, my biggest problem is that I started all of this to try and use PonyXL (v6), and it's actually working! The thing is, I can't get it to make high quality images, even though I'm following advice I read in the description or got from YT videos. It is, for some reason, producing pastel-drawn-like images, even though I'm pretty sure I set clip skip correctly to 2 and set the VAE to the recommended one. Hope you have time to help me out on this, or, if I figure it out myself, I'll try not to forget to put my solution here as another comment.
Ahhh sir, the thing you show at the end should have been shown at the start... my GPU (5500 XT) was not even on the list and I did the whole process :(
Sorry about that. It was sort of mentioned in the video description. But I updated it to be a bit more clear. Again apologies. :-/
@@FE-Engineer it's OK, you create very educational content... here, take a sub :)
well there! Thank you :)
Hey thanks for making these videos. I can't seem to fix this issue, any idea?
stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
The RX 570 isn't supported, right?
It is not on the list. So…probably not. But I think some others said they actually did get their 580 to work. I wouldn’t have high confidence in it working though unfortunately.
How fast is this compared to Olive approach?
Zluda is not officially supported by AMD but they have partnered with Microsoft for Olive and other improvements.
I've always had very long loading times, but with ZLUDA I have ridiculously long loading times: I see the HDD constantly reading/writing, and Python's system RAM usage goes up to 12 GB. It takes like 30 s to start generating, 30 s to generate an image, and then 3 min to finish the process with my PC almost completely frozen by the high load... Maybe there's a command line option to avoid all these loads? If I launch webui with only --use-zluda I get a BSOD, so I used some of my old DirectML command args. I have a Radeon RX 6800 with 16 GB and a Ryzen 5 2600X; before, I had an Nvidia 2060, which was better in Automatic1111 even with less than half the RAM and half the overall speed of the 6800. I use these command line arguments: --use-zluda --medvram --opt-sub-quad-attention --no-half --no-half-vae --disable-nan-check --theme dark --autolaunch. Any suggestions?
Hey I have a problem,
I have a RX 7900 XT and I have ran through all the steps and am using the skip torch command along with zluda but I get an error saying RuntimeError: No CUDA GPUs are available
It opens the webui but I cant generate anything because of the error.
Any help would be appreciated 🙏
Same error here please help
@@jeromeboyer3401 you have not installed ZLUDA properly
As the other user mentioned you have missed a step or something.
Didn’t install hip sdk? Didn’t get zluda setup? Didn’t copy the files? Didn’t change env? Hard to say. But you missed something.
@@jeromeboyer3401
Hey, I think I figured it out. It's currently on the step that takes a really long time, but I finally got rid of the "No CUDA GPUs are available" error. I just had to delete all of the old Nvidia programs I had in Control Panel, since I upgraded from an old Nvidia card to a new AMD one. That's probably why it recognized the Nvidia and tried to search for a GPU. Hope this helps.
Thank you for making these, however I’ve done all of the steps very closely and correctly. My AMD card wasn’t supported so I did that thing with the files, getting the Torch is not able to use GPU error. I added the skip-torch thing, then had an error saying no NVIDIA card found so took that out. I’m using zluda all the time, still not working, added the stuff to the env, still not working. I’ve also done the pip cache purge thing and deleted the venv file etc, no luck.
i have the same error
@@doseofjean I've got it working; make sure you fully install the ROCm thing, and I also installed the AMD driver that is usually turned off. Also make sure ZLUDA is actually installed and in your environment variable by going to cmd and typing zluda -help
@@Jay-js6zr how many it/s? When I installed ROCm with everything enabled, including the drivers, I had an issue where installing the visual portion errored out the install
I also did that thing with the rocblas files
Thank you very much, it generates pictures on AMD 6800 with around 5it/s
How is ROCm compared to the SD AMD fork that's been around? Sorry if my question is incompetent.
It doesn't work for me; or rather, it works, but incorrectly. Instead of the GPU it uses the CPU, and any generation takes 30 minutes or more
Something is not correct then. Check all the steps and make sure everything is right.
@@FE-Engineer I checked all the steps. The only thing that is different is that there were no cublas.dll and cusparse.dll files in the \venv\Lib\site-packages\torch\lib directory.
@@FE-Engineer A small update. A clean setup solved the problem. 46 seconds per image, which took up to 3 minutes to generate with directml. Perfect. Thanks for your video.
Could you try to make an AMD GPU get detected/work with Applio RVC (text-to-speech)? My program just says that it couldn't find an Nvidia GPU and starts on the CPU. It used to work on AMD
At around the 8:25 mark is where you lose me. Because I get this weird clang error message when it's compiling the package and it wont launch the webui. I'm on a 7900XT, but so far this tutorial has taken me further than all the others ever have.
I am having issues running sd-webui with ZLUDA. I tried to install the same torch version, but I keep having to skip the CUDA test, and then I get: Error loading "C:\AI\stable-diffusion-webui-directml\venv\lib\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies.
Anyone could help please?
rename those files from 11 to 12. it should be cublas64_12.dll for the newer stable diffusion automatic1111
@@HankTTN thanks! I ended up running a dual boot Ubuntu with ROCm 😇
@@queyjo np! Nice, I’m trying to get SDNext up and working right now. How do you like the Ubuntu dual boot?
@@HankTTN pretty good! I had some issues since the only room driver to work on latest LTD Ubuntu was the rocm6.1, but it just works with pytorch rocm6.0 (until pytorch updates accordingly).
Overall, better performance than DirectML or ZLUDA on Windows for me.
I boot into Ubuntu for image generation and keep windows for LLMs with LM studio.
@@queyjo ahhh I see. That’s a good workflow and it isolates your image generation to the Ubuntu dual boot. I might try that if I can’t get sdnext working with zluda right now! Currently generating the first image that takes 10 min
I'm having an issue during installation. I'm using the latest Automatic1111 for DirectML. When running webui.bat it errors out with: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select). I'm using an XFX 7900 XTX and a Ryzen 7950XT.
Hi! I get an error when trying to generate an image: "RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half". Can you help me, please?
Hey, if anyone is running this with any of the AMD 7000-series GPUs, the integrated graphics can confuse the program. Disable your integrated graphics in Device Manager and it will work with Stable Diffusion.
In case this helps anyone else... I ended up having to redo my installation. The second time I installed, I copied in AlbedoBase XL and used it for the first inference. It worked immediately, without the 30 minutes of doing nothing that I got with the default model. Anyway, good luck out there everyone :)
Best tutorial, this worked for me. Too bad the RX 6800 doesn't have the "AI matrix" improvements that RDNA3 has, so for that same test prompt I only got around 2.6 it/s...
Also... is it just an impression, or is this more VRAM-hungry than running on NVIDIA hardware?
Hey, I'm on the step where you type in webui.bat. When I type it in, it says: 'webui.bat' is not recognized as an internal or external command, operable program or batch file.
Finally, after days of trying, I found your video... really good explanation, and I was finally able to make it run.
I’m glad it helped! :) thank you for watching!
thank you so much, this was actually not too bad to set up!
Yea, it is not exactly straightforward, but it is not that bad either. Thank you for watching and the kind words :)
Hi. I followed your tutorial with ZLUDA on a 6700 XT. Are there any optimization settings for SD under ZLUDA?
Is there any way to do LoRA training with this setup at the moment? Or is it only available for the hypernetworks built into Automatic1111?
What GPU do you have? I have a 6800 XT and only get about 5-6 it/s.
Edit: Never mind, you said it at the end.
I got it working and ran plenty of models, and then it stopped working the next day after PC sleep?? Somehow cusparse deleted itself, very strange...
I have a problem here. This method works fine for me UNTIL I restart my PC, and then the good old "torch can't use this GPU" error pops up again. I need to delete the whole setup and start fresh like this in order to use SD. Any solutions?
Hi, I did everything the same as you, but the CPU still handles image generation. The speed is extremely low and the GPU is not used (RX 6800 in my PC).
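For the "it generates on the CPU" complaints in this thread, one quick check (not from the video, my own suggestion) is to run `python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"` inside the webui's venv; ZLUDA presents the AMD card to torch as a CUDA device. A small sketch of how to read the result, with heuristics that are entirely my assumptions:

```python
def diagnose(cuda_available: bool, device_name: str = "") -> str:
    """Hypothetical helper: interpret torch's device report when generation
    unexpectedly runs on the CPU. Pass the values of
    torch.cuda.is_available() and torch.cuda.get_device_name(0)."""
    if not cuda_available:
        # No CUDA device visible at all: the ZLUDA DLL copy/rename or the
        # PATH entries for the HIP SDK and ZLUDA folder likely went wrong.
        return "no CUDA device: re-check the ZLUDA DLL and PATH steps"
    if device_name and not any(s in device_name for s in ("AMD", "Radeon")):
        # A device exists but it isn't the AMD card, e.g. an iGPU.
        return f"unexpected device '{device_name}': wrong GPU may be selected"
    return "ok: torch sees the GPU, generation should not fall back to CPU"
```

If the first check already fails, redoing the DLL steps (as the "clean setup" reply above found) is the usual fix.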
I'm getting the skip-torch-cuda-test error
I did not get that error, but you can try adding the --skip-torch-cuda-test flag to the command line and see if that helps
@@FE-Engineer didn't help. My error:
C:\stable-diffusion-webui-directml>webui.bat --use-zluda --skip-torch-cuda-test
venv "C:\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: 1.8.0-RC
Commit hash: 25a3b6cbeea8a07afd5e4594afc2f1c79f41ac1a
Traceback (most recent call last):
File "C:\stable-diffusion-webui-directml\launch.py", line 48, in
main()
File "C:\stable-diffusion-webui-directml\launch.py", line 39, in main
prepare_environment()
File "C:\stable-diffusion-webui-directml\modules\launch_utils.py", line 618, in prepare_environment
from modules.onnx_impl import initialize_olive
File "C:\stable-diffusion-webui-directml\modules\onnx_impl\__init__.py", line 4, in
import torch
File "C:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\__init__.py", line 141, in
raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\lib\cusolver64_11.dll" or one of its dependencies.
@@Rich_Mr same here... using a 6650xt and ROCm 5.5
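The WinError 126 in the traceback above means Windows could not load one of the CUDA-named DLLs in torch\lib (or one of its dependencies). A quick, hedged way to see which of the expected files are actually absent; the file names are taken from the errors in this thread, and note that per the video only cublas and cusparse get replaced by ZLUDA, so cusolver64_11.dll should still be torch's own copy:

```python
from pathlib import Path

# DLL names mentioned in the error messages in this thread.
EXPECTED = ("cublas64_11.dll", "cusparse64_11.dll", "cusolver64_11.dll")

def missing_dlls(torch_lib: str, expected=EXPECTED) -> list[str]:
    """Hypothetical check: return which of the expected DLLs are absent
    from torch's lib folder (e.g. ...\\venv\\Lib\\site-packages\\torch\\lib)."""
    lib = Path(torch_lib)
    return [name for name in expected if not (lib / name).exists()]
```

An empty result does not guarantee the DLLs load (WinError 126 can also mean a dependency of a present DLL is missing), but a non-empty one points at a deleted or mis-renamed file.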
It'd be really cool to see a benchmark between an AMD card using ZLUDA (or ROCm) and an NVIDIA card using CUDA
I don't speak English, so I need YouTube's translated subtitles to learn from you. Your subscriber count is completely out of proportion to the amount of work you've put in. There are still too few people using AMD graphics cards, sadly 😮💨.
Hey, I'm trying to add a faceswap extension, but neither of the two I tried (ReActor and FaceSwapLab) is working. Is it because this is a fork, or are the extensions the issue? (The install seems complete, but nothing shows up in the UI.)
you are a legend