March 2024 - Stable Diffusion with AMD on windows -- use zluda ;)

  • Published on 26 Nov 2024

Comments • 768

  • @taffyware1059
    @taffyware1059 8 หลายเดือนก่อน +30

    Performance better, worse or equal to Linux ROCm?

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +19

      About 20-25% worse than ROCm on Linux, I would say, but it has all the normal features of Automatic1111 without any of the ONNX or Olive stuff that was very irritating.

    • @taffyware1059
      @taffyware1059 8 หลายเดือนก่อน +1

      @@FE-Engineer I guess it's better than having to do all the optimization stuff over and over again; it also likely takes a lot less space than dual-booting Linux.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +2

      Yes. If you hate the idea of dual-booting Linux, or have other reasons why Linux ROCm is not an option, this is a reasonable workaround.

    • @ml-qq5ek
      @ml-qq5ek 8 หลายเดือนก่อน +9

      ​@@FE-EngineerI am only getting 1-2it/s on 6900xt with zluda. What is wrong

    • @HamguyBacon
      @HamguyBacon 8 หลายเดือนก่อน +2

      @@FE-Engineer I used Ventoy to run Linux and I don't see why people say it's easier to install and use; I had a hard time getting SD to run at all.

  • @swietypiotrprzykurwiciel6488
    @swietypiotrprzykurwiciel6488 8 หลายเดือนก่อน +20

    I just bought a new card and once again I am back to your tutorials. Your videos helped me before, your tutorials are extremely up to date and easy to follow. Thanks man, you're doing a great job here!

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +2

      Whahoo! Glad it worked and went smoothly! :). Thanks for watching!

  • @matthew78917
    @matthew78917 3 หลายเดือนก่อน +3

    I “sidegraded” from an RTX 3070 to an RX 6800. Mainly did it because I wanted that extra VRAM and I found a really good deal. Thank you for this tutorial! Very well put together

    • @CapaUno1322
      @CapaUno1322 3 หลายเดือนก่อน

      Me too, I just found a bargain RX 6800. This is my best card ever, and apart from the bells and whistles it punches well above its weight...

  • @shefu689
    @shefu689 8 หลายเดือนก่อน +5

    THANKS A LOT MATE! This is so awesome. I have played with DirectML and its settings like hell before. My webui-user.bat argument lines were almost one A4 page long.
    I noticed that you need to restart your PC for the new PATH entries to take effect on Windows 11. Without a restart you end up getting "failed to load zluda path automatically" and the "use skip-cuda-torch-test" message. Also, the first install will download cublas64_12 and cusparse64_12 instead of the 64_11 versions if you do not use the --use-zluda argument in webui-user.bat. I don't know why.
    My 6750XT results:
    1. SD 1.5 models: txt2img 1024x1024: 3.75 s/it average and 1:05 min generation time.
    SDXL models: txt2img 1024x1024: 3.50 s/it average and 1:10 min generation time.
    NOTE: without ZLUDA this was an impossible task because of an instant memory error, and SDXL models took over 2 minutes to generate at 512x512 resolution.
    2. Memory usage is now under control. With ZLUDA, SD uses only 10.2 GB of 12 GB and frees the memory after generation. A 15 min 1024x1024 -> 2048 upscale did not run into a memory error. With DirectML you can't use more than 1.5x upscale and ControlNet. No, you don't need ControlNet with ZLUDA. This is awesome.
    3. ControlNet works just fine
    4. Ultimate Upscaler works normally
    5. Inpaint works normally
    The AMD Pro drivers are slightly faster than the Adrenalin version. There is a slight 5-15 s delay with Adrenalin after pressing "Generate" and no delay with the Pro drivers. I don't know what causes this.
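
The PATH point above can be checked from Python before launching anything. This is a minimal sketch, not part of the tutorial: it lists which PATH folders contain a given file, as seen by the current process. The file name `nvcuda.dll` is an assumption about what the ZLUDA folder contains; substitute any file from your own ZLUDA folder.

```python
import os
from pathlib import Path


def path_entries(path_value, sep=os.pathsep):
    """Split a PATH-style string into its non-empty entries."""
    return [entry for entry in path_value.split(sep) if entry.strip()]


def find_on_path(filename, path_value, sep=os.pathsep):
    """Return the PATH directories that contain `filename`."""
    return [entry for entry in path_entries(path_value, sep)
            if (Path(entry) / filename).is_file()]


if __name__ == "__main__":
    # "nvcuda.dll" is an assumed ZLUDA file name, used only for illustration.
    hits = find_on_path("nvcuda.dll", os.environ.get("PATH", ""))
    print(hits or "not found on PATH in this process; try a new terminal or a restart")
```

An already open terminal keeps the PATH it started with, which would be consistent with the restart fixing it.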

    • @SanyaWoFloride-k5u
      @SanyaWoFloride-k5u 3 หลายเดือนก่อน +2

      How did it work for you? I get
      Cannot read C:\Program Files\AMD\ROCm\6.1\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1031
      on an RX 6700 XT, with no working workaround for it.

    • @МишаЛысенко-я3ю
      @МишаЛысенко-я3ю 3 หลายเดือนก่อน

      @@SanyaWoFloride-k5u You need ROCm 5.7.1, and you have to replace the files in \ROCM\5.7\
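
One way to see whether the rocBLAS library folder has anything for a given architecture is to list the files that mention it. A minimal sketch, assuming the per-architecture files carry the arch string (for example `gfx1031`) in their file names; the folder path follows the error message and the reply above and will differ per install.

```python
from pathlib import Path


def files_for_arch(library_dir, arch):
    """List file names in `library_dir` whose name mentions `arch`."""
    folder = Path(library_dir)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir()
                  if p.is_file() and arch in p.name)


if __name__ == "__main__":
    # Assumed install location; adjust the ROCm version to your system.
    library = r"C:\Program Files\AMD\ROCm\5.7\bin\rocblas\library"
    matches = files_for_arch(library, "gfx1031")
    print(f"{len(matches)} file(s) mention gfx1031")
```

If the count is zero, the library files for that card are most likely not installed, which would match the error.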

  • @FormalPluto
    @FormalPluto 8 หลายเดือนก่อน +17

    Very nice tutorial. I've moved over to the NVIDIA side, but your tutorials were extremely helpful for setting up SD with Olive when I was still using my RX 7800 XT.
    Thank you for making it easier for AMD users stuck on Windows who are curious about trying SD.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +2

      Thank you :)

    • @f1amezof
      @f1amezof 8 หลายเดือนก่อน +1

      Very nice, because it doesn’t work?

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +2

      This goes back and forth. About a year ago price/performance was mostly on AMD's side, but due to continued improvements NVIDIA now likely has an edge if you can get a good price on something like a 3080 or maybe even a 4070 Super.
      With AMD, yes, Linux will give you better performance 99% of the time because it has full ROCm.

  • @jcdenton23
    @jcdenton23 2 หลายเดือนก่อน +1

    OMG! I can't believe this worked! I'm running this on a 7800XT with no issues.
    One thing to note though, this only worked with Python Version 3.10.6.
    Also, for anyone not following FE-Engineer's file locations and structure: you can open CMD from the address bar of the File Explorer window. Just navigate to the folder, type "cmd" in the address bar, and Command Prompt will open in that directory. It made things a bit easier for me.
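
The interpreter version can be confirmed with a couple of lines. A minimal sketch: it checks for any 3.10.x release, which is an assumption that patch releases behave alike; the comment above only reports 3.10.6 as working.

```python
import sys


def is_python_310(version_info=sys.version_info):
    """True when the interpreter is a Python 3.10.x release."""
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    print("Running", sys.version.split()[0], "- 3.10.x:", is_python_310())
```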

  • @LeshaKhaletskiy
    @LeshaKhaletskiy 5 หลายเดือนก่อน +1

    You are the only person who has a workable SDXL guide for AMD. All the other stuff like torch, torch-cu and tensor works well too, and that is rare.

  • @jinxPad
    @jinxPad 8 หลายเดือนก่อน +9

    great stuff! Great tutorial as always, thank you.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +1

      Thank you so much for watching :)

  • @jk-ze2bo
    @jk-ze2bo 7 หลายเดือนก่อน

    This was a lifesaver! I fiddled for 2 days to get Olive, ONNX etc. working at a barely usable level, and after installing ZLUDA using this tutorial (almost) everything works out of the box without constant tinkering.
    Inpaint sketch does not work properly (it renders the whole image instead of the mask area), but that is probably an issue with the -directml fork.

    • @FE-Engineer
      @FE-Engineer  7 หลายเดือนก่อน +2

      Overall, if users don’t want to go to Linux for real ROCm, and until complete ROCm is on Windows, I think ZLUDA is an excellent compromise that still provides tons of functionality for folks on Windows. Thanks for watching!

  • @Briannoger-j1w
    @Briannoger-j1w 29 วันที่ผ่านมา

    Thank you so much for this tutorial! Haven't even finished the entire video yet but already started generating, even without replacing the files (which I did anyways, didn't seem to affect speed). Getting around 20-25 it/s which seems great! 7900XTX sure is a beast of a card!

    • @FE-Engineer
      @FE-Engineer  24 วันที่ผ่านมา

      Yea they changed some things to make it a lot easier.

  • @bernardy91
    @bernardy91 8 หลายเดือนก่อน +1

    Finally, after days of trying, i found your video...really good explanation, and i was finally able to make it run

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      I’m glad it helped! :) thank you for watching!

  • @MortisDG
    @MortisDG 8 หลายเดือนก่อน +1

    I was really getting frustrated with all that shit.. Thank you so much for this video! Finally I can use SD properly again 🙏

  • @Gawdzend
    @Gawdzend 8 หลายเดือนก่อน +1

    I started with one of your other videos, but this one got me officially up and running (on a 6600XT). Much appreciated!

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      Glad it helped and worked without issue (hopefully). :) thank you for watching!

    • @White-yz4kw
      @White-yz4kw 8 หลายเดือนก่อน +1

      What is the generation rate of it/s with zluda? Is the generation faster than with directml? Interested to know before installing, I have a rx6600.

    • @Torva01
      @Torva01 8 หลายเดือนก่อน

      ​@@White-yz4kwsame doubt

    • @ottomanherox
      @ottomanherox 8 หลายเดือนก่อน +2

      @@Torva01 It sounds like if your card has an ❌ for the HIP SDK, it's about 3 times slower than Linux ROCm, at least according to one test with a 6700 XT.
      It's safe to say it'd be memory-efficient regardless, and I'm tempted to try it on a 6700, but I have to check whether it's useful for something else, like DLSS maybe, because that speed gain alone is not worth it.

    • @ottomanherox
      @ottomanherox 8 หลายเดือนก่อน +1

      @@matthewfuller9760 I've tested it. It's about same speed as shark/vulkan but it didn't do much to help VRAM usage. Well, it consumes less than directML but falls apart when you try to upres on sdnext.

  • @MathieuCruzel
    @MathieuCruzel 8 หลายเดือนก่อน +6

    Thanks a lot for the tutorial. I could not for the life of me get it to work on Fedora, and finally this works really well. I moved from an RTX 2060 to a new 7900 XT recently and I was getting 1.5x to 2x the performance in ComfyUI, but with this I at last get 5x to 6x the speed when generating with XL models.

    • @CapaUno1322
      @CapaUno1322 3 หลายเดือนก่อน

      Hi there, I'm looking at a rx6800 and so just to ask you're quite satisfied with the performance and capabilities of your 7900XT as opposed to the 2060? I have an rx5700 which I am really happy with though for the AI I need more Vram....

    • @MathieuCruzel
      @MathieuCruzel 3 หลายเดือนก่อน

      @@CapaUno1322 Yes, definitely. With the 20 GB of VRAM I can run 7B-parameter local AI models in VRAM with LM Studio, and for ComfyUI it's night and day. But moving from a 5700 XT to a 6800 XT, I'm not sure the difference will be as big as the gap between a 2060 and a 7900 XT. That's a 2 or 3 generation gap for me.

  • @TrackmaniaKaiser
    @TrackmaniaKaiser 5 หลายเดือนก่อน +1

    Thanks a lot for your video! After I spent about 24 hours bricking everything, I finally stumbled across your channel! You helped me get my SD running so much better than before! I'm looking forward to your next video with some more SD optimizations for Windows users :)
    Until then: is there a PayPal or something where I can buy you a coffee? You saved me from insanity!

  • @OfficialGundiminator
    @OfficialGundiminator 6 หลายเดือนก่อน

    You are the best, sir. I have been struggling to get my 7900 XTX to work with anything. The only ones I got to work on Windows were Amuse, which is very lackluster and seems dead at this point, and SD.Next with a workaround, which is not great. With the workaround it lacks the ability to run bigger batches, upscaling and inpainting, the pics look choppy, and a lot more. Not great, tbh. And Linux was just a mess: most won't open, and the few that work will only run on my CPU.
    But with your help, I can finally generate pictures with all the features.
    All hail the king!

  • @horrid8024
    @horrid8024 8 หลายเดือนก่อน +6

    OMG! Thank you so much for this one! I tried for so long to get this running... All the text tutorials were just too complicated.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      You are welcome! I’m glad it helped. Thanks for watching!

  • @terrestrialman
    @terrestrialman 8 หลายเดือนก่อน +2

    thank you so much, this was actually not too bad to set up!

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      Yea, it is not exactly straight forward, but it is not that bad either. Thank you for watching and the kind words :)

  • @andresalcaino7570
    @andresalcaino7570 8 หลายเดือนก่อน +1

    It works with an RX 7600 XT. Thanks for this amazing tutorial, the only one that really worked for me. Like and sub.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      You are very welcome! Thank you for watching!

  • @Cessna-172
    @Cessna-172 8 หลายเดือนก่อน +1

    We have waited a long time for a tutorial like this. Thank you so much for your service to the AMD community, which is so hated by the AI community.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +2

      You are welcome. I’m glad there is finally something on Windows with relatively decent performance that does not seem to be seriously lacking in anything.

  • @PSYCHOPATHiO
    @PSYCHOPATHiO 8 หลายเดือนก่อน +2

    Excuse my language... HOLY SHIT, this is good. I gave up on Windows and have been on Linux for a while, but now, after testing this on Windows... oooh, I love you. I can finally utilize my 7900 XT to its potential. Thank you for the easy tutorial.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      I know, right? It’s sooooo good! It isn’t perfect, and I still want full ROCm on Windows, but in my opinion this is finally a very reasonable not-quite-full-ROCm alternative!

    • @PSYCHOPATHiO
      @PSYCHOPATHiO 8 หลายเดือนก่อน

      Having to juggle between Windows for gaming and Linux for AI was frustrating, but this is just so fast, even faster than when I was on Linux. Thanks for the work; I'm sure I speak on behalf of the whole AMD community :)

  • @koxu857
    @koxu857 8 หลายเดือนก่อน +5

    I can't even imagine how tough it was to work that out. Thanks!

  • @krizo96
    @krizo96 8 หลายเดือนก่อน +12

    You're a blessing upon this world.

  • @darthilli
    @darthilli 8 หลายเดือนก่อน +1

    Okay, I finally got it working thank you so much, you’ve earned a sub

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +1

      Glad it’s up and running! Thank you for watching! :)

    • @darthilli
      @darthilli 8 หลายเดือนก่อน

      @@FE-Engineer Keep up the good work, so much faster now 😌

  • @RimZeime
    @RimZeime 8 หลายเดือนก่อน +2

    Got it running at last, all thanks to you!!

  • @jokinbv5715
    @jokinbv5715 8 หลายเดือนก่อน +1

    Thank you so much.
    10 images at 1024x1536 (Hires fix from 512x768) 7900XT
    With previous directml: 16min
    Now with Zluda: 5min 30s
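
Worked out as arithmetic, the two timings above give the speed-up directly. A small sketch using only the numbers from this comment (10 images per run):

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds


directml = to_seconds(16)    # 960 s for 10 images
zluda = to_seconds(5, 30)    # 330 s for 10 images

speedup = directml / zluda
print(f"speed-up: {speedup:.2f}x")                                    # speed-up: 2.91x
print(f"per image: {directml / 10:.0f} s -> {zluda / 10:.0f} s")      # per image: 96 s -> 33 s
```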

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      Whoah. That’s way better! Nice!

  • @jakkard5026
    @jakkard5026 2 หลายเดือนก่อน +1

    Works beautifully, thanks man!

  • @auchucknorris
    @auchucknorris 7 หลายเดือนก่อน

    Just got Stable Diffusion installed, failed the CUDA test, then you popped up. Thanks heaps.

  • @SvenKloevekorn
    @SvenKloevekorn 8 หลายเดือนก่อน +3

    Very nice work, thanks a lot!

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      You are welcome! Thanks for watching!

  • @darkenblade986
    @darkenblade986 8 หลายเดือนก่อน +2

    Thank you so much for this tutorial. This worked for me and I have an unsupported 6700 XT. It's the first time I got inpainting and SDXL working properly. You do a good job explaining things, but the best part is how you put the links to everything in the description. It makes my life so much easier.

    • @Eminic112
      @Eminic112 8 หลายเดือนก่อน

      what's your performance like with the 6700xt im curious

    • @2ndGear
      @2ndGear 8 หลายเดือนก่อน

      My 6600 XT does 2 it/s; it sucks. Shouldn't have cheaped out on a card lol.

    • @Jay-js6zr
      @Jay-js6zr 8 หลายเดือนก่อน +2

      I also have a 6700xt and am struggling to make it work, would you be able to share any issues you had while setting this up and how you overcame them please? :)

    • @darkenblade986
      @darkenblade986 6 หลายเดือนก่อน

      @@Eminic112 between 1 to 2 iters per sec it depends on the prompt. More tokens takes longer.

    • @darkenblade986
      @darkenblade986 6 หลายเดือนก่อน

      @@Jay-js6zr I just followed the guide. It wasn't too hard. Make sure you follow it to the letter.

  • @rtchannel8171
    @rtchannel8171 8 หลายเดือนก่อน +1

    Thank you, it works perfectly on my RX 6800, and so fast. Amazing.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +1

      Fantastic! I’m glad to hear that. Thank you for watching :)

  • @darthilli
    @darthilli 8 หลายเดือนก่อน +9

    [WinError 126] The specified module could not be found. Error loading "C:\Users\___\ZLUDA\stable-diffusion-webui-directml\venv\lib\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies. please help

    • @sujimayne
      @sujimayne 6 หลายเดือนก่อน

      Just FYI, you can use the Windows variable %userprofile% to provide an actual full path that can be used in Windows without exposing your username.
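
To illustrate how such a path resolves, here is a minimal sketch that expands %NAME% references, ignoring case in the name, roughly as Windows does. The helper `expand_percent_vars` and the example profile folder are hypothetical; on a real Windows machine you would pass `os.environ`.

```python
import re


def expand_percent_vars(path, env):
    """Expand %NAME% references, ignoring case in NAME."""
    upper_env = {key.upper(): value for key, value in env.items()}

    def replace(match):
        # Leave unknown variables untouched instead of raising.
        return upper_env.get(match.group(1).upper(), match.group(0))

    return re.sub(r"%([^%]+)%", replace, path)


if __name__ == "__main__":
    # Hypothetical environment used only for illustration.
    env = {"USERPROFILE": r"C:\Users\example"}
    print(expand_percent_vars(r"%userprofile%\ZLUDA\stable-diffusion-webui-directml", env))
```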

    • @silvermoonk9121
      @silvermoonk9121 6 หลายเดือนก่อน

      Make sure you copied the 2 files he mentioned and renamed them correctly.
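
The copy-and-rename step can be scripted so the names cannot be mistyped. This is a sketch under stated assumptions, not the video's exact procedure: the target names `cublas64_11.dll` and `cusparse64_11.dll` come from the comments in this thread, while the source names `cublas.dll` and `cusparse.dll` are an assumption about what the ZLUDA folder contains. Check your own folder before relying on it.

```python
import shutil
from pathlib import Path

# Assumed mapping: file name in the ZLUDA folder -> name torch looks for.
RENAMES = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
}


def copy_zluda_dlls(zluda_dir, torch_lib_dir, renames=RENAMES):
    """Copy each ZLUDA DLL into torch's lib folder under the expected name."""
    copied = []
    for source_name, target_name in renames.items():
        source = Path(zluda_dir) / source_name
        if not source.is_file():
            raise FileNotFoundError(f"missing {source}")
        target = Path(torch_lib_dir) / target_name
        shutil.copy2(source, target)  # overwrites an existing target file
        copied.append(target)
    return copied
```

Here `torch_lib_dir` would be the `venv\lib\site-packages\torch\lib` folder named in the error message; backing up the original DLLs first is a sensible precaution since the copy overwrites them.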

    • @darthilli
      @darthilli 2 หลายเดือนก่อน

      @@silvermoonk9121 I worked it out, all good 😊

    • @bongpng
      @bongpng หลายเดือนก่อน

      same error here, did anyone solve it?

    • @3vilful
      @3vilful หลายเดือนก่อน

      The newer version of ZLUDA has fewer files, or am I missing something?

  • @LighthouseLeads
    @LighthouseLeads 8 หลายเดือนก่อน +3

    You're the best. Hope your family is all good.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +1

      Thank you so much! Family is getting there. My son has a lot of medical issues. So long road there. But thank you for asking! :)

  • @figure17tsubasayhikaru43
    @figure17tsubasayhikaru43 หลายเดือนก่อน +4

    Hi, your video was really helpful some months ago, but it seems that an update changed something and now there are some errors. Do you know what causes these?
    "OSError: none is not a local folder and is not a valid model listed on 'huggingface models' if this is a private repository make sure to pass a token having permission to this repo either by logging or by passing 'token='"
    and
    "Failed to create a model quickly; will retry using slow method."
    Those are the errors I'm getting. I hope you know how I can solve them 🙏.

  • @Mike-ss1ju
    @Mike-ss1ju 8 หลายเดือนก่อน

    Thank you so much for this. The 7900 XTX is finally worth it. I had to disable integrated graphics in the BIOS to get this to work. Excellent instructional video. This shit is crazy.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +3

      Ah yes. You could likely set it in the Windows environment variables instead; I think it is HIP_VISIBLE_DEVICES, set to 1. But disabling it in the BIOS works as well.

  • @lifekraft
    @lifekraft 6 หลายเดือนก่อน +1

    Thank you so much for putting in the time and effort to help random people figure these things out. Almost every single one of your recent videos helped me navigate this new world of technology, and I wouldn't even be able to try it without you. Thanks infinitely.

    • @FE-Engineer
      @FE-Engineer  6 หลายเดือนก่อน +1

      You are very welcome! I am glad they helped! Thank you for watching!

  • @danielitsfine9818
    @danielitsfine9818 7 หลายเดือนก่อน

    Thank you for this. Using onnx and olive was kind of great, getting faster it/s but not being able to use loras and converting models made it not that enjoyable, but it was still good to learn and practice with.

  • @CapaUno1322
    @CapaUno1322 3 หลายเดือนก่อน +2

    I just got a bargain RX 6800, as I heard that you can do the AI stuff without mortgaging your house to NVIDIA. An RX 6800 is only 20% slower than an RTX 3090, and a new one is half the price of a used 3090, so here I am trying to get it to work... Thanks for your videos, good work! ;D

    • @koltendavis969
      @koltendavis969 3 หลายเดือนก่อน +1

      Did you get this to work with the latest SD Direct ML? This tutorial as is is too old and I am getting errors.

  •  8 หลายเดือนก่อน +2

    Thanks, this is the first working tutorial I have found; it is running on an RX 6650 XT. Greetings in Spanish: I understand English but my diction is not good. Thanks.

  • @afilthyweeb8684
    @afilthyweeb8684 6 หลายเดือนก่อน

    Damn you were not lying about that first run. I ended up at nearly 30 minutes

  • @konstabelpiksel182
    @konstabelpiksel182 8 หลายเดือนก่อน +2

    the last time i followed your comfyui + windows with directml guide, it worked like a charm for my rx6600 for sd15. wondered if this is any faster. got myself a 4070s now tho 😁

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      I believe this should be a decent bit faster than just directml -- if I am remembering correctly, this might be about double the performance of directml alone.

  • @jimmyjupanu
    @jimmyjupanu 8 หลายเดือนก่อน +5

    How do I uninstall torch-2.2.0+cu121 and install torch-2.2.0+cu112? I think that is my problem, because when I run SD it runs on the CPU.
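
Before reinstalling anything, it can help to confirm which torch build the virtual environment is using. A minimal sketch: `torch.__version__` returns a string like the ones quoted above, and the part after `+` is the build tag. It has to run inside the webui's venv; outside it the script falls back to the sample string from the comment.

```python
def split_torch_version(version):
    """Split '2.2.0+cu121' into ('2.2.0', 'cu121'); the tag is '' if absent."""
    release, _, build_tag = version.partition("+")
    return release, build_tag


if __name__ == "__main__":
    try:
        import torch  # normally only available inside the webui's venv
        print(split_torch_version(torch.__version__))
    except ImportError:
        print(split_torch_version("2.2.0+cu121"))  # sample string from the comment
```

A `cpu` tag would point to a CPU-only build, which is one possible reason for generation running on the CPU.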

  • @Whyidk
    @Whyidk 8 หลายเดือนก่อน +1

    this video is a blessing thank you!!

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      You are welcome :) thanks for watching!

  • @jovabre
    @jovabre 8 หลายเดือนก่อน +2

    Excellent work. Thanks!!!

  • @DanDanceMotion
    @DanDanceMotion 8 หลายเดือนก่อน

    There were a lot of messy errors, but I finally succeeded.
    Thank you!!

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      Yea. It’s kind of crazy how many things say error and don’t matter. But it only takes one to wreck everything.

  • @ИльяАникин
    @ИльяАникин 19 วันที่ผ่านมา

    you are the best, man. still works.

  • @corneduplessis6337
    @corneduplessis6337 8 หลายเดือนก่อน +3

    I appreciate your content. It's so frustrating that it can't just work for AMD on Windows like it does for NVIDIA cards. I'm hoping that'll change in the near future, but for now I use my 3070 for SD and my 7800 XT for gaming, and I'm good with that.

    • @featy2671
      @featy2671 8 หลายเดือนก่อน

      Do you know how many it/s I should get with an RX 7800 XT if I do everything right?

  • @Sbill.
    @Sbill. 5 หลายเดือนก่อน

    Man, this is mind boggling. I've been running SD for over a year now with a 6700XT, and I've been kicking myself for picking AMD over NVIDIA on my last upgrade. This is a game changer. Even getting something like ~3.00it/s is so much faster than I was getting before. And I'm getting hi-res fix running, which I could barely do before. This is awesome!

    • @sei_asagiri
      @sei_asagiri 5 หลายเดือนก่อน

      how did you get it working on a 6700xt? HIP SDK is not compatible with the 6700xt (according to amd) and i get an error 215 every time i try to install it. Are you using CPU only or something else?

    • @Sbill.
      @Sbill. 5 หลายเดือนก่อน

      @@sei_asagiri I was able to get the SDK installation to complete. Then I replaced the library files with the alternate library files provided at the link at the bottom of the video description. If you're getting an error when installing the SDK, I'm not sure what the cause would be.

    • @sei_asagiri
      @sei_asagiri 5 หลายเดือนก่อน

      @@Sbill. I'm going to purchase a nvidia gpu to replace my amd gpu instead. amd feels like its only exclusively designed for linux people while nvidia is exclusively designed for windows people.

  • @TheSnow.
    @TheSnow. 6 หลายเดือนก่อน

    as a 7900 xtx owner i was getting so mad that i couldn't do any proper AI generation, bless you for your tutorials man. You are amazing, the true hero of AMD.
    But to be honest, you should consider telling people about compatibility with other models at the beginning of the video.

    • @FE-Engineer
      @FE-Engineer  6 หลายเดือนก่อน

      That’s fair. I will try to include something at the beginning about this.

  • @KapitanAI.
    @KapitanAI. 7 หลายเดือนก่อน

    you are a legend

  • @fmenguy
    @fmenguy 8 หลายเดือนก่อน +1

    Thanks for your tutorials, they are really well explained.
    For others like me who have an old config:
    I tried, even though I knew very well that my gpu wasn't on the list. If you get this message: "rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\5.7\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU" it's dead!

    • @tolly_HD
      @tolly_HD 8 หลายเดือนก่อน

      What exactly do you mean with its dead? I also get this error even tho I have an RX 7900 xtx which is most definitely completely supported

    • @daveroff4389
      @daveroff4389 4 หลายเดือนก่อน

      I knew my RX580 wasn't anywhere on the list, but it's 8GB VRAM, so I tried it anyway, and it works! Had to replace those library files (third option), put in a couple of ARGS in user.bat (--use-zluda and --no-half), but that got it working. Only issue is how long the image generation takes, which is like 10-15 minutes. I know it's running on the GPU instead of the CPU, because I can hear the GPU's fans working harder, but is there a good way to speed it up, without breaking it?

  • @zygimantastauras
    @zygimantastauras 8 หลายเดือนก่อน +1

    Thank you very much, it generates pictures on AMD 6800 with around 5it/s

    • @jony_tough
      @jony_tough 2 หลายเดือนก่อน

      How is ROCm compared to the SD AMD fork that's been around? Sorry if my question is incompetent.

  • @udinmoklet
    @udinmoklet 8 หลายเดือนก่อน

    Thank you so much bro, it's working on RX 6700 XT!
    took 23 mins+ on first generation

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      You are very welcome! Thanks for watching :)

    • @joris2032
      @joris2032 8 หลายเดือนก่อน

      very nicee! can it generate fast now?

    • @udinmoklet
      @udinmoklet 8 หลายเดือนก่อน

      @@joris2032 well kinda fast, under 15 seconds maybe? depends on the resolution

    • @joris2032
      @joris2032 8 หลายเดือนก่อน

      @@udinmoklet Sounds okay! I am trying to install it for my 6700 XT as well, but the HIP SDK isn't working for my card. I'm now trying another version, 5.5.1.

    • @udinmoklet
      @udinmoklet 8 หลายเดือนก่อน

      @@joris2032 there's extra steps that you have to do, read the documentation

  • @dididipradoul9132
    @dididipradoul9132 2 หลายเดือนก่อน +1

    works perfectly on 6800xt thx

  • @user-yingshubo
    @user-yingshubo 8 หลายเดือนก่อน

    I have always used DirectML. This is really great, many thanks to the author; I actually got it configured successfully!!!

  • @crocknroll
    @crocknroll 5 หลายเดือนก่อน

    This tutorial is awesome; finally the 7900 XTX is usable in A1111, hallelujah.

  • @browse7288
    @browse7288 7 หลายเดือนก่อน

    Holy shit it actually worked., big thanks man!

    • @FE-Engineer
      @FE-Engineer  7 หลายเดือนก่อน

      😂😂 you are welcome. I’m glad it worked :). Thank you for watching!

  • @tushkan4ik111
    @tushkan4ik111 8 หลายเดือนก่อน +1

    It worked! Thanks!

  • @f1amezof
    @f1amezof 8 หลายเดือนก่อน +1

    RX 7900 XTX
    I followed step by step, but getting this error:
    “rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\5.7\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1036”

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +1

      …it's seeing your integrated GPU…
      Either disable it, or set HIP_VISIBLE_DEVICES=1.
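
One place to set the variable is the environment of the process that launches the web UI. A minimal sketch, assuming the HIP runtime reads `HIP_VISIBLE_DEVICES` as described in the reply; the index 1 follows the reply, but which index is the discrete GPU depends on the system, so treat it as an assumption. The child process below is a stand-in that only echoes the variable.

```python
import os
import subprocess
import sys


def env_with_hip_device(index, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = str(index)
    return env


if __name__ == "__main__":
    child_env = env_with_hip_device(1)  # try 0 if 1 selects the wrong device
    # Stand-in child process; a real launch would start the web UI with
    # env=child_env instead of this one-liner.
    subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ['HIP_VISIBLE_DEVICES'])"],
        env=child_env,
        check=True,
    )
```

The same effect can be had by defining the variable in the Windows environment variables dialog, as mentioned elsewhere in the thread.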

    • @f1amezof
      @f1amezof 8 หลายเดือนก่อน +1

      @@FE-Engineer Hah, I guessed it before seeing the actual answer (gfx1036 is not 7000 series), and it works now. But thank you anyway :)

    • @nenadm5747
      @nenadm5747 8 หลายเดือนก่อน

      ​@@FE-EngineerWhere to put that?

  • @theknightowl2137
    @theknightowl2137 2 หลายเดือนก่อน +2

    Any ideas how to fix the "Failed to create model quickly; will retry using slow method" ?

  • @bigdeutsch5588
    @bigdeutsch5588 7 หลายเดือนก่อน

    Finally one that worked. My iterations/ seconds increased about 500% in speed. Thank you!! I do have a question, does soft inpainting work with this implementation of SD? I have not had success running soft inpainting. Thanks

  • @victorivanov5667
    @victorivanov5667 8 หลายเดือนก่อน

    Hey, thanks for the ongoing amazing videos. It worked like a charm the first time, but on the 2nd try I get the skip torch cuda error; adding --skip-torch-cuda only results in an error that several people in the comments are experiencing.
    EDIT: Found the solution, had to open cmd in the zluda dir then navigate to the folder with the webui.bat and start it like in the video!

    • @tiago7063
      @tiago7063 8 หลายเดือนก่อน

      For me it was that I didn't start zluda.exe or didn't open amd as admin; I don't know which one solved it.

  • @alanreynolds4262
    @alanreynolds4262 6 หลายเดือนก่อน

    Thank you so much for this video. I was pulling out my hair trying to get this to work. Went through so many guides, but yours worked!

  • @DrivEDrivinginEurope
    @DrivEDrivinginEurope 8 หลายเดือนก่อน +3

    hi, I have this error after launching webui.bat to install everything:
    rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\5.7\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1036
    rocBLAS error: Could not initialize Tensile host:
    regex_error(error_backref): The expression contained an invalid back reference.
    Press any key to continue . . .
    Any idea what to do? Thanks for your help

    • @banned-user
      @banned-user 8 หลายเดือนก่อน

      same error

    • @banned-user
      @banned-user 8 หลายเดือนก่อน +2

      hey I just fixed it. disable your integrated gpu in device manager and wait a while as it loads and eventually downloads

    • @DrivEDrivinginEurope
      @DrivEDrivinginEurope 8 หลายเดือนก่อน

      @@banned-user thank you, I will try it later. I'm not too sure though how to disable the integrated graphics

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +1

      You can do it from the BIOS, for one.
      But you can also set an environment variable that ROCm reads and tell it to ignore the iGPU.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      I can tell you something is wrong. See how slashes go from back slashes to forward slashes? And at one spot there is a backslash next to a forward slash? Look at your env variables and check to see if something is weird.
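
The error line itself can be picked apart to see what it is reporting. A minimal sketch: it extracts the path and the GPU architecture, and normalises the path with `ntpath.normpath`. The mixed slashes normalise to an ordinary Windows path, so the architecture at the end of the line is probably the more telling part. The small architecture table only contains the two identifications made in these comments.

```python
import ntpath
import re

# Architectures identified in the comments; not a complete list.
KNOWN_ARCHS = {
    "gfx1031": "RX 6700 XT",
    "gfx1036": "integrated GPU",
}

ERROR_PATTERN = re.compile(
    r"Cannot read (?P<path>.+?): No such file or directory "
    r"for GPU arch : (?P<arch>gfx\w+)"
)


def parse_rocblas_error(line):
    """Extract the normalised library path and GPU arch from an error line."""
    match = ERROR_PATTERN.search(line)
    if match is None:
        return None
    arch = match.group("arch")
    return {
        "path": ntpath.normpath(match.group("path")),
        "arch": arch,
        "device": KNOWN_ARCHS.get(arch, "unknown"),
    }
```

For the error quoted in this thread the result names `gfx1036`, which matches the replies pointing at the integrated GPU.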

  • @taylormurphy2551
    @taylormurphy2551 5 หลายเดือนก่อน +1

    can you please provide exact version numbers for both zluda and stable-diffusion-webui-directml? Newer versions of both have been released and I'm getting errors when I try to run webui.bat at the end of the installation process. I assume this is because I'm using incompatible versions of different packages? Thank you!

  • @fabear4022
    @fabear4022 8 หลายเดือนก่อน

    Noice, works. The only thing different I did from this video is downloaded the latest version of zluda. It's slow though on RX 6700 XT 12GB. I guess my card isn't as good as I thought it was. At least it freaking works.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน

      I did change to the latest version. Overall I did not honestly see any noticeable difference. But for some it might provide a more noticeable change? Or perhaps it supports more cuda functions?

    • @fabear4022
      @fabear4022 8 หลายเดือนก่อน

      @@FE-Engineer Yes, it's about the functions. Everything I would like appears to work, as previously it would just break. And there definitely is a performance increase.

  • @MegaGranj
    @MegaGranj 8 หลายเดือนก่อน +2

    Great tutorial!
    P.S. For my 7900 XTX, the best arguments for SDXL at 1024x1024, with minimal crashes (one out of ~500 generations), are:
    set COMMANDLINE_ARGS=--use-zluda --disable-nan-check --no-half-vae
    set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:512
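
The second setting packs several options into one string, as comma-separated `key:value` pairs. A minimal sketch that splits it into a dictionary so each option can be read or changed on its own; whether ZLUDA honours every option is not something this thread confirms.

```python
def parse_alloc_conf(value):
    """Parse 'key:value,key:value' into a dict of strings."""
    options = {}
    for item in value.split(","):
        if not item.strip():
            continue
        key, _, setting = item.partition(":")
        options[key.strip()] = setting.strip()
    return options


if __name__ == "__main__":
    conf = parse_alloc_conf("expandable_segments:True,max_split_size_mb:512")
    print(conf)
```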

    • @badis103
      @badis103 7 หลายเดือนก่อน

      👍👏

  • @ratchetI2606
    @ratchetI2606 5 หลายเดือนก่อน +1

    Hey, I'm on the step where you type in webui.bat. When I type it in it says 'webui.bat' is not recognized as an internal or external command,
    operable program or batch file.
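
That message usually means the command prompt is not in the folder that contains `webui.bat`. A minimal sketch for locating it: the helper `find_launchers` is hypothetical, and searching a large folder tree can take a while.

```python
from pathlib import Path


def find_launchers(root, name="webui.bat"):
    """Return every folder under `root` that directly contains `name`."""
    return sorted(str(path.parent) for path in Path(root).rglob(name))


if __name__ == "__main__":
    here = Path.cwd()
    if (here / "webui.bat").is_file():
        print("webui.bat is in the current folder")
    else:
        print("webui.bat is not in", here)
        # For example: find_launchers(Path.home() / "ZLUDA")
```

Once the folder is known, `cd` into it and run `webui.bat` from there.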

  • @KalidrethBaelric-o1c
    @KalidrethBaelric-o1c 3 หลายเดือนก่อน

    In case this helps anyone else... I ended up having to redo my installation. The second time I installed, I copied in AlbedoBase XL and used it for the first inference. It worked immediately, without the 30 minutes of doing nothing that I got with the default model. Anyway, good luck out there everyone :)

  • @finn9552
    @finn9552 6 หลายเดือนก่อน

    Thank you, easy and good guide

  • @i_Max_i
    @i_Max_i 8 หลายเดือนก่อน +1

    Time to install SD again and try it with my 5700XT :D

    • @i_Max_i
      @i_Max_i 8 หลายเดือนก่อน

      Aaaand no, it needs an RX 6000-series card. On Linux you can override the gfx version, but on Windows I didn't find a way to emulate Navi 2 ((

    • @PumpedWalt
      @PumpedWalt 5 months ago

      any luck? I got a 5700xt too

  • @randomrandom7724
    @randomrandom7724 5 months ago

    Best tutorial, this worked for me. Too bad the RX 6800 doesn't have the "AI matrix" improvements that RDNA3 has, so for
    that same test prompt I only got around 2.6 it/s...
    Also... is it just my impression, or is it more VRAM-hungry than running on NVIDIA hardware?

  • @MadMike626
    @MadMike626 8 months ago +1

    Hi, thanks for the tutorial! I did everything as you said but I'm getting an error "launch.py: error: unrecognized arguments: --use-zluda". My GPU is RX 7800 XT

    • @kobusdowney5291
      @kobusdowney5291 8 months ago

      Did you add the correct path?

    • @MadMike626
      @MadMike626 8 months ago

      @kobusdowney5291 Yes. BTW, I installed SD.Next and ZLUDA works fine there, but in A1111 it doesn't for some reason.

  • @sturmritter1
    @sturmritter1 6 months ago +1

    What version of PyTorch are you using? I saw 2.2.0 on the screen in passing, but is +cu also included? The reason I'm asking is that I'm getting SD to run fine, with gpu recognized, but when I attempt to load a model I get an error:
    20:14:51-079163 ERROR Diffusers failed loading:
    model=D:\stablediffusion\SDNext\automatic\models\Stable-diffusion\dreamshaper_8.safetensors
    pipeline=Autodetect/NoneType Building PyTorch extensions using ROCm and Windows is not
    supported.
    20:14:51-083150 ERROR loading
    model=D:\stablediffusion\SDNext\automatic\models\Stable-diffusion\dreamshaper_8.safetensors
    pipeline=Autodetect/NoneType: OSError
    ┌───────────────────────────────────────── Traceback (most recent call last)
    I'm currently using PyTorch 2.3.0+cu118 (I'm currently on the vladmandic fork, but this also occurs on my lshqqytiger fork.)

  • @kingyizzus4108
    @kingyizzus4108 6 months ago +1

    Thank you very much for the detailed tutorial❤, but I have a little problem which is that the Karras type samplers do not appear. Any solution? 😢

  • @harrisonajones
    @harrisonajones 8 months ago

    Thank you so much for this. I found it really helpful, especially considering that I am running one of the RX 6XXX GPUs. In the end, the only thing I found on Stack Overflow to get past the issue was to delete the venv folder and then run the webui-user.bat file. But after a reboot, it seems to be outputting solid black or white images again, even after deleting that folder again. Any idea why this might be?

  • @amj2048
    @amj2048 8 months ago +1

    very cool, thank you

    • @FE-Engineer
      @FE-Engineer  8 months ago

      Glad you liked it, thank you for watching! :)

  • @Greyhoundsniper
    @Greyhoundsniper 26 days ago

    Getting Exception Code: 0xC0000005 with a 6700 XT on ROCm 6.1, any tips on what the issue is? I used the Python version you said to use and also tried 3.10.11, and still no change.

  • @Karambolagemusic
    @Karambolagemusic 7 months ago +2

    RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check.
    Any clues? Do I need to install another version of pytorch? If so, how? Thanks in advance!

    • @limear
      @limear 4 months ago

      Did you run "./webui.bat --use-zluda" in the terminal?
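
A quick way to narrow down the "Torch is not able to use GPU" error is to ask PyTorch directly what it sees. The sketch below is only a diagnostic, not part of the video's steps: `torch_gpu_report` is a hypothetical helper name, and it assumes you run it with the same Python environment the webui uses (for example its venv).

```python
import importlib.util


def torch_gpu_report() -> str:
    """Describe whether PyTorch is installed and whether it can see a CUDA-style device."""
    # Avoid an ImportError if torch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this Python environment"

    import torch

    if not torch.cuda.is_available():
        return f"torch {torch.__version__} found, but torch.cuda.is_available() is False"

    # Device 0 is the first device PyTorch reports.
    return f"torch {torch.__version__} sees: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(torch_gpu_report())
```

If this reports that no device is available, the problem is in the torch/ZLUDA setup itself rather than in the webui arguments.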

  • @jcdenton23
    @jcdenton23 2 months ago

    Thanks for the video. How do you start over if you mess up the steps? Is there a way to uninstall everything and start over?

  • @HamguyBacon
    @HamguyBacon 8 months ago

    Update: I am getting 10-57 s/it on an RX 7800 XT for text-to-image.
    Using Stable Cascade + ZLUDA with over a dozen browser tabs open, I created a 3840 x 2160 image that I'm now using as my wallpaper, with the highest around 36 s/it.

    • @FE-Engineer
      @FE-Engineer  8 months ago

      It is not using your GPU.
      Did you add the --use-zluda flag?

  • @MugiwaraRuffy
    @MugiwaraRuffy 7 months ago

    Will take a look into it.

  • @jozopako
    @jozopako 6 months ago +1

    When I install by running the user.bat file, it says error 1/2 no space left on device, even though I have 437GB of free space.
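
One thing worth checking for a "no space left on device" error is whether the free space is on the drive the installer actually writes to. The sketch below is a guess at a useful diagnostic, not a confirmed cause: it assumes the install may also use the temp directory or the home directory, which can sit on a different drive than the one with 437GB free. The helper names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def free_gib(path) -> float:
    """Return free space, in GiB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def report_free_space(paths) -> dict:
    """Map each location to its free space in GiB, rounded to one decimal."""
    return {str(p): round(free_gib(p), 1) for p in paths}


if __name__ == "__main__":
    # Locations an install might write to (an assumption, adjust as needed):
    # the current folder, the temp directory, and the home directory.
    candidates = [Path.cwd(), Path(tempfile.gettempdir()), Path.home()]
    for location, gib in report_free_space(candidates).items():
        print(f"{location}: {gib} GiB free")
```

Run it from the webui folder; if one of the locations is nearly full, that drive is the one to clean up.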

  • @Klaster_1
    @Klaster_1 8 months ago +2

    Thank you for the video. It took me a while to figure out, but I finally got a decent generation improvement on my setup: about 11 it/s in SD1.5 on a 7900 XTX. If others read this, try the "--use-zluda" flag: stable-diffusion-webui-directml and SD.Next do the patching for you and install the correct torch version, which is much easier.

    • @Klaster_1
      @Klaster_1 8 months ago

      @matthewfuller9760 you divide the iteration count by the it/s. That gives about 2s for 20 iterations of SD1.5 at 512x512, or 12s for SDXL base at 25 iterations at 1024x1024. More if you swap models, i.e. if you run an SDXL refiner, but AFAIK that mostly depends on your SSD speed.
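
The arithmetic here can be sketched in a few lines. Sampling time is the iteration count divided by the speed in it/s, or multiplied by the speed when the UI reports s/it instead (as it does for slow runs). `seconds_for_run` is a hypothetical helper, and it ignores model loading and VAE decode time.

```python
def seconds_for_run(steps: int, speed: float, unit: str = "it/s") -> float:
    """Estimate sampling time in seconds for `steps` iterations.

    `unit` is "it/s" (iterations per second) or "s/it" (seconds per iteration).
    """
    if steps < 0 or speed <= 0:
        raise ValueError("steps must be >= 0 and speed must be > 0")
    if unit == "it/s":
        return steps / speed
    if unit == "s/it":
        return steps * speed
    raise ValueError(f"unknown unit: {unit!r}")


if __name__ == "__main__":
    # 20 steps at the 11 it/s quoted above for SD1.5.
    print(f"{seconds_for_run(20, 11.0):.1f} s")
    # 20 steps at 14 s/it, the kind of speed seen when the GPU is not used.
    print(f"{seconds_for_run(20, 14.0, unit='s/it'):.1f} s")
```

The same function makes the gap between GPU and CPU runs obvious: a few seconds versus several minutes for the same step count.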

    • @erwins_arm
      @erwins_arm 7 months ago +3

      how do i install the correct torch version and get it installed into the right folder? complete newbie here and having issues

  • @phelix88
    @phelix88 8 months ago

    Thanks for the video! Got it up and running with barely any issues. Only question I have is about model training. Is that feature still only an Nvidia thing? When I try to create an embedding it seems to create one but the dropdown in the training tab doesn't seem to function. I also see errors referring to embeddings in command prompt.

    • @FE-Engineer
      @FE-Engineer  8 months ago

      That is possible. ZLUDA seems to translate a decent amount of CUDA functionality correctly and accurately. But even as a translation layer, it is definitely not a 1:1 map of all CUDA functionality. It is more than reasonable to assume that as you get into more complex CUDA functionality, the translation layer may not function properly or accurately. Either would result in failures or crashes.
      You might try dreambooth. I have not tested. It might work? Likely you will run into the same or even more issues though. Hard to say without trying.

  • @TinnyFacexD
    @TinnyFacexD 4 months ago

    is there any way to do LORA training with this set up at the moment? Or is it only available for hypernetworks built in to Automatic1111?

  • @MRrDoctorWho
    @MRrDoctorWho 7 months ago +1

    Can u help???
    What is the problem? I have an RX6750 XT, installed libraries, tried different ways, the error does not go away. Either the Stable Diffusion defines the graphics card on the gfx90c architecture
    "RuntimeError: invalid argument to reset_peak_memory_stats"

    • @kingawesomezack
      @kingawesomezack 2 months ago

      getting same error - did you ever find a solution?

  • @Rich_Mr
    @Rich_Mr 8 months ago

    Hey man, thanks for all the help you've delivered. One thing: are you planning to try LM Studio with ZLUDA to run LLMs locally?
    That would be great, as it worked fine when I was using Linux, but now I'm on Windows and it doesn't work.

    • @FE-Engineer
      @FE-Engineer  8 months ago +1

      I was not planning on it. But you are the second person to ask. I will spend some time on it over the weekend and see if I can get it running properly. No promises, it may be using cuda functions that are not supported through zluda.

    • @FE-Engineer
      @FE-Engineer  8 months ago

      As a semi-related side note: supposedly AMD has a build of LM Studio that is supposed to work with the HIP SDK, I think. I had no luck getting that to work though. :-/

  • @cartoonworld1000
    @cartoonworld1000 8 months ago +1

    I just want to say thanks, it seems to be working on my 7900 XTX. I'm wondering whether you think we can use this in InvokeAI. I kind of like its layout and would love to use it on my AMD GPU. When you get the chance, let me know if you think it's possible.

    • @FE-Engineer
      @FE-Engineer  8 months ago +1

      I can pretty definitively say that for right now, on Windows, I doubt you will get it to run with ZLUDA.
      I spent multiple hours on it. cuDNN is heavily used here, and while it may well be possible, I have not figured out a good way to disable it entirely and get things running. It is close; I just cannot get cuDNN fully disabled, and it seems to be deeply woven into the program overall.

    • @cartoonworld1000
      @cartoonworld1000 8 months ago

      @FE-Engineer I guess we'll either have to wait for ZLUDA support or full ROCm support on Windows, correct?

    • @FE-Engineer
      @FE-Engineer  8 months ago

      That or if the devs decide to allow it and make a flag that disables cudnn.

  • @BanditEssex
    @BanditEssex months ago

    Hi, great guide. When I run webui --use-zluda at the very last step, I get "return torch._C._cuda_memoryStats(device)" - "RuntimeError: invalid argument to memory_allocated". Any idea? It loads the UI, but of course any attempt to run anything fails. I'm on a 7900XTX.

  • @machaoverlord5925
    @machaoverlord5925 8 months ago

    Where were you when I had AMD -.- good job

  • @jcdenton7914
    @jcdenton7914 months ago

    I never got into SD or FLux so I'm not going to keep up with what is automatic1111 or what is needed if I want to make images, upscale the res, do SD video, and basically everything.

  • @chaz-e
    @chaz-e 6 months ago

    How fast is this compared to Olive approach?
    Zluda is not officially supported by AMD but they have partnered with Microsoft for Olive and other improvements.

  • @tippiebekfast
    @tippiebekfast 2 months ago +1

    i went from 14 seconds per iteration to 3 iterations per second on my 7800xt lol thanks

  • @luxiland6117
    @luxiland6117 3 months ago +1

    Hi, thanks for the tutorial video! Can you make a tutorial for installing reForge + ReActor or Flux using ZLUDA?

    • @FE-Engineer
      @FE-Engineer  3 months ago +2

      Will take a look. I have been moving across the country and dealing with some family issues but I am looking for some new things to do so I will put it on my list.

  • @Eminic112
    @Eminic112 8 months ago

    Some questions as i'm seeing your tutorial at the moment:
    Can you install the HIP drivers alongside the normal AMD drivers for windows, or do you have to choose between one or the other?
    Regarding ZLUDA, in your video you downloaded version v3.1, however the most recent version is v3.5. Is there any reason for that or does it not matter which version you download?

    • @Eminic112
      @Eminic112 8 months ago

      It seems i'm running into an error when running webui.bat. I've successfully installed python 3.10.6 (added to path), git, HIP, added ZLUDA to path, etc. But when i run webui.bat i get this:
      rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\5.7\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1031
      rocBLAS error: Could not initialize Tensile host:
      regex_error(error_backref): The expression contained an invalid back reference.
      Press any key to continue . . .

    • @FE-Engineer
      @FE-Engineer  8 months ago

      @Eminic112 at the end of the video and in the video description it talks about replacing rocblas files for some GPUs…
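
To see whether a given GPU architecture is covered before or after swapping the rocblas files, you can list what is in the library folder. This is a rough sketch under assumptions: the folder path is taken from the error message above, `files_for_arch` is a hypothetical helper, and it assumes the library files carry the architecture name (such as `gfx1031`) in their file names.

```python
from pathlib import Path


def files_for_arch(library_dir, arch: str) -> list:
    """Return sorted names of files in `library_dir` whose name contains `arch`."""
    directory = Path(library_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if arch in p.name)


if __name__ == "__main__":
    # Path from the rocBLAS error message; adjust it to your HIP SDK install.
    library_dir = r"C:\Program Files\AMD\ROCm\5.7\bin\rocblas\library"
    arch = "gfx1031"  # the architecture named in the error message

    matches = files_for_arch(library_dir, arch)
    if matches:
        print(f"Found {len(matches)} file(s) for {arch}")
    else:
        print(f"No files for {arch} in {library_dir}")
```

An empty result for your architecture would be consistent with the "Cannot read ... TensileLibrary.dat" error and with the need to replace the rocblas files.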

    • @Eminic112
      @Eminic112 8 months ago

      @FE-Engineer right, I didn't notice my card had one tick and one X. Thanks for that!
      Do you always have to wait so long for the first gen when starting the webui, or is it only the first time you do it?
      Also, is this compatible with something like Forge? Or are the libraries not compatible?
      Regardless, thanks for yet another basically groundbreaking tutorial! It's really not that nice having to dual-boot into an OS you barely have any experience with just for one use case.

    • @FE-Engineer
      @FE-Engineer  8 หลายเดือนก่อน +1

      @Eminic112 just the very first one.

    • @АндрейМухарский-п4з
      @АндрейМухарский-п4з 8 months ago

      @Eminic112 run the console as administrator

  • @nenadm5747
    @nenadm5747 8 months ago

    Thank you for your effort for us AMD people 😁
    Can I just add ZLUDA to my current A1111 installation? I've used DirectML for months; everything works, slowly, but it works.
    Is there a chance of breaking something?

    • @FE-Engineer
      @FE-Engineer  8 months ago +1

      I think if you have a version that is up to date then yes, just add the --use-zluda flag

    • @kobusdowney5291
      @kobusdowney5291 8 months ago +1

      Perhaps use --reinstall as well

  • @bigdeutsch5588
    @bigdeutsch5588 7 months ago

    My understanding is very surface level: does this remove the option for ONNX? I'm happy with the speeds, but I'm curious whether it's possible to optimize these models further for AMD. I've installed the ReActor extension, which seems to want to call onnxruntime-gpu. Everything functions as it should (including ReActor), but is there a way to increase speed further?

    • @FE-Engineer
      @FE-Engineer  7 months ago +1

      ONNX has significant drawbacks.
      To my knowledge, ZLUDA likely will not apply to or work with the ONNX format. I don't believe ONNX is necessarily unavailable, but it likely will not use ZLUDA.
      With ONNX, speed is a bit better, but the drawbacks are that inpainting will not work, SDXL will not work, and you may have to convert models, which can be a little time consuming and sometimes has its own issues.

  • @Rich_Mr
    @Rich_Mr 8 months ago

    BTW, when is your SD.Next with ZLUDA video dropping? Just curious and waiting for it, as I use SD for my social media.

    • @FE-Engineer
      @FE-Engineer  8 months ago +1

      Should be this weekend. Might have two. One for a semi updated guide for this one. It’s not really different just shorter since it now helps you to get the files setup properly. Probably also one on sd.next.
      And I might do one on comfyui. But that is still weird and very manual I believe. :-/

    • @Rich_Mr
      @Rich_Mr 8 months ago

      @FE-Engineer yes, personally I hate ComfyUI, it's too complex for me to work with.

  • @ferluisch
    @ferluisch 5 months ago

    It'd be really cool to do a benchmark between an AMD card using ZLUDA (or ROCm) vs NVIDIA using CUDA