Do THIS to speed up SDXL image generation by 10x+ in A1111! Must see trick for smaller VRAM GPUs!

  • Published 28 Sep 2024

Comments • 83

  • @KeyboardAlchemist
    @KeyboardAlchemist  1 year ago +25

    UPDATE: With the release of A1111 webUI version 1.6.0, there is a new command line argument we can use: '--medvram-sdxl' instead of '--medvram'. '--medvram-sdxl' enables model optimization only for SDXL models. This is nice because if you want to generate images with SD v1.5 models, which don't need '--medvram', you won't have to manually edit your webui-user.bat file (leaving '--medvram' enabled when you don't need it tends to slow down your image generation). I hope this helps, cheers!
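For anyone unsure where these arguments go, here is a minimal sketch of a webui-user.bat, assuming a default Windows install of A1111 webUI 1.6.0 or later (the exact argument mix is only an example; adjust for your own setup):

```shell
@echo off
REM webui-user.bat -- A1111 launch options (example sketch)

set PYTHON=
set GIT=
set VENV_DIR=
REM --medvram-sdxl : medvram optimizations for SDXL checkpoints only (webUI 1.6.0+)
REM --xformers     : memory-efficient cross-attention
REM --no-half-vae  : keep the VAE in full precision (avoids black images on some cards)
set COMMANDLINE_ARGS=--medvram-sdxl --xformers --no-half-vae

call webui.bat
```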

  • @mrBrownstoneist
    @mrBrownstoneist 1 year ago +9

    Adding "--medvram" makes SD 1.5 slower. Try adding "--medvram-sdxl", which enables medvram for SDXL models only.

  • @rexs2185
    @rexs2185 1 year ago +1

    Great content! Thank you for sharing this tip!

  • @FailedMaster
    @FailedMaster 1 year ago +1

    You are amazing. Holy shit does this work well. I'm generating images in about 30 seconds now on an RTX 3070. It took me about 4 minutes before. This is definitely a game changer, thank you!
    Edit: For some reason it doesn't work sometimes. Restarting the WebUI fixes it, but if I create a few pictures in a row, the "magic words" seem to be ignored. Any idea why that could be?

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago

      I'm glad to hear this helped you generate images faster in SDXL! Regarding your question, I'm not sure why that would happen. Here is a guess: older versions of A1111 may have a memory leak, and memory usage builds up as you generate more images, which would explain why restarting the webUI fixes your issue. Are you using webUI v1.6.0? If not, it might be worth updating.

    • @FailedMaster
      @FailedMaster 1 year ago +1

      @@KeyboardAlchemist I am using the newest version, but maybe you're right. Could just be a bug. Well, doesn't matter that much, since restarting and generating is still fast as hell.

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago

      @@FailedMaster Cool beans, happy creating!

  • @SupremacyGamesYT
    @SupremacyGamesYT 3 months ago

    So nowadays, is a 25-30 sec generation normal for an SDXL image on a 3080 10GB?
    I was generating at 720x1280.
    Cheers.

  • @neetcoomer
    @neetcoomer 6 months ago +1

    It works

  • @pavi013
    @pavi013 8 months ago

    I have 4gb and medvram works fine.

  • @kuvjason7236
    @kuvjason7236 10 months ago

    I got an RTX 3060 Ti. Before, I was getting 30s/image. Now I am getting 17s/image. Great improvement, but I have seen people pump out images at 2s/image. I need that type of speed.

  • @darkjanissary5718
    @darkjanissary5718 11 months ago +1

    On my 3070 Ti, using exactly the same prompt and settings on A1111 1.6.0 with medvram, it takes 5 mins to complete. How can you render it in 41 seconds??? Do you use any other extension or something?

    • @KeyboardAlchemist
      @KeyboardAlchemist  11 months ago

      Hello, thanks for watching! Here are the parameters that I use in COMMANDLINE_ARGS: --medvram-sdxl --no-half-vae --xformers. Before --medvram-sdxl was available in the webUI, I used --medvram, which does the same thing.

    • @darkjanissary5718
      @darkjanissary5718 11 months ago

      @@KeyboardAlchemist Yeah, I fixed it; the problem was the Nvidia driver. I upgraded to the latest 545.84, which supports TensorRT, and reinstalled A1111 from scratch. Now it is normal and takes 15 seconds to render a 1024x1024 image on SDXL.

    • @KeyboardAlchemist
      @KeyboardAlchemist  11 months ago

      @@darkjanissary5718 That's great to hear! I did not think it could be a driver issue. Just out of curiosity, what version of the driver were you using before the update?

    • @darkjanissary5718
      @darkjanissary5718 11 months ago

      @@KeyboardAlchemist it was 537

    • @prixmalcollects9332
      @prixmalcollects9332 7 months ago

      @@darkjanissary5718 Hi, how did you reinstall A1111? Is there any quick way? I don't wanna lose my models.. (sorry, very noob)

  • @ranga5823
    @ranga5823 27 days ago

    Thank you very much sir

  • @guruabyss
    @guruabyss 1 year ago +2

    Wouldn't this make image generation even faster if you did that trick with a 4090?

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago +1

      If you are not using '--xformers' already, adding that command may help you increase speed. But '--medvram' and '--lowvram' are for GPUs with medium or low VRAM; adding these on a 4090 will likely slow it down. A 4090 already has enough VRAM to handle anything SDXL throws at it.

  • @hairy7653
    @hairy7653 1 year ago +2

    My issue is not VRAM but RAM... at 16GB, loading models is slow and sometimes stalls my PC. What have you got?

    • @3diva01
      @3diva01 1 year ago +2

      Yeah, loading both the SDXL main model and then the refiner takes AGES to load in AUTOMATIC1111. After that then the first image generation ALSO TAKES AGES. But after the first image generation of the session it speeds up a lot more after that, in my experience. On my machine it takes about 15-20 minutes to load the SDXL models (main model + refiner model) and then render the first image. So I always set it up to render the first image and then go make coffee before I can get started with my image generation session.
      A few things to note in my experimenting so far: The sampling method DPM++ 2M SDE Karras seems to render images pretty fast with SDXL and creates decent looking images even at lower sample steps (14-35 steps). The "refiner" can make things look super ugly super quickly. I recommend not putting it above 3 steps, particularly if you're using a low step number with the main model.

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago +1

      If you have VRAM to spare, then use '- -lowram' (no space between the dashes, of course). This will load your model/checkpoint weights into VRAM instead of RAM. Hope this helps!
      Alternatively, since RAM is cheap compared to VRAM, you could just get a couple of new sticks and plug them in (this would actually be the better solution).

    • @hairy7653
      @hairy7653 1 year ago +1

      @@KeyboardAlchemist Oh wow, I'm gonna try it, thanks

    • @carlinite
      @carlinite 1 year ago +2

      I couldn't load SDXL with 16GB of RAM; I threw in an older stick for 24 total and now it's all smooth, loads in about 30 seconds.

    • @hairy7653
      @hairy7653 1 year ago +1

      @@carlinite Same here, I upgraded to 32 and now it works fine!
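A hedged sketch of the '--lowram' suggestion from this thread, assuming a default Windows A1111 install (note: despite the name, '--lowram' loads checkpoint weights into VRAM instead of system RAM, so it only helps if you have VRAM to spare):

```shell
REM webui-user.bat -- for machines short on system RAM but with VRAM to spare
REM --lowram : load checkpoint weights into VRAM instead of RAM
set COMMANDLINE_ARGS=--lowram --xformers --no-half-vae
```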

  • @accountgoogle-b9d
    @accountgoogle-b9d 1 year ago +1

    I just did everything as you explained in the video, but my SDXL images take about 8 to 10 minutes to render. Idk what I can do :( I have an RTX 3070 8GB

    • @KeyboardAlchemist
      @KeyboardAlchemist  11 months ago

      There could be many different factors slowing down your image generation. Here are some ideas to try: (1) turn off your extensions and turn them back on one by one while testing your image gen speed, as some extensions can cause problems; (2) update your Nvidia driver to the latest version or roll the driver back to an older version (try ver 536.67); (3) use an app like GPU-Z to see what else might be taking up your VRAM in the background. Best of luck!

  • @aiart7702
    @aiart7702 1 year ago +1

    Legend!!

  • @mada_faka
    @mada_faka 1 year ago +1

    THANKS FOR THIS TUTORIAL, SUBSSS

  • @canaldetestes4517
    @canaldetestes4517 1 year ago +1

    Hi, first, thank you very much for sharing this tip with us. Question: do you think this hack can speed up other rendering in A1111? I ask because I have SadTalker 0.01 installed in it, and as I have an Nvidia 970 with 4GB, it takes too long to render the character talking; it took almost 4 hours to render 4 minutes of character talk.

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago +1

      You're very welcome! Thank you for the question, unfortunately, I don't have experience with SadTalker. But just to take an educated guess, you can try using '--lowvram' and '--xformers' together to see if it will help. Best of luck!

    • @canaldetestes4517
      @canaldetestes4517 1 year ago

      @@KeyboardAlchemist Hi, thank you for your attention and answer. Let's do one thing: I will do what you said, and later I will come back here and write the result.

  • @sairampv1
    @sairampv1 1 year ago +1

    I have a laptop with a 3050 Ti with 4GB VRAM; is it possible to produce images on my laptop?

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago

      Yes, you absolutely can produce images on a 3050 Ti with 4GB of VRAM. BUT it will likely be very slow if you are going to use the SDXL model. I would recommend doing image generation with SD v1.5 fine-tuned models. Fine-tuned v1.5 models will give you great results, and the generation speeds are a LOT faster than SDXL.

    • @sairampv1
      @sairampv1 1 year ago

      @@KeyboardAlchemist I read Linux-based computers give higher it/sec. Is it a good idea to download a VM and try SD 1.5 on that?

  • @Woolfio
    @Woolfio 1 year ago

    I am using AMD's 6750 XT with 12GB and I cannot do 1024x1024, while others with Nvidia 8GB cards can do 1024 and even 2x upscale. I regret getting AMD's GPU so much.

    • @alt666
      @alt666 1 year ago

      Sure, you may not get all the AI goodness, but hey, gaming is way better on the AMD side for the price.

    • @Woolfio
      @Woolfio 1 year ago

      @@alt666 Why is it way better? It is just cheaper to get the same performance.

  • @Samogub
    @Samogub 11 months ago +1

    ths

  • @earm5779
    @earm5779 1 year ago +1

    what about 4 gb vram? does it work?

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago

      Try using '--lowvram' and '--xformers' together. Hope it helps!

    • @earm5779
      @earm5779 1 year ago

      @@KeyboardAlchemist tried already but not working. If I use Fooocus it works

  • @DrAmro
    @DrAmro 1 year ago +5

    Keep up the good work; you're really good at explaining everything, better than many out there with thousands of subscribers. You are the best, bro 👍

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago +1

      Thank you for your kind words! I appreciate it!

    • @DrAmro
      @DrAmro 1 year ago +1

      ❤@@KeyboardAlchemist

  • @lordsirmoist1594
    @lordsirmoist1594 1 year ago +3

    That refiner extension will be amazing, also love the generated images at the end

  • @KeyboardAlchemist
    @KeyboardAlchemist  1 year ago +3

    Did this trick help you speed up your image generation with SDXL 1.0? I hope it did. Feel free to comment below. I would love to hear your success stories.

    • @tobinrysenga1894
      @tobinrysenga1894 1 year ago +1

      I've never gotten XL to work. I have enough VRAM but am still battling random errors. May just reinstall.

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago

      @@tobinrysenga1894 When all else fails, a fresh install is probably the way to go. I hope you get SDXL to work for you soon.

    • @tobinrysenga1894
      @tobinrysenga1894 1 year ago +1

      @@KeyboardAlchemist Yay, the reinstall worked! Well, I also noticed that I had multiple versions of Python installed for some reason, which could have also been hurting me. Too much playing with AI on the same computer...

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago

      @@tobinrysenga1894 Haha, I'm glad to hear that it worked out! 🙂

    • @tripleheadedmonkey6613
      @tripleheadedmonkey6613 1 year ago

      @@tobinrysenga1894 You're supposed to install the dependencies for each AI in its own virtual environment, which keeps them contained within the install folder of the AI webUI you are using, instead of replacing the main Windows files.

  • @3diva01
    @3diva01 1 year ago +2

    Thank you SO MUCH! This helped me a LOT! It brought the image generation time down from around 5 minutes to about 2 1/2 minutes, so it cut my render time in half. THANK YOU!
    I'm hoping that someone will eventually find a way to get SDXL to run as fast as SD 1.5 when it comes to image generation times. The same size image that renders in about 2.5 minutes in SDXL renders in about 45 seconds in 1.5. It's a pretty big time increase, and so far the images, at least in A1111, don't seem to be as high quality as a lot of the SD 1.5 models. I'm sure the community will eventually make amazing models for SDXL; we just have to be patient. I have to remember that when SD 1.5 first launched, the images it produced weren't great either. lol
    Sorry for the rant! Thank you again for the great video! SO HELPFUL!
    Edit: It seems lowering the Steps makes a much bigger impact on render time with SDXL. I find that lowering the steps by quite a bit REALLY helps speed things up. I'm sure there's probably a loss in image quality doing that, but I'll have to do some testing with the same seed to see.

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago +1

      Thank you for sharing your experience! I'm so glad this video helped you cut down your render time! Yeah, I'm sure A1111 will provide optimization fixes very soon. And like you said, it's still very early for SDXL. There are definitely going to be a LOT more fine tuned models in the near future that will make things easier. Also, thanks for sharing the tip on lowering Steps. Cheers!

  • @ehsanrt
    @ehsanrt 1 year ago +2

    Hi, just here for moral support, you're doing amazing.

  • @MadazzaMusik
    @MadazzaMusik 1 year ago +1

    Would I get better times with this command if I had a 24GB card?

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago

      If you are not using '--xformers', that command may help you increase speed. But '--medvram' and '--lowvram' are just for machines with medium or low VRAM. In your case, they will probably make things slower.

  • @streamdungeon5166
    @streamdungeon5166 1 year ago +1

    Tried the same with a Pinokio install of SDXL 1.0 on my laptop with 16GB DDR5-4800 RAM and a mobile RTX 3050 Ti (4GB + 8GB shared). Without your setting, an image at 512x512px takes about 1 min and is rendered step by step at roughly equal intervals. With your setting, the steps get slower, and the final step freezes the entire machine almost completely and takes an extra 2-3 min. Just to let you know, this is not a universally good setting for low-VRAM GPUs. Might be because mobile GPUs can use shared RAM anyway? Or because my laptop uses DDR5?

    • @KeyboardAlchemist
      @KeyboardAlchemist  1 year ago

      Thank you for sharing your experience! I'm very surprised that you were able to run SDXL with just 4GB of dedicated VRAM. Did you have to tweak any other settings? Maybe it is because your mobile GPU is making good use of the shared memory. But you make a great point: if a rig has more than 8GB of VRAM, putting in '--medvram' might actually be detrimental to the image generation speed, and this command is by no means a magic bullet. It's a tool to help those who have less VRAM, and it will need to be tested depending on the individual's PC and situation.

    • @streamdungeon5166
      @streamdungeon5166 1 year ago +1

      @@KeyboardAlchemist I used the default SDXL 1.0 install for Pinokio with no changes (other than adding your parameter for comparison and then running again without it):
      set COMMANDLINE_ARGS=--no-download-sd-model --xformers --no-half-vae --api

  • @DARKNESSMANZ
    @DARKNESSMANZ 1 year ago

    It went to 1 hour.. my graphics card is a 1080, 32GB RAM. Vlad Diffusion is much faster than A1111 1.6.0 but does not support SDXL.

  • @Nrek_AI
    @Nrek_AI 1 year ago +1

    Thank you so much for this dude... your content has been truly valuable for the community

  • @maxstepaniuk4355
    @maxstepaniuk4355 10 months ago +1

    SDXL should work better with --opt-sdp-attention instead of xformers. It takes ~11 sec on a 3080 Ti with --opt-sdp-attention --no-half-vae only.

    • @KeyboardAlchemist
      @KeyboardAlchemist  10 months ago

      Hello, thanks for the tip! The point I was making in the video is that for GPUs with less VRAM, it is more about the memory optimization with '--medvram' that helps speed things up by keeping all the processing within your GPU. I have tested both '--xformers' and '--opt-sdp-attention' previously, and on my 8GB 3060 Ti, both commands generated a 1024x1024 image in about 7 minutes, but adding '--medvram-sdxl' brought image generation down to about 28 seconds for both. I'm guessing your 3080 Ti has 12GB of VRAM, so you don't have to use '--medvram-sdxl' when running SDXL, and maybe in that case '--opt-sdp-attention' edges out '--xformers', but based on what I can test, both of these cross-attention optimization commands provide similar results. I would be interested to know how long it takes you to generate the same image using '--xformers' versus '--opt-sdp-attention'. Thanks for watching!

    • @prixmalcollects9332
      @prixmalcollects9332 7 months ago

      Hi, can you share your command line? I don't know how to write this.

    • @prixmalcollects9332
      @prixmalcollects9332 7 months ago

      I tried this and it is still slow T_T
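For readers wondering how the two argument sets compared in this thread are actually written, a hedged sketch assuming webUI 1.6.0+ on Windows (use one or the other, not both at once):

```shell
REM webui-user.bat -- option A: xformers cross-attention (requires the xformers package)
set COMMANDLINE_ARGS=--medvram-sdxl --no-half-vae --xformers

REM option B: PyTorch scaled-dot-product attention (no extra package needed)
REM set COMMANDLINE_ARGS=--medvram-sdxl --no-half-vae --opt-sdp-attention
```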

  • @chrisfox961
    @chrisfox961 1 year ago +1

    Thank you for these great tips!

  • @testtest-bb2dt
    @testtest-bb2dt 1 year ago +1

    Thank you so much!!

  • @akarshrao47
    @akarshrao47 1 year ago

    Hey, it might be out of context, but clicking on 'Load Available' in extensions gives me this error: URLError:

  • @prixmalcollects9332
    @prixmalcollects9332 7 months ago

    Hi, so I somewhat fixed it, but I ran into another issue: generation with LoRAs is pretty slow, while generation without them is very fast. Do you have a solution for that? Thank you.

  • @shaolinmonk1537
    @shaolinmonk1537 7 months ago +1

    Works awesome, thanks

    • @KeyboardAlchemist
      @KeyboardAlchemist  7 months ago

      I'm glad to hear that it helped! Cheers!

  • @meko264
    @meko264 1 year ago

    Doesn't Stable Diffusion use the Tensor cores when in half precision mode?