NVIDIA Update Solves CUDA error (but very slow) -Train Dreambooth, SDXL LoRA with Low VRAM

  • Published Jul 5, 2024
  • #stablediffusion #a1111 #nvidia #update #cuda #cudaerror #lowvram #kohyass #LoRA #dreambooth #tensorRT
    (Update: while the update does solve CUDA memory errors, I have found it to be very slow with SDXL... it is not very practical with low VRAM... it works, but slowly. Hopefully the next update brings better performance; as this is a new feature, it is very likely to improve soon.)
    00:00:00 introduction and note about NVidia drivers
    00:00:21 NVidia control panel System memory fall back feature
    check whether the driver update includes the System memory fall back feature, and select it
    00:01:07 downloading the NVidia driver manually
    00:01:47 Testing SDXL training on 8GB VRAM
    showcase of sample SDXL training to show that it now works without CUDA errors
    00:03:26 testing Dreambooth training on 8GB VRAM
    showcase that Dreambooth can now work as well without CUDA errors
    00:05:05 side note about TensorRT extension for stable diffusion
    another update that may help double the speed of image generation, but with limited use cases
    Computer Specs:
    Laptop: Legion 5 Pro
    Processor: AMD Ryzen 7 5800H, 3201 MHz
    System RAM: 16.0 GB
    Graphics GPU: NVIDIA GeForce RTX 3070 Laptop GPU 8GB
    This is a new update for NVIDIA cards which allows anyone to train SDXL LoRAs and Dreambooth with large image sizes without getting CUDA errors, even with 6 or 8GB VRAM, possibly less.
    The new feature from the NVIDIA driver update, System memory fall back, allows extending existing VRAM into system memory when no more VRAM is available.
    This is an amazing update that allows more people to train Dreambooth and SDXL LoRAs on their PCs, even with 8GB VRAM or less.
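    To see why the fallback is slow even though it prevents CUDA errors, a rough back-of-the-envelope sketch helps: tensors that spill to system RAM are reached over PCIe, which is far slower than on-board VRAM. The bandwidth figures below are illustrative assumptions, not measurements of any specific card.

    ```python
    # Rough illustration of why sysmem fallback is slow: tensors that spill
    # to system RAM are accessed over PCIe, far slower than on-board VRAM.
    # Both bandwidth figures are illustrative assumptions, not measurements.

    VRAM_BANDWIDTH_GBS = 400.0   # e.g. on-board GDDR6 on a mobile RTX 3070
    PCIE_BANDWIDTH_GBS = 25.0    # e.g. practical PCIe 4.0 x16 throughput

    def access_time_s(gigabytes: float, fraction_spilled: float) -> float:
        """Time to stream `gigabytes` of tensors when `fraction_spilled`
        of them live in system RAM instead of VRAM."""
        in_vram = gigabytes * (1.0 - fraction_spilled)
        spilled = gigabytes * fraction_spilled
        return in_vram / VRAM_BANDWIDTH_GBS + spilled / PCIE_BANDWIDTH_GBS

    # Even a modest spill dominates total access time:
    fast = access_time_s(10.0, 0.0)   # everything fits in VRAM
    slow = access_time_s(10.0, 0.3)   # 30% spilled to system RAM
    print(f"slowdown: {slow / fast:.1f}x")   # → slowdown: 5.5x
    ```

    Under these assumed numbers, spilling just 30% of the working set makes memory traffic several times slower, which matches the "works, but slow" behavior described above.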
    NVIDIA Driver update: choose based on your GPU
    www.nvidia.com/Download/index...
    LoRA Character Training for SD 1.5 AND SDXL in stable diffusion
    • SDXL 1.0 vs SD 1.5 Cha...
    Style LoRA training guide in stable diffusion
    • Style LoRA Training gu...
    A1111
    github.com/AUTOMATIC1111/stab...
    Kohya Training
    github.com/bmaltais/kohya_ss
    side update: TensorRT, see how to install for A1111 to double generation speed in some cases
    nvidia.custhelp.com/app/answe...
  • Science & Technology

Comments • 31

  • @cebesten
    @cebesten 4 months ago

    I must say that after 3 weeks of surfing YouTube, trying all the advice, and lowering settings, after applying your settings in the NVIDIA control panel, Dreambooth training is working with my RTX 4060! Thanks!

    • @AI-HowTo
      @AI-HowTo  4 months ago +1

      You are welcome. Slow, but it might be useful in some cases... hopefully we soon get more optimized training methods that require less memory and are faster.

  • @hatuey6326
    @hatuey6326 8 months ago

    Excellent tutorial!!! Thanks!!

    • @AI-HowTo
      @AI-HowTo  8 months ago

      You're welcome!

  • @brottop2089
    @brottop2089 4 months ago

    Thanks!

  • @stefanocesaretti8962
    @stefanocesaretti8962 7 months ago

    Will this let me use SDXL models as well, not just for training? Will I be able to generate images with SDXL? Thanks in advance :)
    (I haven't seen the whole video yet, so please excuse me if you already covered this in your video.)

    • @AI-HowTo
      @AI-HowTo  7 months ago

      Yes, but the speed will most likely be disappointing. If you are unable to run SDXL with the --no-half-vae --medvram-sdxl options already, then this option should in theory allow you to run it... in general, if the speed is not good enough, then using SDXL is not practical at all.
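      For reference, the flags mentioned in this reply go in A1111's launch configuration. A minimal sketch of a `webui-user.sh` (Linux; on Windows the same flags go in `webui-user.bat` via `set COMMANDLINE_ARGS=...`):

      ```shell
      # webui-user.sh -- low-VRAM launch flags for SDXL in AUTOMATIC1111
      # --medvram-sdxl : applies the medvram memory optimizations only to SDXL models
      # --no-half-vae  : keeps the VAE in full precision (avoids black/NaN images)
      export COMMANDLINE_ARGS="--no-half-vae --medvram-sdxl"
      ```

      If SDXL still fails to load with these flags, the driver's sysmem fallback discussed in the video is what lets the remainder spill into system RAM, at the cost of speed.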

    • @stefanocesaretti8962
      @stefanocesaretti8962 7 months ago

      @@AI-HowTo I'll give it a shot. Just in case, are there any methods to speed it up?

    • @AI-HowTo
      @AI-HowTo  7 months ago

      Not sure, but I think they might make it faster in future driver updates, since this is new.

  • @HestoySeghuro
    @HestoySeghuro 8 months ago +1

    Question here: what about fine-tuning with 24 GB VRAM? Does this do the same VRAM offloading?

    • @AI-HowTo
      @AI-HowTo  8 months ago

      Yes, if enabled, when VRAM is not enough it will offload the work to RAM regardless of what kind of work you are doing with Stable Diffusion: fine-tuning, training, generation... it is really a great update from NVIDIA.

    • @AI-HowTo
      @AI-HowTo  8 months ago

      It only offloads what it needs to RAM, so VRAM will still reach around 100% utilization, but things will get slower since RAM is a lot slower. Still, it is very useful.

  • @alexlux147
    @alexlux147 8 months ago

    Can this work with other AI models? I have a 3060 laptop with 6GB and I'm interested in NeRF and Gaussian Splatting; small VRAM is a big problem for me.

    • @AI-HowTo
      @AI-HowTo  8 months ago

      I think NVIDIA primarily updated the driver for Stable Diffusion, but in general this should work for any other model that uses CUDA and gets CUDA memory errors, based on the feature's name and description, which is simply (CUDA System memory fall back policy)... I have not tested it outside Stable Diffusion-related models, however.

  • @Dante02d12
    @Dante02d12 8 months ago +1

    How is performance with this trick with 6GB VRAM and 16GB RAM? Personally, I get OOM just trying to load an SDXL model.

    • @AI-HowTo
      @AI-HowTo  8 months ago +2

      I got acceptable speed with Dreambooth training at 1024x1024 image sizes, but not with SDXL; anything related to SDXL seems too slow to be practical. Hopefully things get better with their next driver update, because this is their first update that allows memory offloading, so I expect it to get better and faster in future updates.

    • @AI-HowTo
      @AI-HowTo  8 months ago

      I will update the video title and thumbnail to include (Very slow) rather than delete the video, and add a note about that in the description, because I was expecting performance to be better than this too.

  • @marcozisa6317
    @marcozisa6317 8 months ago

    Thanks for the update, but unfortunately it didn't work for me. Tested on a PC with 40GB RAM and an NVIDIA RTX 2060 SUPER with 8GB of VRAM; the NVIDIA driver was up to date. It gets stuck right before the training starts, with the on-screen progress bar at 0%, then raises CUDA: OutOfMemoryError.

    • @AI-HowTo
      @AI-HowTo  8 months ago +2

      I see; possibly the driver is new and still has some issues. I set the CUDA Sysmem fall back policy to (prefer system fall back) and the CUDA errors are gone. I only have 16GB RAM, and I can see in the performance monitor that it is now using 80% of my RAM, even using part of my shared video RAM too, but training was slow, very slow compared to RunPod. It is interesting that they worked it out, but the speed makes it impractical for me to use effectively... most likely they will release new updates soon, especially since this is a new feature.

  • @user-kk2ve1un4u
    @user-kk2ve1un4u 7 months ago +1

    Could you please make a tutorial on Dreambooth training for us?

    • @AI-HowTo
      @AI-HowTo  7 months ago +1

      Unfortunately, a Dreambooth tutorial takes a long time to make, which I may not have at the moment; I will do so if I can in the near future.

  • @generalawareness101
    @generalawareness101 8 months ago

    I did this on my 4090 because I was otherwise stuck on 531.79: the moment I came close to my 24GB of VRAM, everything fell to a crawl, and in most trainings I am at 23.4-23.6GB, so oof. What I noticed is that it has a global setting. I want access to that setting, and I bet I know where it lives, but getting at it requires another, more master-level program. I am being vague here on purpose, because I don't want someone to read this, go in, start messing with stuff, and poof, nothing works.
    I have not tried training yet, but I am going to shortly, which will normally be about 23.5GB (DB).

    • @generalawareness101
      @generalawareness101 8 months ago

      Just tested, and the new drivers let me run batch size 16 vs. my previous max of 14, with 900MB free. I cannot go higher.

    • @AI-HowTo
      @AI-HowTo  8 months ago

      Offloading can be slow, so if things are working with your current GPU, don't offload; disable it... this should only be enabled for lower-end GPUs. Even though only a small portion is offloaded, the communication overhead between RAM and VRAM makes it slow; hopefully things get faster in future driver updates... Still, for me it is interesting that I was able to train Dreambooth or SDXL for the first time on my 8GB graphics card, but seeing how long it takes, it is not worth it... only suitable for leaving things running overnight for simple models.

    • @generalawareness101
      @generalawareness101 8 months ago +1

      TBH, with anything less than 16GB, just don't try to train SDXL. @@AI-HowTo At 24GB I would need 32-40GB to sit as comfortably as I was with 2.1.

    • @AI-HowTo
      @AI-HowTo  8 months ago

      100% true.

  • @CMak3r
    @CMak3r 8 months ago

    The guide from NVIDIA officially states that this option DISABLES system memory fallback, which means that when your GPU runs out of VRAM, it will no longer be able to access your RAM to continue generation and will crash. Why this feature even exists is another question.

    • @AI-HowTo
      @AI-HowTo  7 months ago +1

      Before this update, Disable was always the behavior, so we always got an Out of Memory error. This is the official description of the three available options, as given by the driver's option description:
      "Typical usage scenarios:
      • Set to "Driver Default" for the driver-recommended behavior.
      • Set to "Prefer No Sysmem Fallback" to prefer that allocations not fall back to system memory. This option sets the preference to returning an Out Of Memory error instead of utilizing system memory to allow allocations to succeed. Choosing this option may cause application crashes or TDRs.
      • Set to "Prefer Sysmem Fallback" to prefer fallback to system memory while under memory pressure."
      It is very useful for locally training small models only, since it takes a long time... when trying to train something big, you know the time it takes is not worth it.

    • @luisff7030
      @luisff7030 7 months ago

      It's there to remind us to purchase a new GPU

  • @akairas6
    @akairas6 8 months ago

    I have the latest driver, but I can't find that option.

    • @AI-HowTo
      @AI-HowTo  8 months ago

      I did a manual installation until that option showed up, as in the video; this driver update was about three days ago, I think. Possibly CUDA ops will get faster in future releases... You could also look into a BIOS update, because an older BIOS might limit which driver versions you can update to; not sure, though.