FAST Flux GGUF for low VRAM GPUs with Highest Quality. Installation, Tips & Performance Comparison.

  • Published Nov 23, 2024

Comments • 115

  • @NextTechandAI
    @NextTechandAI 3 months ago +7

    How is your performance with the low-VRAM GGUF quantized models?
    UPDATE 2: Speed increased 2x with flux1-DEV/SCHNELL-Q5_K_S.gguf in comparison to the original models (tested on an AMD GPU). Important: you have to start ComfyUI with the runtime parameter '--force-fp32', and although this parameter speeds up the quantized models, it slows down the original ones! The T5 text encoder model currently has zero influence on my machine's performance, so I choose T5xxl_fp16.
    UPDATE 1: Quantized T5 text encoders are available from city96: huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/tree/main.
    Many different sizes are available; choose Q5_K_M or larger. Place it in your 'clip' folder.
    Update the GGUF node (do a 'git pull' in this node's directory); most probably you will have to update ComfyUI as described in my vid, too.
    Replace the CLIP loader in your workflow with the new 'DualCLIPLoader (GGUF)', found under 'Add Node->bootleg' (see the command sketch below).
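    A minimal command sketch of these update steps, assuming a standard (non-portable) ComfyUI checkout with the ComfyUI-GGUF custom node; exact paths and file names may differ on your setup:

        cd ComfyUI/custom_nodes/ComfyUI-GGUF
        git pull                      # update the GGUF node
        cd ../..
        git pull                      # update ComfyUI itself
        # place the quantized T5 encoder (e.g. a Q5_K_M .gguf from the link above)
        # into ComfyUI/models/clip/, then start ComfyUI with the speed-up flag:
        python main.py --force-fp32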

    • @vivekkarumudi
      @vivekkarumudi 3 months ago +1

      Pretty amazing to be honest

    • @reyoo24
      @reyoo24 3 months ago +2

      It's insane. I can finally run the dev model on my RTX 2070 Super with a generation time of 1m 15s. Thank you so much for the tutorial.

    • @nholmes86
      @nholmes86 3 months ago

      This is the first tutorial on Flux that worked for me. On my potato M1 8GB in ComfyUI-CPU with GGUF flux-schnell and t5xxl-fp8, I got 768x768 images in 15 min (better than nothing) lol.

    • @NextTechandAI
      @NextTechandAI 3 months ago +2

      @@nholmes86 Thank you very much for your feedback, I'm happy that it works now.

    • @NextTechandAI
      @NextTechandAI 3 months ago +2

      @@reyoo24 I'm glad the tutorial is useful. Thanks a lot for your feedback.

  • @TheGalacticIndian
    @TheGalacticIndian 3 months ago +3

    I get "RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x3072 and 1320x18432)" when trying to use these new GGUF models with Forge UI. Does it even work with Forge?

  • @cyberceel
    @cyberceel 2 months ago

    Thank you so much for this video! I only just got started with Flux today, but I am already generating images with it successfully after following your guide.
    I am able to generate images with flux-schnell-Q4_0 in ca. 20 seconds on my RTX 3070, despite it having only 8 GB of VRAM!

    • @NextTechandAI
      @NextTechandAI 2 months ago +1

      Thank you very much for your feedback and for sharing this detailed information. I'm happy to read that you can use the quantized models with such a great performance!

    • @rifatshahariyar
      @rifatshahariyar 2 months ago

      @@NextTechandAI I have an RTX 3050 4GB; which version suits me? Please help.

    • @NextTechandAI
      @NextTechandAI 2 months ago

      @@rifatshahariyar 4GB is not much. Start with Q2_K and the parameters --lowvram or --novram in order to check if it works. Then move on to the next bigger models, like Q4_K_S and Q5_K_S. Both have much higher quality than Q2_K.

  • @crypt_exe8709
    @crypt_exe8709 3 months ago +8

    These use slightly less VRAM (Q8 vs FP8, F16 vs FP16), but they are not faster; as long as you don't exceed your VRAM pool, the speed will remain the same. Same thing for Schnell: Q4 renders in 4 seconds on a 4090, just like FP16. The unified or "baked" BNB NF4 Flux models were way faster to load, but they are not compatible with LoRAs and are now considered deprecated.

    • @KlausMingo
      @KlausMingo 3 months ago

      Well then, deleting BNB NF4.

    • @NextTechandAI
      @NextTechandAI 3 months ago +1

      This agrees with my observations at higher resolutions. Low resolutions are significantly faster with my GPU. Thanks for the details about NF4.

    • @fixelheimer3726
      @fixelheimer3726 3 months ago

      Not sure; I will try some LoRAs this evening. In the comments it said NF4 would work with LoRAs in Forge, although I've heard otherwise before. EDIT: Some work, some don't; support is still in progress (experimental state). I would say it remains unclear how well NF4 will stand its ground and whether it will work with all LoRAs. For low VRAM/RAM setups without LoRAs it's still an option, I'd say, though I can't really check for myself as I'm on the higher end of RAM/VRAM setups. On GitHub you can find the Forge NF4 LoRA support thread; I cannot link it here, it seems.

    • @Markgen2024
      @Markgen2024 3 months ago

      NF4 seems to be faster and of high quality. I tested these Q models and others; it took me 6-8 minutes to generate one image on an RTX 3060 12GB in a performance test. Looking forward to NF4 with LoRA compatibility.

    • @NextTechandAI
      @NextTechandAI 3 months ago +1

      @@Markgen2024 Interesting. There are different opinions circulating online at the moment, I'm excited to see where the journey goes. May I ask which resolution you have used for your tests and how long the generation takes with the original Flux models when the Q models need 6-8 mins?

  • @BrunoOrsolon
    @BrunoOrsolon 2 months ago

    I'm very new to this world and I'm learning quite a bit from your videos. One thing I don't quite fully understand, maybe you can help:
    My system: Radeon RX 7700 XT (12GB VRAM) GPU / Ryzen 7800X3D / 64GB DDR5.
    I'm running ComfyUI in Ubuntu; I followed one of your great tutorials.
    I just can't run any FP16 versions of the models because it says it's running out of memory.
    So, does the 16 in FP16 mean 16GB of VRAM is required?
    Can I somehow leverage my big PC RAM size for something?
    Is the CPU useful for anything in those scenarios?

    • @BrunoOrsolon
      @BrunoOrsolon 2 months ago

      Also, even for FP8 models, I always need to switch the VAE nodes for VAE (tiled) nodes.
      With the normal ones it finishes, but in the middle of processing it says there's not enough RAM, so it changes to tiled automatically.

    • @NextTechandAI
      @NextTechandAI 2 months ago +1

      Thank you very much for your feedback! Regarding your questions:
      1. FP16 means the model weights are stored in 16 bits, FP8 in 8 bits. Hence FP16 models are very large and FP8 models obviously smaller. Regarding quantized models I've added a link to the description; with some tricks they reduce the model size by sacrificing some quality.
      2. ComfyUI already uses your PC RAM by swapping. See the runtime parameters below.
      3. No, the CPU doesn't do much here. The GPU is most important.
      4. Try using quantized models; I like Q5_K_S very much. For its small size it delivers very good quality.
      5. Try using the runtime parameters --lowvram or even --novram.
      With your hardware it shouldn't be a big deal running FLUX. Maybe you just need to use the right runtime parameters and quantized models, and start with smaller resolutions.
      Nearly forgot: in case you try quantized models, adding --force-fp32 to the runtime parameters might increase generation speed (see the sketch below).
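      A rough sketch of the sizes and launch options mentioned above, assuming a manual ComfyUI install started from its own folder; the flags are standard ComfyUI arguments, but which combination helps depends on your GPU:

        # rough size arithmetic for Flux's ~12B weights:
        #   FP16 ~ 12B x 2 bytes ~ 24 GB,  FP8 ~ 12 GB,  Q4 GGUF ~ 6-7 GB
        # so a 12 GB card cannot hold the FP16 checkpoint entirely in VRAM.
        python main.py --lowvram                # aggressively offload weights to system RAM
        python main.py --novram                 # treat the GPU as if it had no usable VRAM
        python main.py --lowvram --force-fp32   # --force-fp32 sped up GGUF models on my AMD GPU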

  • @Giorgio_Venturini
    @Giorgio_Venturini 3 months ago +1

    Great video. Bravo. I wanted to ask you if you can do an update using the new t5_v1.1-xxl GGUF
    Thank you

    • @NextTechandAI
      @NextTechandAI 3 months ago +1

      Thanks a lot for your feedback and the hint. I'm thinking about a new video or just updating the description of this video - haven't decided so far.

  • @luxiland6117
    @luxiland6117 2 months ago

    Hi! I'm lost at the Zluda step. I'm not using ComfyUI portable; I have cloned comfyui-zluda from patientx. In which folder do I need to pip install gguf? Thanks for your videos!

  • @vivekkarumudi
    @vivekkarumudi 3 months ago +4

    I have an Nvidia 3060 with 6GB VRAM and am able to run the flux-dev model easily; it takes roughly 1 min 15 sec to generate with the 6.2GB GGUF file.

    • @NextTechandAI
      @NextTechandAI 3 months ago +1

      Thanks for sharing, that's good news.

    • @rennynolaya2457
      @rennynolaya2457 3 months ago

      What resolution did you use and how many steps? Did you use the 4_0.gguf model?

    • @vivekkarumudi
      @vivekkarumudi 3 months ago +1

      @@rennynolaya2457 It always crashed with 1024x1024; I was stuck with 1024x768 every single time. Yes, I used the Q4 quant at something like 6.6GB.

    • @DaniDani-zb4wd
      @DaniDani-zb4wd 3 months ago

      @@vivekkarumudi Thanks for sharing! Did you try the Q6_K version?

    • @vivekkarumudi
      @vivekkarumudi 3 months ago

      @@DaniDani-zb4wd no

  • @Gwaboo
    @Gwaboo 3 months ago +1

    I'm on AMD, using Zluda, and ComfyUI is up to date. I can see Flux support in the patch notes inside ComfyUI Manager, but I cannot get the DualCLIP loader into flux mode. Is there an extra step required that I could have missed?

    • @NextTechandAI
      @NextTechandAI 3 months ago

      We're talking about starting with Flux and not already using the quantized models, right? Please see my vid on how to install Flux on ComfyUI: th-cam.com/video/52YAQZ-1nOA/w-d-xo.html

    • @sheg9629
      @sheg9629 2 months ago +1

      Dude, can you please tell me how you got FLUX working with an AMD GPU? I've been dying to get it working with my 7900 XTX. Is there a tutorial you followed? Please let me know.

    • @Gwaboo
      @Gwaboo 2 months ago

      @@NextTechandAI It works now. For some reason my ComfyUI Manager wasn't able to update ComfyUI to the newest version; I reinstalled everything and now it works. Thanks!

    • @NextTechandAI
      @NextTechandAI 2 months ago

      @@sheg9629 Please check the link as already mentioned in this thread: th-cam.com/video/52YAQZ-1nOA/w-d-xo.html

    • @NextTechandAI
      @NextTechandAI 2 months ago +1

      @@Gwaboo I'm happy that it works now on your machine, thanks for sharing!

  • @JohnVanderbeck
    @JohnVanderbeck 2 months ago

    How much do you "lose" using these models? Like, my GPU can handle the full dev model, but it's slow. Would using these models be faster but at the same quality, or do you lose noticeable quality? Also, what do the different Q* files mean?

    • @NextTechandAI
      @NextTechandAI 2 months ago

      To be honest, I didn't do any scientifically based research, I just compared the results visually. I generated many images with a resolution of 1280x720 and some 1024x1024. The quality of Q5_K_S was completely sufficient for me and could hardly be distinguished from the FP16 original. With higher resolutions (FLUX allows 2 megapixels) there are clearly noticeable differences and I had to use FP16 or Q8_0. However, I couldn't find any relevant difference in quality between these two. Regarding quantization, I have included a link in the description.

  • @L3X369
      @L3X369 2 months ago +1

    For a 12 GB 3080Ti what models do you recommend?

    • @NextTechandAI
      @NextTechandAI 2 months ago +1

      I'd start with Q4_K_S and if this works well, try Q5_K_S. In most cases this is good enough and Q8_0 is not required.

  • @DarwinsGreatestHits
    @DarwinsGreatestHits 2 months ago

    To uninstall or remove nf4, do I just delete the ComfyUI_bitsandbytes_NF4 folder?

    • @NextTechandAI
      @NextTechandAI 2 months ago

      In general, yes. I usually move the folder outside of my ComfyUI folder structure and restart. If everything works as expected then the moved folder can be deleted.

  • @Larimuss
    @Larimuss several months ago

    Hmmm, why do you install the gguf Python package in the Automatic folder or environment? Shouldn't it go with the ComfyUI venv? I'm not even sure how to install a package for ComfyUI environments 😢

    • @NextTechandAI
      @NextTechandAI several months ago

      As mentioned in the vid this is only required if you're using an external Zluda installation. Without Zluda you don't need to touch this.
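      For reference, a sketch of installing the package into ComfyUI's own environment instead, assuming a manual install that runs from a 'venv' folder (the portable Windows build ships its own python_embeded interpreter instead):

        cd ComfyUI
        source venv/bin/activate            # Linux/macOS; on Windows: venv\Scripts\activate
        pip install --upgrade gguf          # install the gguf package into that environment
        # portable Windows build equivalent:
        #   python_embeded\python.exe -m pip install gguf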

  • @bolt52r
    @bolt52r 3 months ago

    Do you also need the 23GB F16 files, for Schnell and dev?

    • @NextTechandAI
      @NextTechandAI 3 months ago

      No, if you use quantized models only then you don't need the original files.

  • @shushens
    @shushens 3 months ago

    I am struggling a bit with Flux. I have a GeForce 3080 Ti, which is nothing to be scoffed at, and driver version 560 is installed on Windows. I tried a bunch of different workflows with dev FP8, and all of them are super slow. I only have 64 gigs of DDR5 RAM. But I haven't read anywhere that it should be a problem.

    • @internetperson2
      @internetperson2 3 months ago +1

      Start comfyui with --lowvram

    • @NextTechandAI
      @NextTechandAI 3 months ago

      Your specs are perfectly fine if the resolution is not too high. Which resolution are you using and what generation times do you achieve? Anyhow, I suggest using the quantized models or, for high quality, the FP16 model; I do not use FP8 anymore.
      The suggested --lowvram might slow down your machine; you have to try different resolutions.

    • @Sumojoe-g3q
      @Sumojoe-g3q 3 months ago

      You should not have a problem with swapping from VRAM to RAM on those specs.

  • @lostinoc3528
    @lostinoc3528 several months ago

    Can GGUF models run in ComfyUI on Pinokio?

  • @letslearn2674
    @letslearn2674 3 months ago

    What should I put in the DualCLIP Loader? Where do I download the file from?

    • @NextTechandAI
      @NextTechandAI 3 months ago

      Please follow my guide for Comfy & Flux here: th-cam.com/video/52YAQZ-1nOA/w-d-xo.html.
      You only have to download the T5 as described there; for CLIP you can select the default (most probably the only installed) model.

  • @user-fo9ce3hr5h
    @user-fo9ce3hr5h several months ago

    Hi, between NF4 and GGUF, which one is faster?

    • @NextTechandAI
      @NextTechandAI several months ago

      NF4 is designed for speed, GGUF for low VRAM. GPUs with very little VRAM may therefore even run faster with GGUF or not at all with NF4. Furthermore, the quality is higher with the larger quantized GGUF models.

  • @SamiH3D
    @SamiH3D 3 months ago

    Hi, I'm new to all this - I don't see 'bootleg' within the Add Node dropdown? A little help... Ah, I got it: the cmd didn't install on the first try for some reason :P

    • @NextTechandAI
      @NextTechandAI 3 months ago

      Hi, then you haven't successfully installed the gguf node or haven't updated ComfyUI.

  • @lowaura
    @lowaura 2 months ago

    Which model do you recommend for an RTX 4070 Ti with 12GB VRAM?

    • @NextTechandAI
      @NextTechandAI 2 months ago

      Definitely Q5_K_S; I have had a very good experience with it. Fast loading times with good quality. For high resolutions you could also try Q8_0, but that's not a must, and that file is already about 12GB.

  • @tamizhanVanmam
    @tamizhanVanmam 2 months ago

    But I only get a black image as a result. Did I do something wrong? Please help me, thanks in advance.

    • @NextTechandAI
      @NextTechandAI 2 months ago +1

      Are you using the proposed workflow? Which models? Do you get the same effect with FP16?

    • @tamizhanVanmam
      @tamizhanVanmam 2 months ago

      @@NextTechandAI Is it necessary to download the FP16 model? I installed this for the first time, and I'm a 6GB VRAM 3060 user, so I chose the dev model Q4. Please guide me on that. Also, the Zluda folder is not on my PC, so I didn't do the cmd dir step; I thought that was the reason for the black output.

  • @luxiland6117
    @luxiland6117 3 months ago

    Hi! I have a 6700 XT with 12GB VRAM, ComfyUI with Zluda, and Stable Diffusion XL 1.0. When I use 1024x1024 the app crashes; the terminal shows the message "CUDA out of memory, tried to allocate 2.50 GiB". Sorry, my question is not about this video T_T. Thanks for all your excellent videos.

    • @NextTechandAI
      @NextTechandAI 3 months ago +2

      Thanks for your feedback. Even with my 16GB VRAM, SDXL was often at its limit. You might try adding the runtime parameters --lowvram --use-split-cross-attention. Anyhow, I would use e.g. epicrealism with 512x768 and upscale according to my upscale-vid.
      But Flux is currently a better option, maybe one of the quantized models fits your VRAM 😉

    • @luxiland6117
      @luxiland6117 3 months ago

      @@NextTechandAI thank you! I'm going to try.

  • @jumbomeatloaf
    @jumbomeatloaf 3 months ago

    Hi, my GPU is an AMD Radeon RX 5500M. Do you think it would handle this?

    • @NextTechandAI
      @NextTechandAI 3 months ago +1

      Hi, it's from 2019 and has 4GB, right? I'm afraid it's not worth the try.

    • @jumbomeatloaf
      @jumbomeatloaf 3 months ago

      @@NextTechandAI Thanks! :(

    • @NextTechandAI
      @NextTechandAI 3 months ago

      @@jumbomeatloaf Well, there was a positive report from an owner of a 2050 with 4GB. You might try the option --lowvram or even --novram. It will be very slow in case it works.

  • @greathawken7579
    @greathawken7579 2 months ago

    Thank you

    • @NextTechandAI
      @NextTechandAI 2 months ago

      Thanks a lot, I'm glad the video is useful.

  • @rmeta3391
    @rmeta3391 3 months ago

    About 355 seconds running "flux1-dev-Q4_K_S" in ComfyUI on a Mac Studio (96GB/38-Core GPU). So, still unusable for me, but par for the course because Apple doesn't care about MPS and open source.

    • @emmanuelgoldstein3682
      @emmanuelgoldstein3682 3 months ago

      You have to be doing something terribly wrong, because I can run that same quant in Forge on a 16GB RAM M1 MacBook Pro and get 90-120 sec generations... You should be getting around 30-45 sec with a Q4.

    • @nholmes86
      @nholmes86 3 months ago

      @@emmanuelgoldstein3682 He is using ComfyUI with this method; Forge is another way.

    • @greenpulp.
      @greenpulp. 3 months ago

      @@emmanuelgoldstein3682 What do you use? Thinking of trying it out on an M2 Pro Mac Mini.

  • @generalawareness101
    @generalawareness101 3 months ago

    NF4 > GGUF. GGUF is slower due to being compressed, and NF4 was optimized for speed. As a trainer, I wish I could use either to train with; it is hideously slow being forced to batch size 1 on a 4090.

  • @kiransurwade3576
    @kiransurwade3576 3 months ago +1

    🙏🏻 I have a potato PC with a GTX 1060 6GB graphics card. Will it work? Please reply 🙏🏻

    • @NextTechandAI
      @NextTechandAI 3 months ago +1

      There was a positive report from an owner of a 2050 with 4GB. You might try the option --lowvram or even --novram. It will be very slow in case it works.

    • @TheGalacticIndian
      @TheGalacticIndian 3 months ago

      Do these new GGUF models from city96 work with the Forge UI?

  • @Pauluz_The_Web_Gnome
    @Pauluz_The_Web_Gnome 3 months ago

    Will it support LoRAs?

    • @NextTechandAI
      @NextTechandAI 3 months ago +1

      It already does.

  • @tomschuelke7955
    @tomschuelke7955 3 months ago

    Weeeelll, I've got a GeForce 1080 potato graphics card... which can only use SDXL models to an extent, but starts to struggle as soon as I integrate ControlNets... Fortunately we got A4000 graphics cards in our company for our professional work.
    But would I get this running at home as well?

    • @NextTechandAI
      @NextTechandAI 3 months ago

      I would give it a try if there are current drivers for CUDA available. In the comments I saw quite good performance with a 2070 Super.

  • @iDeker
    @iDeker 2 months ago +1

    Is this available for AMD GPUs?

    • @NextTechandAI
      @NextTechandAI 2 months ago

      Have you watched the video?

    • @iDeker
      @iDeker 2 months ago

      Oh sorry. The Zluda page shows as closed, so how do I install it now?

    • @NextTechandAI
      @NextTechandAI 2 months ago

      @@iDeker Still according to my vids about 1. Zluda on SD.next and 2. ComfyUI with Zluda (based on SD.next). As far as I know lshqqytiger's fork of Zluda is still accessible.

    • @iDeker
      @iDeker 2 months ago

      @@NextTechandAI I didn't get the run_zluda.bat file when running the pip install gguf command. How come?

  • @kevinmiole
    @kevinmiole 3 months ago

    I have Win 11 and an RX 6600; can I run this?

    • @NextTechandAI
      @NextTechandAI 2 months ago

      Most likely yes, either with Zluda on Windows or ROCm on Linux.

  • @megapin1
    @megapin1 2 months ago

    How can I use Flux on an RX 580?))))

    • @anemborgleril164
      @anemborgleril164 2 months ago

      Stop asking, find a job, buy a better graphics card.

  • @sasisuman
    @sasisuman 3 months ago

    Can we run this on mobiles?

    • @NextTechandAI
      @NextTechandAI 3 months ago

      This will take a while.

  • @CodeMania-y3e
    @CodeMania-y3e 3 months ago

    Hello, can you make a DreamBooth model for me?

  • @brianmolele7264
    @brianmolele7264 3 months ago

    I think my RTX 4060 should handle it well

  • @ericcheah6528
    @ericcheah6528 2 months ago

    3060 Ti. Took 10 minutes to finish 8 images on Flux. 😂😂😂😢😢

    • @NextTechandAI
      @NextTechandAI 2 months ago

      Compared to my results, that's not bad. Thanks for sharing.