Unlimited-Length AI Videos that are Consistent, Easy and Free with the ComfyUI Hunyuan Model
- Published 7 Feb 2025
- In this video, I will share how to use the LeapFusion Image To Video Lora to generate very consistent AI videos of unlimited duration using the new Hunyuan model with ComfyUI
All run locally on my desktop with an RTX 3090
Sharing my workflow for consistent video gen with Image to Video ++
Explaining how to use real image-to-video to create a final video output as long as your patience lasts
Full workflow execution from prompt and image reference to preview videos and final generation
Important trick that enables continuous stable videos
Added combine and upscale to the workflow
Bonus video generated with the Hunyuan model using this flow
Note: This is not ControlNet-like yet, but a better way to do image to video.
Download of the presented workflow (click the link, then click [...], then download):
github.com/Jae...
If you would like to support me, as I had to drink lots of coffees for this one ;) buymeacoffee.c...
Tools used:
--
ComfyUI: download and installation instructions
github.com/com...
ComfyUI Hunyuan Extension: download and installation instructions
github.com/kij...
Hunyuan LeapFusion Lora:
512x320 resolution version (the one used in this tutorial):
huggingface.co...
960x544 version (brand new, testing in progress):
huggingface.co...
General Vid Gen before Image to Video:
• ComfyUI Hunyuan Video ...
Hunyuan Other Loras:
civitai.com/se...
Hunyuan Model Files:
huggingface.co...
huggingface.co...
huggingface.co...
Category: Film & Animation
This workflow is outstanding. Not over complicated, and highly effective. Best I've found!
Happy to hear that @jamesdonnelly8888. Be sure that will pump me up to create even more videos. Stay tuned ;)
@@jamesdonnelly8888 totally, I hate when people have like 65 nodes. You know before you even download it that at least half the nodes won’t work.
You put lots of work into this mate, subbed! Keep it up!
Hi @wizards-themagicalconcert5048, thanks for the sub, it gives me the push to create more content ;) Very appreciated!
Thanks for taking the time to make this tutorial, it is really well explained.
Hi @OllyV, super happy this is helping you. Thanks for the great comment.
Hundreds and hundreds of hours... pls dont tell anyone, it's a "Secret"...
Thanks for posting the workflow, great work
Hehe @vazgeraldes, I was wondering if this would ever show up ;) Thanks for making my day!
It was my next step with my workflow Thank you for posting. Subbed and liked.
Hi @RubenVillegas-z6n, I'm glad I can be of help. Let me know if you find something to suggest that would be better in my workflow ;)
Hi! New in this field, but learning to use some AI stuff. Thank you! Subbed too!
Hi @Witty_Grtty, welcome onboard! It’s a never ending ride once you’re on… so get ready for the adventure!
Thanks for the tutorial. Very interesting. How do you generate a video from an image and a video reference, like in the bottom section of the Hunyuan Tencent homepage? Do you know, or is that maybe not yet possible?
Hi @CloudFilms-auf, my understanding is that the Hunyuan features are being released to the open source community slowly (but surely?). Seeing those tells me that it's all possible, just a matter of time before we get access ;) We just need to be patient I guess ;)
@@AIArtistryAtelier yeah)
Is anyone getting a lot of transformations? I get it with this workflow, but not others. By transformation, I mean when the video morphs from, say, dancing into a character holding a gun. These are entertaining, but not terribly welcome. I have a different workflow that is much more reliable. But the magic in this here workflow is the way you can hand it a starting image, and it really does start with that image. The loras don't immediately distort or modify it, at least not until a sudden, shocking transformation takes place.
Hi @prattner67, during my tests I have noticed this also. In one way, it does make sense that the morph happens. You are giving it an image, and it has to stabilize toward a prompt and lora that are probably very different from what the image is, no matter how good the prompt or lora we choose. The only way I was able to minimize this transformation is to make sure I use a source image that was generated with the same lora, prompt and model, as shown at 12:38. Still, not all loras work well with this, and this image-to-video lora is very new and very experimental. I hope a real image-to-video feature will be supported soon, which would probably give much better results. These things are very much in the early-adopter phase; they come with their quirks and a lot of trial and error. Have fun. :)
Hi there! A really great workflow. However, I have a problem with the colouring. When I generate several videos one after the other, they gradually become lighter in colour, or the saturation becomes more extreme from clip to clip. Does anyone have the same problem or an idea for a solution?
@macmotu, indeed, I get that also. You probably saw in my video at th-cam.com/video/m7a_PDuxKHM/w-d-xo.html where, after a few generation iterations, the color saturation becomes very off. Make sure you are using the solution I proposed. I am building another workflow that will also help with this issue and will give extra tips, like changing the seed often so that the stabilization does not drift into over-saturation and kind of resets by having new seeds to work with. Stay tuned.
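If the drift still builds up, one workaround is to color-match each new clip back to a frame you trust before appending it. Here's a rough post-processing sketch (done outside ComfyUI, not part of the shared workflow), assuming imageio, imageio-ffmpeg and scikit-image are installed; the file names are just placeholders:

# Re-anchor a new clip's colors to a trusted reference frame to counter
# the saturation drift that accumulates across generation loops.
import imageio.v2 as imageio
import numpy as np
from skimage.exposure import match_histograms

reference = imageio.imread("first_clip_frame.png")  # an RGB frame whose colors you trust

reader = imageio.get_reader("latest_clip.mp4")
fps = reader.get_meta_data().get("fps", 24)
writer = imageio.get_writer("latest_clip_color_fixed.mp4", fps=fps)

for frame in reader:
    # Match each frame's per-channel histogram to the reference frame.
    fixed = match_histograms(frame, reference, channel_axis=-1)
    writer.append_data(np.clip(fixed, 0, 255).astype(np.uint8))

reader.close()
writer.close()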
Minimum: The minimum GPU memory required is 60GB for 720px1280px129f and 45G for 544px960px129f.
Recommended: We recommend using a GPU with 80GB of memory for better generation quality.
like, what the actual fuq??
Hi @aegisgfx, indeed the VRAM cost of video generation is very high. But it boils down to what resolution and how many frames you want. Since most of us have much less GPU VRAM, the workaround is to generate something smaller and upscale it afterwards. There's a price to pay for that in consistency and quality as you upscale, but it's a good workaround to get something decent with the hardware limitations we have. I hope doing fewer frames or a smaller resolution is something that would be acceptable for you. Good luck.
Wow, that sure is an interesting workflow. But I wonder how much VRAM and RAM I need to run this. I have an RTX 4060 Ti with 16 GB of VRAM and 48 GB of RAM. Is that enough, or do you know any settings or workflow that could work for me? Thanks.
Hi @edimilson847, I'm not sure exactly how much this needs, but I know it needs a lot. It's really the VRAM that is the key requirement; regular RAM is not that critical. I have a 3090 with 24 GB of VRAM and I know this works. I think the other workflow with video to video uses less VRAM: th-cam.com/video/aDDU3p72mkQ/w-d-xo.htmlsi=i-W8Epx3-FTLXsL2 You can give that a try first. I will post a tutorial and workflow where I focus on minimizing the VRAM usage for video generation. I think this will help many of you. Stay tuned.
Really thank you so much for providing the workflow json for this stunning workflow. I have encountered what is likely a silly n00b error I hope will be easy for you to address. Once I generate the first video from image I right click on the video to save the preview, this seems to be stored in comfyui\temp folder, I select the clip from the temp folder and change the switch to generate from video and immediately receive a no frames generated error. What did I miss?
Hi @MichaelMoellerTRLInc, I'm doing more tests with it and realized that the number of frames in the output video does not always equal the number of frames we set at the beginning. Hence, my current way of getting the last frame is not the best. I will try to update my workflow tonight so that this issue is resolved. In short, the issue you're getting is because the skip assumes more frames than the video actually contains, so it ends up with no frames at all in the input. Very sorry about that ;)
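If you want to sanity-check this yourself, here's a tiny sketch (outside ComfyUI) that counts how many frames a clip really contains by decoding it, since container metadata can be off. It assumes opencv-python is installed, and the file name is a placeholder:

# Count the frames a generated clip actually contains, so the skip value
# in the loader can be set against the real count instead of the expected one.
import cv2

cap = cv2.VideoCapture("latest_clip.mp4")
count = 0
while True:
    ok, _ = cap.read()
    if not ok:
        break
    count += 1
cap.release()
print(f"actual frame count: {count}")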
@@AIArtistryAtelier no worries, I'm just pleased it wasn't my fault and that I had just missed something basic. I appreciate the effort it takes to do something like this and the risk of it turning into a tech support nightmare. You're achieving legendary status in my book.
💠Thank you, Master! I've applied all your techniques and created my very first music video. I would love to hear your thoughts!
OMG @EpicLensProductions... I mean, I'm good with tools and such, but you definitely got the creative eye! I'm really amazed you can do this with what I shared. You just won yourself an instant subscriber ;) Keep up the great work, I just feel great to know I got some part of that outcome :)
@@AIArtistryAtelier thanks, much appreciate Master teacher! 👍 keep up your great work! 😊
Where can I find the FastVideo lora? I have the LeapFusion lora but not the FastVideo one at the very top left Lora Select node.
Hi @TailspinMedia, it was in the description links, but was kinda hidden behind all the other files. Here's a direct link :) huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hyvideo_FastVideo_LoRA-fp8.safetensors
@ thank you!
can u help? getting this error
DownloadAndLoadHyVideoTextEncoder
Failed to import transformers.models.timm_wrapper.configuration_timm_wrapper because of the following error (look up to see its traceback):
cannot import name 'ImageNetInfo' from 'timm.data' (C:\pinokio\api\comfy.git\app\custom_nodes\ComfyUI-tbox\src\timm\data\__init__.py)
Hi @ParvathyKapoor, I'm sadly not a great installation debugging expert. The fact it mentions "custom_nodes" tells me maybe something is wrong with that plugin / node? Looking online, it's this one: github.com/ai-shizuka/ComfyUI-tbox, but I am not using it... maybe try my workflow by default first, before adding more nodes? If that does not help, best way in general is to google the errors you're getting and hope to find some guidance. Good luck!
Hi, I did some more search, It might also be related to this: github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/291
Good luck!
Your version of timm does not include ImageNetInfo. You should upgrade timm inside Pinokio's environment; just run pip install --upgrade timm directly in that terminal. If upgrading doesn't work, try downgrading to an older version.
@@dailystory6678 ok i ll try! thanks
At 10:50 you fast-forwarded, but I'd like to know how long it took on your 3090. I'm using a 5080 and I lowered the steps to 20 and fps to 12, and it took about 18 min before it hit an allocation error.
Hi @g4zz4tr0n, a 5080!! Nice! Yeah, it should run much faster with your GPU! On that run at 10:50, it took ~300s, not counting the one-time load of all the models, so don't count the first run time. Then you mention steps and fps... those two have little to no impact on VRAM usage, whereas the allocation error usually means there's not enough VRAM. I suggest, for a first test, lowering the resolution and the total number of frames to be generated. That should help if you have enough VRAM to load the models.
So I was able to generate 1 video and now i'm getting OOM errors each time.
Hi @prdsatx4467, I get this sometimes too after generating a few videos, even if I didn't change any resolution or number of frames that would cause it. My workaround so far is to unload and reload everything so that there's nothing left in memory. There's a button on the top right of the ComfyUI interface that does this automatically. Another way is to simply restart ComfyUI. I hope this workaround resolves your current problem.
Hi. How can I create just one node with video combine and just output the last frame without the rest?
Hi @onlineispections, is your question how to get that last frame from the video you have generated and bring it back into the loop? If yes, then, like I've shown in the video at 11:13, just generate the video, save it to disk, then load the video in the input section. In that node's parameters, there's a skip-frames setting where you skip all the frames except the last one. That's how you get the last frame of a video. I hope that answers the question and that I've understood it correctly.
Hi. Already done, no need for the math expression. In the Load Video node there is no "skip_last_frames" input, but I finally solved it. There is an option to increase the definition, not upscale, but the definition @AIArtistryAtelier
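For anyone who'd rather grab that last frame outside ComfyUI, here's a minimal sketch; it assumes opencv-python is installed and the file names are placeholders:

# Pull the last frame out of the latest clip and save it as the start image
# for the next generation loop.
import cv2

cap = cv2.VideoCapture("latest_clip.mp4")
last = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    last = frame
cap.release()

if last is not None:
    cv2.imwrite("next_start_frame.png", last)  # feed this back in as the image input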
First I must learn to use Comfy uuu
Hi @nvadito7579, would it be interesting if I create a very quick intro to ComfyUI as a starting tutorial so that anyone can jump into video gen rapidly? I have to admit I've been using ComfyUI for so long that I forgot it can be overwhelming if one has never used it before. If there are enough votes on this, I might look into it ;)
@@AIArtistryAtelier Now there are several videos about Comfy for beginners, but it is still complicated to learn and master. Although no one is talking about SwarmUI, which is recommended for those who want a simple interface while still having the power of Comfy. That catches my attention.
I've never tried SwarmUI; hope it's a good starting point for you and helps you get going on this then ;)
Omgggggggggggggggg I can’t believe that worked wttttttttfffff 21:03
This honestly freaked me out for a second
LOL @v1nigra3, this is exactly why I put these in the videos. I know it's a bit "out of context", but I also know many don't know what this is or how it works (or that it's even possible!). The result is amazing when you get that cross-eye trick to work, and once you know... believe me, just google it and there are tons of images for you to view. And btw, if this is your first experience of this, you might also want to check out "stereograms" and "magic eye". Note that those are much harder to see (I remember taking hours to figure out how to do it), but the results are amazing too. Anyway, happy you got to experience something that has amazed me for years ;)
@@AIArtistryAtelier this is going to sound crazy but this justifies an experience I had while visually/mentally stimulated, when I looked at the person beside me I could see 6-10 versions of them exactly like that. And now it makes sense how it’s possible it could happen if I was so visually impaired that my brain was creating split layers of light, as I was processing light a bit slower. Leading to the diffraction 🤯🤯
@@v1nigra3 wow, that’s quite a unique experience. Not sure if I want to experience it myself :p but this one I’m sharing is easy to access or even generate ;) glad you’re having fun ;)
@@AIArtistryAtelier oh it was something else my friend, something else….aaand you know it ;)
face when someone with six-fingered hands asks you on a date 14:57
Hi @AxionSmurf, I looked so hard to see that extra finger... but I didn't find it in that specific gen (but I know it happens often)... then rereading your comment, I realize what you mean :)
getting this weird error when running on colab-
Error(s) in loading state_dict for AutoencoderKLCausal3D:
Hi @shivanshjoshi1189, I did a quick search and saw this: www.reddit.com/r/StableDiffusion/comments/1glh436/runtimeerror_errors_in_loading_state_dict_error/
I never got that error myself, but hope that helps. If not, do search and see what other suggestions others would solve their case. I don't use colab neither, so can't say if that could be an issue. Good luck!
@ yes bro! Got it solved. There were few video processing libraries that just don’t work with colab + my issue was due to vae mismatch.
Can you go from one image to a totally different image using your method?
Hi @bkdjart, the current Lora that allows image to video has been trained specifically to have a still image at the beginning, then followed by a video. There is currently no other Lora that I’m aware of that does what you ask. Still, as a funny work around, you could take a video and reverse it then attach it together going from still image to video then video to still image ;) I know it’s not exactly what you’re looking for, but it fits the requirements? ;p but in all seriousness, I think that feature that you’re looking for will be coming out soon. When it’s out, be sure I’m gonna be sharing a video about it. :)
I am confused about the stereoimage section at the end - how are you doing that?
Hi @Karsticles, are you confused because you can’t see it in 3D? Or because you like what you see and want to do it yourself but don’t know how?
So, if one was to do this for an hour video, you would literally spend the whole day doing this for over 720 videos, pick the last frame, prompt, process, save them, and merge them. Is that the process? if it is, is that practical?
Hi @tiwale6387, you are totally right, this is very impractical. The reality is that with free and local, you can only go as far as the setup and GPU you have while still staying in the "free" spirit. A 5 to 10 second video that is rather consistent is the best I've seen so far that can run locally. That being said, I've thought of creating a workflow that loops automatically so that you just let it run and walk away. My hesitation is about quality control, as one bad gen would cause all the following ones to be bad, hence reviewing after each one is OK for me. I'm seriously counting the days until my 3090 dies at this point, after so many generations... so I'm worried about my GPU burning out if I leave this running for a long, long time. With the current process there's a pause every 10 or 20 minutes, which makes me feel better. Then again, all this is personal choice; I guess if someone has a water cooling system and great ventilation, that might not be an issue.
So to answer your question again: yes, very impractical... but also very fun? ;)
Hi. How do you get a workflow where you just insert an image and the lora processes it, without the video input etc.? Could you make a simple one, without the later steps? We can think about the editing part afterwards.
Hi @onlineispections, the really amazing thing here is the Image to Video lora that just came out from LeapFusion. Basically, it can animate any image and turn it into a consistent video. It's not just an "influence" or "look alike", but really image to video. It's not perfect, but in my opinion really good considering it's the first version. Normally you could get a video from a text prompt using the Hunyuan model, but now with the Image to Video lora you can let the image do the heavy lifting, then add extra input on top to control it. And with image to video, the "infinite" loop here works by appending these videos together. Hope that helps?
Note that if you want to generate a video without an image input, just from a prompt, you can use my other workflows: th-cam.com/video/99JjTcWfTBg/w-d-xo.html or th-cam.com/video/aDDU3p72mkQ/w-d-xo.html
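As a side note, the "append the clips together" step can also be done outside ComfyUI. A rough sketch, assuming imageio and imageio-ffmpeg are installed, all clips share the same resolution, and the file names are placeholders:

# Append the per-loop clips into one long video.
import imageio.v2 as imageio

clips = ["clip_001.mp4", "clip_002.mp4", "clip_003.mp4"]

first = imageio.get_reader(clips[0])
fps = first.get_meta_data().get("fps", 24)
first.close()

writer = imageio.get_writer("combined.mp4", fps=fps)
for path in clips:
    reader = imageio.get_reader(path)
    for frame in reader:
        writer.append_data(frame)
    reader.close()
writer.close()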
@AIArtistryAtelier Hi, I already have Hunyuan text to video with UNet and GGUF clip. Can you send me a workflow with the lora image to video so I can insert it?
@AIArtistryAtelier Hello, since I already have Hunyuan text to video, how can I create a workflow with only the image node and the LeapFusion I2V lora and that's it? If you send me a workflow like this I'll use it, without any other additions. Or where do I insert the image node and the fusion lora in the text to video workflow? I use GGUF. Bye
Great video, If I use SDXL to make an image, will this still work? You mention it's key to know the prompt of the image you created... Different models produce different images. Looks like you used FLUX for your image. If I use flux or SDXL, how do I get the img to video working correctly?
Hi @thays182, happy to see that you really took that information into consideration. Try it out and see how it goes; there's a lot of trial and error in all this. But in my opinion, since we're using the Hunyuan model and a lora that is very new and has limited capability, a too-detailed image won't give the results you expect. I think after 1 or 2 gen loops it will start to drift from the original image. The way I got it going was to use the Hunyuan model itself, as this ensures it's the same model that's stabilizing. But again, I could be wrong; try it out and experiment with it :)
@@AIArtistryAtelier I missed in your video that you used Hunyuan to make the img, makes sense. Ty!
All good, super happy I can be of help! Enjoy the many many video gens to come ;)
@@AIArtistryAtelier Are you also saying to just take a single frame from one of your videos? I'm not finding a ComfyUI workflow for text to image. Is that not a thing for Hunyuan?
Oh, just check out my other 2 recent vid… both had video gens from prompt, workflow included ;)
Very Exciting Workflow and thanks for sharing. Im getting this error with the HY vae: RuntimeError: Error(s) in loading state_dict for AutoencoderKLCausal3D:
Missing key(s) in state_dict: "encoder.down_blocks. Do you have any idea what it could be? I have never seen it before. I have the Hunyuan video VAE bf16 in the vae folder. I can def select it.
Hi @dkamhaji, I've not gotten that error, but I did a quick search on your error (i assure you, these kind of issues are solved 99% of the time by seeing if others had similar issues and it gives hints on how to solve it). In your case, it seems that you don't have the right VAE file. Make sure you get it from Kijai. Here's the thread of others having similar issue: github.com/kijai/ComfyUI-HunyuanVideoWrapper/issues/307
I hope it works after that. Good luck!!!
Well done!!!! But I'm having issues with allocation though; I tried a downscale of 256x256, upscale by 2.0 and a VAE tile of 256 res.
@DeathMasterofhell15, try reducing the VAE tile resolution to 128. Still, it depends on what GPU you have. How much VRAM do you have on your GPU? Since there are others with small graphics cards, I will soon create a tutorial and a workflow with a much reduced VRAM footprint that will help you guys out. Stay tuned.
Running 2080ti oc on 11gb vram
Can you please share a workflow for a basic video creation for hunyuan? I copied your prompt and got a horrendous result and I think the workflow I"m using is off, because your image quality is great... Any direction on where to get just a basic video created (Pre input image) would be greatly appreciated! (Can you point me/us where to go for basic workflows like this?)
Hi @thays182, I have 2 other videos that show a basic workflow to generate a video simply, without needing an image input. Give them a try ;)
Just look for my last 3 videos ;)
That is a very clever way to increase video length. Would it be possible to do this without the Hunyuan video loader, because it doesn't support GGUF? I am able to generate at max 2-second videos with 4 GB VRAM on an RTX 3050, but it's pretty good quality.
If I can use your technique with that workflow it would be very helpful, although with 2-second segments I fear the merging won't be good anyway.
Hi there @adiadiadi333, the special thing with Hunyuan is the consistency of the video (no flickering or changing elements as the video progresses), while what's new here is the Image to Video that closely matches the initial image (not just "looks like" the image). This combo is critical if you want to use the trick of appending videos one after another and go infinite. Hence, if you can get that image-to-video and consistency maintained with other models, then you've got yourself a solution. Even with only 2-second videos it will work (just a bit choppier, since it's short videos connected together?). Let me know what other solutions are out there ;)
@@AIArtistryAtelier I'm using the image to video and getting no consistency whatsoever. It doesn't throw any errors, but it just creates something a bit like the prompt and nothing like the image.
I see. Not sure if you caught the trick I shared mid-video about using a source image where you know all the details of how it was generated, and reusing the same settings so that the diffusion doesn't "stable diffuse itself toward something else". I mean, it would be very hard to find the perfect prompt / lora / model combo to reproduce an arbitrary image... but using that trick, you're giving the diffusion every chance. Note that it's far from perfect, but you should be able to get results similar to what you saw in my tutorial. Good luck!
Doesn't work with 11 GB VRAM.
Hi, indeed, local AI sadly needs lots of VRAM. I'll see if there's a way to reduce the usage and minimize the impact on results. But something will have to give for sure.
Great video! Have you been thinking about I&V to V? I'm playing with V2V with my lora, using a rough model and simple animation in Blender that drives the video overall (very much a testing phase), and thinking about the next step: extending the video with the last frame and continuing the movement as planned with the reference video. Any idea how to achieve that?
Hi @PulpetMaster (had to read your username a few times... :) anyways, yes, that's for sure what I want to do, to have greater influence / control over what I get, since now we can have very, very long videos that are consistent. Not sure if you saw my other Hunyuan V2V tutorial? I'm thinking of a way to combine both together. Keep in mind, better V2V would involve some ControlNet... yet I haven't seen it working yet. I saw a few others making attempts, but the output was not great yet. Something brewing I guess... but we're almost there. Check out the other video if you haven't, as maybe you will figure something out and share with us! :) th-cam.com/video/aDDU3p72mkQ/w-d-xo.html
@@AIArtistryAtelier Thank you for the response! Really appreciate it. I found a link to your channel some time ago on Reddit and watched most of your videos; the workflow from the one you mentioned is the base of one of my workflows. I'll keep testing, and if I find out something I'll definitely leave a note. Right now I'm playing with the experimental euler_ancestral'ish sampler made by Blepping; it gives plenty of sampling control while doing V2V and I love the outcomes.
Regarding UserName - it is inspired by GiTS series - PuppetMaster, but first part Pulpet is Polish - means meatball ;)
Hehe, yeah, I googled Pulpet just to see what I'd get. But good to know the GITS reference to Puppet Master; that's a classic movie ;) I thought it was just a general reference to a puppet master. Anyways, happy to know I have a dedicated follower. There are just so many elements in these to test, and more are coming out faster than I can try them. Let's all share our findings so that we get the most out of these; I feel we're among the pioneer users of this tech, which is really fun!
what is the man in the bottom right corner doing?..
lol @zinheader6882… he’s trying very hard to represent the speaker and be the avatar he should be… key word is “trying”
What is the max resolution I could get for outputs?
Hi @GriiZl797, I don't think there's a max per se. It really depends on how much VRAM your GPU has. The more VRAM you have, and the fewer frames you want to output, the more memory you have available to generate a higher-resolution video. That being said, the image-to-video lora has been trained at specific resolutions (the biggest being 544x960); if you go much higher than what it was trained on, I think the result would not look good. Try it out :)
For a noob like me, do you have or could you make a video showing how to set this up? I already have Hunyuan functioning, but what do I need additionally? I have never used loras.
Hi @wwitness, if you have Hunyuan working with ComfyUI already, then you're 80% done! Really, what's additional is just the extra nodes that load the Img2Vid lora and other loras. Make sure all the nodes from my shared workflow are installed (see my description), then download the Img2Vid lora and copy it into the ComfyUI folder under "models/loras". Once the file is there, on the HunyuanVideo Lora Select node click the lora name and the list of loras from that folder will show up once refreshed (reload the page). Hope that helps!
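If you prefer doing that copy step with a script, here's a tiny sketch; the download location and lora file name are hypothetical, so adjust them to your setup:

# Copy a downloaded lora into the folder ComfyUI scans for loras.
import shutil
from pathlib import Path

downloaded = Path.home() / "Downloads" / "leapfusion_img2vid.safetensors"  # hypothetical file name
lora_dir = Path("ComfyUI") / "models" / "loras"

lora_dir.mkdir(parents=True, exist_ok=True)
shutil.copy2(downloaded, lora_dir / downloaded.name)
print(f"copied to {lora_dir / downloaded.name}")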
@AIArtistryAtelier Okay thanks for the detailed explanation, really appreciate it 👍 earned a sub 💪
Hey boss, question: the video I'm getting doesn't look anything like the reference image, does it have to be with the denoise?
Hi @Corholio_Zoidberg, there are multiple possible reasons for this, assuming you have set everything up correctly without errors. For example, the resolution of the image cannot be too high, as there's an optimal resolution this was trained for. There are also loras: not all loras work well with the image-to-video process, it's a hit-and-miss, trial-and-error kind of thing. Your prompt should describe well what is being shown so that the generation stabilizes toward what you wrote. There's also the image itself: make sure you use something simple and not too detailed. I suggest you start with an image of a human, since the training on those seems to do better than most other cases. I tried the other day with an image of aliens and a spaceship, and the generation was very confused about what to do with that. Best of wishes.
Any settings I have to adjust for a 3060 12GB? Please make another video for 12GB VRAM.
Hi @badguy4839. Another user pointed out that Kijai/llava-llama-3-8b-text-encoder-tokenizer is very big, which is the cause of this. I'm so used to my 24GB VRAM that I don't notice when things are using a lot of VRAM. I'll look around to see if there's a workaround for this. Thanks!
@@AIArtistryAtelier Thank for pointing this out for me. That made me crash a lot yesterday, I will try another one
Thanks for the tuto; sadly I get just colorful noise. I've checked all settings... no luck so far.
Hi @lc7ineo, something must be different. Is this the first time you run a hunyuan video gen? Try my other more basic workflow to see if they work (my other vid tutorial). Just to see if it's specific to this tutorial or all gen in general. Hope that will help decipher the issue.
sounds like denoise value is too low
Very nice, thanks for this. I'm getting an issue while executing the script: "Trying to set a tensor of shape torch.Size([128256, 4096]) in "weight" (which has shape torch.Size([128320, 4096])), this looks incorrect." Any idea how to fix this? Thanks.
Hi, I think you need to update your plugins / nodes. Go to your ComfyUI Manager and update those nodes (mainly the Kijai ones). I think that will do the trick. Good luck!
@@AIArtistryAtelier I have the same issue and updating the nodes does not work
Good afternoon brother, sorry, when loading your workflow it seems some nodes are missing and I can't find the missing ones, it seems
Missing node types
When loading the graph, the following node types were not found
GetNode
SetNode
Hi there @RobertMendoza-s5y, from google translate, it seems you're asking how to get the "GetNode" and "SetNode". I am not even using those nodes, but they are part of the ComfyUI tool. I think your tool might be outdated, you should update ComfyUI itself. Once updated, then I think those will resolve correctly. Good luck!
@@AIArtistryAtelier ok, thanks bro, i will check
What video do you use for the first run before you have any videos to use?
Hi @AxiomaticPopulace74, if you look at my video tutorial at 7:40 (th-cam.com/video/m7a_PDuxKHM/w-d-xo.htmlsi=-qWOQxnxclgRNhdj&t=440), you will see that I have a switch where you can choose between an "image" input and a "video" input. Make sure you start with the "image" input. From there on... forever it loops ;) Good luck!
Thanks for sharing. Will this work on GPUs with under 12 GB of VRAM?
Hi @YoungBloodyBlud, I saw from other users that they can make ComfyUI + Hunyuan work with as low as 8GB VRAM (search "comfyui hunyuan vram"). I haven't tried it myself, so I'm not certain, especially with these new loras and nodes. But I hope that if you lower the resolution enough and generate shorter videos (that you then append together), this works with 8GB. If you ever get to try it out (or anyone else), it would be great to know whether it works or not! Thanks for the interest!
Hi sorry to bother again, so I can only run the GGUF/ Low VRam version, struggling to get this one to work, would it be possible to create a workflow that allows for this as I can't find anything else essentially achieving the results you have here.
Hi @wwitness, I'm not sure if that's possible or not, but the text encoder for the prompt is another piece that uses a lot of VRAM. As time permits, I'll try to scale everything down to the bare minimum to see how much VRAM it needs to get something out. For now, I think with my other V2V workflow you can actually use the reduced model, but I'll have to double-check. One thing that makes me hesitate is that the current quality is already very limited; I'm worried that if I lower everything even more, even if it works, the output quality will be so bad that one would question why use this at all. It's sad, but in the realm of AI gens VRAM is so critical and so expensive. I guess this is how all those cloud services are gaining so many customers and so much momentum: there's a clear need right now that is not easy to meet locally.
@@AIArtistryAtelier Okay, you mustn't stress or anything; I was able to adjust the temporal tiling a bit and the spatial tile, nothing too drastic, and it worked, also offloaded to my CPU.
@@AIArtistryAtelier If I'm adding additional Loras, am I only copying the HunyuanVideo Lora Select?
Hi. Is this Hunyuan's official image to video model or your experiment?
Hi @onlineispections, my contribution to the community is really about the explanation and sharing my knowledge. I also try to make the workflow as easy to use as possible, and share my own experience and findings as I use what's available out there. The models are all available online made by others. In this case, it was made by leapfusion: leapfusion.ai/en. You can also check their github: github.com/AeroScripts/leapfusion-hunyuan-image2video
@AIArtistryAtelier Hi, when is the official version of image to video coming out? Hunyuan?
Well I got no idea (or even if it will ever exist). We can only hope when things are all free ;) This seems to be the best we got for now.
No, the official I2V model is not yet released but it's on the roadmap and it's coming soon.
@BoyanOrion Well, where will it be found? On which site? On Hunyuan Video on GitHub?
hey great video, any idea on how to install Kijai/llava-llama-3-8b-text-encoder-tokenizer? ty
Hi @ZiosNeon, download huggingface.co/Kijai/llava-llama-3-8b-text-encoder-tokenizer
Files go to ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer
All details are found here: github.com/kijai/ComfyUI-HunyuanVideoWrapper
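If anyone wants to script that download instead of grabbing the files by hand, here's a small sketch, assuming the huggingface_hub package is installed; adjust the ComfyUI path to your install:

# Download the text encoder / tokenizer files into the folder the Hunyuan
# wrapper nodes look in.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Kijai/llava-llama-3-8b-text-encoder-tokenizer",
    local_dir="ComfyUI/models/LLM/llava-llama-3-8b-text-encoder-tokenizer",
)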
Hi, good tutorial, everything works, but it took super long to load the Text Encoder LLM model. Is there any way to make it faster, or is it supposed to be slow? I use an HDD.
Hi @zacharywijaya6635, I think it's normal that the first run is always longer because of all the loading that takes place. After that, it should go much faster as everything is already loaded. But you're right, depending on your hardware there's a big difference too. My main disk is an NVMe drive that's very fast and holds most of ComfyUI and the models. I have a few hard drives as slow extensions holding the big but less frequently used data like loras. I think it helps accelerate the load.
I need a workflow where not everything is animated, only blinking eyes, hair in the wind, or similar, but all video generators move the girl's whole body ((.
Hi @AI4kUPSCALING, you could check out my other Video to Video Hunyuan workflow. It's not perfect, but I think if you input a still video (aka a single frame repeated) instead of a real moving video, you could get something close to what you want (very limited movement). Obviously, the prompt would play a role too (put in things like "immobile, frozen" etc.) along with eye blinking, hair flowing and so on. With all of that, hopefully in only a few tries you will get what you want. Good luck! Here's the workflow for V2V: th-cam.com/video/aDDU3p72mkQ/w-d-xo.html
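In case it's useful, here's a rough sketch of building that "still video" (one frame repeated) from a single image, assuming imageio and imageio-ffmpeg are installed; the file names, fps and duration are just placeholders:

# Turn a single RGB image into a short video of that same frame repeated,
# to feed into the V2V workflow when only subtle motion is wanted.
import imageio.v2 as imageio

image = imageio.imread("portrait.png")  # use an RGB image without an alpha channel
fps, seconds = 24, 4

writer = imageio.get_writer("still_input.mp4", fps=fps)
for _ in range(fps * seconds):
    writer.append_data(image)
writer.close()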
Hello, I have installed everything, both on my local PC and on run pod, however it does not work, I have no error message, the only clue I have is: the preview of the video is not playing (on my local PC it was working before I updated) / I don't know any more, do you have any ideas? Thanking you!!
When I reload the node (at least when I delete the one at the bottom left and generate a new one, without the skip), the preview of the video inside this node works.
Hi @sachaschmidt1080, so you were able to get it to work? I don't know, I seldom get a missing preview. Maybe a new bug? So far I just reload / restart ComfyUI in those cases and everything starts working again. Note that even if the preview video is not showing up, the video is still there (you can right-click on the node and save or play it).
@@AIArtistryAtelier Bro, I reinstalled all dependencies and nodes etc. and it's OK! Thank you very much, GOOD JOB!
@@sachaschmidt1080 Super happy that it worked! My successes (and failures) actually helped someone 🎉 But yeah, this is the kind of issue we have to deal with since, within the spectrum of adoption, we're in the very, very early adopter phase! Have fun!!!
Thanks for the video! Everything is updated, but I keep getting this error: module 'comfy.latent_formats' has no attribute 'HunyuanVideo' any ideas?
Hi @kittrdge, I didn't get that error, but when I did a quick "google search", this is what I found, I think it will help :) github.com/welltop-cn/ComfyUI-TeaCache/issues/16
How much VRAM do you need for this? Because I have 16 GB and I always get an allocation-on-device error.
Hi @Jeremy-sk9qh. Another user pointed out that Kijai/llava-llama-3-8b-text-encoder-tokenizer is very big, which is the cause of this. I'm so used to my 24GB VRAM that I don't notice when things are using a lot of VRAM. I'll look around to see if there's a workaround for this. Thanks!
@@AIArtistryAtelier awesome thanks
Is it possible to run it on Google colab ????
Hi @sinavalizadeh1802, I've never tried it myself, but I don't see why it wouldn't work. A quick search suggests others are doing it; just follow their guidance I guess, and once you get a basic setup working you can upload this workflow to gen some nice vids ;) Good luck, let us know if you get it to work on Google Colab ;)
Great video, but I am getting error with VAE loader. I have checked everything with the models, loras and nodes have been found by Comfyui Manager. So not sure?
Hi @RonCort, I remember reading about errors like this and what others did. One thing is to ensure you are loading the Kijai VAE (the one from his own share, as apparently they are not the same if you got it from somewhere else). Another was to update ComfyUI and all the needed nodes, especially the Kijai ones, since this is a very new thing that just came out and needs updates for compatibility. But I can tell you, with these new techs and features... my best friend has always been to google the errors I get. I know it's a pain, but it's part of the deal sadly. On that, hope you get it working!!!
@@AIArtistryAtelier Thanks.
I get the Lora but where do I get the program to use it?
Hi there @buster5661, if you check my description link, I've put it there with the rest: github.com/comfyanonymous/ComfyUI. Welcome onboard to the new adventure (addiction?)
@@AIArtistryAtelier your tutorials are not clear and are confusing. Please work on that
@buster5661, feedback well received and noted. I assure you there's a bunch of things I want to improve in my tutorial. I'll have to find more time to plan / script etc to make it better. But feedback is always appreciated. Thanks.
I have many models in my stable diffusion automatic 1111, can I use those too for comfyui ?
Hi @faisalu007, Hunyuan and video gen are very specific to this model, hence, you can only use that model and the associated Hunyuan Lora (using other lora wont work or will give strange results). So yes, you will have to download more models, and yes, currently the selections are limited, but growing. Have fun!
I keep getting this kind of error given by the sampler:
RuntimeError: shape '[1, 625, 1, 128]' is invalid for input of size 2640000
I'm using an image input instead of video
Hi @ivoxx_, I think you need to update your plugins / nodes. Go to your ComfyUI Manager and updates those nodes (mainly the Kijai ones). I think that will do the trick.
@@AIArtistryAtelier Will try
I think your aspect ratio is off, maybe try another combination
@@YungSuz I tried a lot of combinations, 1:1 doesn't even work
@@AIArtistryAtelier This seemed to fix the issue! Thanks!
I think the LoRA on Civitai is no longer available. Do you know if it was removed or if there’s another way to access it?
Search for the same thing on Google and you will find it on Hugging Face.
Hi @dailystory6678, could you check again? I checked and the links to the loras (the general Hunyuan loras as well as the img2vid lora) both seem to be working. Can you share the link you're using?
@@ParvathyKapoor Found it! Thanks. Also they just updated it
@@AIArtistryAtelier Yes, I found it! Thanks. I think they were updating it or something
I have a problem with the AI models for video. I originally saw this in Stable Cascade, then SD3.0, and now in these modern video models. Outdoors you will see how bad it is because the scene is well lit, so it can't hide the issues. I look at trees, and even the ground in LTXV was bad. Not sure why these have the issues, and I have used every WF I can find; I just refuse to use loras, as I don't do people, I'm more of a style guy (and concepts).
Hi @generalawareness101, from what I get from your comment, you're saying you're not into generating characters but more into nature / style, and that the current models don't generate good nature output. Even though I didn't try it myself, I do get how that can turn out not that great. I mean, it comes back to what the model was trained on. I'm guessing there are more characters in the training data than nature, but it's just a guess. But like with Yor Forger, what you would need is a good lora for nature! That is easier said than done, though; there seems to be much less interest in it. But there are tons of loras being created weekly, so give it a bit of time and more nature / style loras will come up. Once that happens, I'm sure the output will look much better.
@@AIArtistryAtelier I could not get your WF to run as it threw so many missing key errors at me. edit: Nope, I cannot get this to work. Everything was just updated to make sure.
Hi, when you mention "missing keys"... do you mean that many of the nodes / rectangles in the user interface show up as red? You also mention that you cannot get it to work; what seems to be the issue?
@@AIArtistryAtelier I meant like this: RuntimeError: Error(s) in loading state_dict for AutoencoderKLCausal3D:
Missing key(s) in state_dict: "encoder.down_blocks.0.resnets.0.norm1.weight", "encoder.down_blocks.0.resnets.0.norm1.bias", "encoder.down_blocks.0.resnets.0.conv1.conv.weight", "encoder.down_blocks.0.resnets.0.conv1.conv.bias", "encoder.down_blocks.0.resnets.0.norm2.weight", "encoder.down_blocks.0.resnets.0.norm2.bias", "encoder.down_blocks.0.resnets.0.conv2.conv.weight", "encoder.down_blocks.0.resnets.0.conv2.conv.bias", "encoder.down_blocks.0.resnets.1.norm1.weight", "encoder.down_blocks.0.resnets.1.norm1.bias", "encoder.down_blocks.0.resnets.1.conv1.conv.weight", "encoder.down_blocks.0.resnets.1.conv1.conv.bias", "encoder.down_blocks.0.resnets.1.norm2.weight", "encoder.down_blocks.0.resnets.1.norm2.bias", "encoder.down_blocks.0.resnets.1.conv2.conv.weight", "encoder.down_blocks.0.resnets.1.conv2.conv.bias", "encoder.down_blocks.0.downsamplers.0.conv.conv.weight", "encoder.down_blocks.0.downsamplers.0.conv.conv.bias", "encoder.down_blocks.1.resnets.0.norm1.weight", "encoder.down_blocks.1.resnets.0.norm1.bias", ....
It goes on and on. It is the node for the VAE. All my other Hunyuan workflows do not use that node; they use the Load VAE node. I tried that in your workflow and got a different error (that one was basically saying the node is not compatible with the next node, while for the key errors I don't know why they happen; one of my Hunyuan videos or an image as input, same error).
@@AIArtistryAtelier Sorry, your wf doesn't work and after everything I typed YT deleted it.
Just a shame you need a beast of a machine to run Hunyuan
@TPCDAZ... Yeah I know... that "free" claim is a bit of a stretch since you need a good desktop to begin with. Note that I heard from others that Hunyuan can work with as low as 8GB VRAM (just search Hunyuan VRAM). I didn't try it myself, but I would assume some sacrifice in quality has to occur (shorter clips, lower resolution etc.). If you're not ready to invest that much, maybe renting online is a way to go? It's pay per use, but the entry cost would be tiny compared to buying a machine. Last recommendation: check out the 2nd hand market? If you're lucky, you might score a GPU with lots of VRAM at a good price... but yes, there's some risk there about quality... Good luck no matter what solution you choose!
If I go to Hunyuan's Hugging Face, it shows me that image to video is UNchecked!!
Hi @Entity303GB, I've put the Hugging Face links to the two image-to-video lora models in the description to make sure you're getting the right one. Still, maybe I misunderstood: what do you mean by "unchecked"?
Thanks. In which (comfyui) folder should i put the image2vid-960x544 file?
models > loras
@ling6701, my hard drives are so full of models... that I had to change the default path for loras to something else ;)
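For anyone wanting to do the same, ComfyUI lets you point at extra model folders via an extra_model_paths.yaml file in the ComfyUI root (copy the bundled extra_model_paths.yaml.example and edit the comfyui section). The paths below are hypothetical, and the exact keys your version supports are listed in that example file:

comfyui:
    base_path: D:/ai_storage/
    loras: models/loras/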
@@AIArtistryAtelier What is your setup? Ram, gpu, ssd, cpu and total hhd storage?
Hi @labibhasan255, I have a strange setup. Let's say I have an NVMe drive for my intense usage, then a bunch of just normal hard drives (yeah, normal... old big SATA ones) attached, since all the model and lora downloads and so many gens take LOTS of space. I'm kind of a hoarder... deleting gens is not something I like doing. I have 64 GB of RAM and an AMD Ryzen 5 5600X CPU, plus an Nvidia 3090 24GB VRAM GPU (the most important piece of hardware for AI... all the rest could have been pretty much anything OK-ish). In fact, I actually have 2 GPUs, but the 2nd one is useless, as ComfyUI does not allow combining VRAM. Still, the 2nd GPU is great when doing LLMs, as that can be combined. What about yours?
@@AIArtistryAtelier Woah, I did not expect a reply. I love how you went into detail about each hardware component of your setup to paint the most visual representation through words. The build you have is actually close to the one I am saving for and planning to buy in the future. As of now, I am rocking a good old Intel Core i3 2100 CPU with integrated graphics, 8 GB DDR3 RAM and a 500 GB HDD (200 GB of which are just pictures), with a GIGABYTE motherboard from fifteen years ago. As of now, I am doing all the AI model stuff through APIs and I have got some secret methods to get unlimited LLM usage of any model of choice. Well, for what it's worth, my build can run GTA 4 at 25 FPS, and 30 FPS on a good day. Not right now though, because my CPU fan is kinda busted, so my CPU is running at 80-90 degrees Celsius with basically no cooling, so I am not gaming right now. Lol. I will buy a new PC as soon as I save up for it. Thanks for responding.
Friend, how much VRAM do I need for these workflows? Because if I have to spend 1500 euros on a 4090, it's not free; do you understand what the problem is?
Hi @worldchannel879, I totally know what you mean. I'm lucky enough to have a 3090 GPU with 24GB VRAM to get this running. But from what I read, you could run Hunyuan with as low as 8GB VRAM (search "comfyui hunyuan vram"). I can't assure you, since I haven't tried, but others seem to say it works. Note that they are not using this specific workflow, so it's not impossible it needs a bit more than just 8GB VRAM. Still, with less VRAM you are sacrificing quality, speed etc. Think smaller resolution, longer gen times, shorter duration for each piece of video. But you can potentially still make unlimited lengths as you append all of those small videos together :). When I started my adventure with AI SD and LLMs 2 years ago, I knew I didn't want to go the cloud route (for privacy purposes, but also because "the more I do, the more it costs" is not a great feeling for me). In my case, the more I do, the more "it was worth it". Note that it's also my gaming PC, my multimedia station etc... so in the end, I know "free" is a bit of a stretch here, but I can tell you, after doing terabytes and terabytes of gens, in my case it sure does feel free :)
My recommendation: if you're not sure, start with something small / low cost... then once you feel you really want to get into this, then yes, invest in it a bit (2nd hand PC market is not that bad). What's fun with a desktop is that the GPU can be easily upgraded as long as you bought something that can take it. Good luck!
try google colab
@@AIArtistryAtelier Hey, could you please tell us the exact VRAM type or model you are using?
@@dailystory6678 Hi, I'm using a GeForce RTX 3090 with 24 GB of VRAM (the video RAM on that graphics card). It's not a cheap card, I know; I was lucky to score one on the 2nd hand market. That card is the single most important piece when doing AI locally.
The issue, I think, most will face is that Kijai/llava-llama-3-8b-text-encoder-tokenizer is >12GB VRAM to load. While Hunyuan can work with as low as 8GB, you will need 24+ GB VRAM to use this method. If someone finds a
You got a good point there @jarettlangton8246, I'm so used to my 24GB VRAM that I don't notice when things are using a lot of VRAM. I'll look around to see if there's a workaround for this. Thanks!
Try using the auto CPU offload switch in the video model loader. I even slightly changed the Python code to replace the device with cpu in the text encoder init script to force CPU and offload to system RAM instead. For basic T2V generation, the best workflow that allows me to go to higher resolution and offload up to a 45GB model from VRAM to system RAM is the official Comfy workflow. For the moment, things with the other implementations are still a work in progress, but this also means we need a new GPU haha. Also, the bitsandbytes NF4 quantization helps a lot for the text encoder.
@@BoyanOrion Hey, I appreciate the reply! As for offloading the video model to the CPU: it's counterproductive. The bulk of the generative process happens with that model, so you want it on the GPU. It also doesn't address the VRAM usage of the text encoder model (+20GB). I'll check the code for the text encoder; if it's possible to offload, it shouldn't be hard to hack and repack the existing node. T2V generally isn't an issue for me; I2V/V2V is where we're loading an entirely extra model to interpret the input media. I'm still interested in a much smaller alternative model to replace the text encoder (Kijai/llava-llama-3-8b-text-encoder-tokenizer).
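For the bitsandbytes NF4 idea, here's an illustrative standalone sketch of loading a llama-style text encoder in 4-bit to shrink its VRAM footprint. The wrapper nodes load this model internally, so treat this as an experiment rather than a drop-in patch for the workflow; whether this exact repo loads cleanly through AutoModelForCausalLM is an assumption. It assumes transformers, accelerate and bitsandbytes are installed:

# Load the text encoder with 4-bit NF4 quantization to reduce VRAM use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Kijai/llava-llama-3-8b-text-encoder-tokenizer"  # repo named in the video description

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # lets accelerate offload whatever doesn't fit to system RAM
)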
Any idea how to fix this text encode error: "Trying to set a tensor of shape torch.Size([128256, 4096]) in "weight" (which has shape torch.Size([128320, 4096])), this looks incorrect"?
HyVideoTextEncode
list index out of range
Can you help me? ☺