CogVideoX vs Pyramid Flow AI video model article: thefuturethinker.org/ai-video-showdown-pyramid-flow-vs-cogvideox-on-comfyui/
Can you tell us which system and GPU you used locally?
ASUS TUF Gaming GeForce RTX™ 4090 OG amzn.to/3C04bHC
CogVideoX is the clear winner. But with Pyramid Flow, I can see what it's trying to achieve. The model seems to prioritize motion complexity over consistency. CogVideoX has issues with consistent body proportions and with subjects panning into frame from off-screen. It's like you said, these models were trained using the SD3 model. They should have been trained with Pony for animation, or Flux for real life.
If Pyramid Flow had been trained with Pony, it might be a totally different story today.
Thanks for a comprehensive initial comparison, I was just wondering which one I should use to make content!
Glad it was helpful!
Getting some good results changing sampler to LCM - 15 steps on Cog - results in 4.5 minutes!
This comment is encouraging about Pyramid Flow: "feifeiobama commented 3 days ago
We are working on a new model checkpoint trained from scratch (instead of using the SD3 weight initialization). It has shown much improvement in human faces and bodies. Please stay tuned."
Thank you so much. I was just about to set up Pyramid, and this video helped me a lot.
Very good comparison, thank you. CogVideoX is the winner for now. Glad to learn.
And with ControlNet you can control the action. That is what I was looking for in a DiT video model.
Is it simple to install and use? What type of hardware do we need?
My tests indicate CogVideoX wins most of the time. Plus, you can use LCM and cut creation times down to 2.5 mins per clip. Detailed prompts are important too.
Yes, the base image model used for training matters a lot.
You can use LCM? Really?
@aivideos322 Yes, just found it... just update your ComfyUI nodes and select it as a scheduler. It's as simple as that.
Thanks for the local video content. I love locally run AI tools that don't require a cloud or subscription. The 5090 with 32 GB is going to keep moving the VRAM standards up and up as well, which I'm OK with, if the tools are good enough.
By now, I'm beginning to think that Pyramid Flow is a waste of everyone's time.
I even noticed significant degradation of quality in the Pyramid Flow outputs, especially 10-second videos, where the image turns to garbage in the final frames. I wonder if I'm doing something wrong?
No, you are not doing anything wrong. It happens on my generated videos too. The last 1-3 seconds are always F'ed up.
@TheFutureThinker Have you tried loading Pyramid Flow in fp32 mode? I suspect that the bf16 mode might be the reason for the quality degradation (lower precision). But I can't test it myself: if the model spills out of VRAM into the shared memory space, things slow down so much that I can't even tell if the program is running.
However...
I don't know if it's even worth testing at this point.
I wonder if Kuaishou is going to release a better model instead.
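For a rough sense of the trade-off behind the fp32 question above, here is a minimal Python sketch. It is not taken from Pyramid Flow or ComfyUI and uses only the standard library. `to_bf16` and `weight_gib` are hypothetical helper names, and the 5-billion parameter count is an illustrative assumption (borrowed from the "CogVideoX 5B" name elsewhere in this thread, since Pyramid Flow's size is not stated here). It only shows that bf16 keeps far fewer mantissa bits than fp32 and that fp32 weights need twice the memory. It says nothing about whether precision is actually the cause of the degraded final frames.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    bfloat16 is the top 16 bits of an IEEE float32, so we round the float32
    bit pattern to 16 bits (round-to-nearest-even). NaN and infinity are not
    handled specially in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def weight_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (activations not included)."""
    return num_params * bytes_per_param / 2**30


# Near 1.0, bf16 values are spaced 2**-7 = 0.0078125 apart,
# so a small offset like 0.001 is rounded away entirely.
print(to_bf16(1.001))      # 1.0
print(to_bf16(1.0078125))  # 1.0078125 (exactly representable)

ASSUMED_PARAMS = 5_000_000_000  # illustrative assumption, not a measured value
print(f"bf16 weights: {weight_gib(ASSUMED_PARAMS, 2):.1f} GiB")  # 9.3 GiB
print(f"fp32 weights: {weight_gib(ASSUMED_PARAMS, 4):.1f} GiB")  # 18.6 GiB
```

Under that assumed size, the fp32 weights alone would take about 18.6 GiB before any activations, which would fit with the model spilling out of VRAM on many consumer cards as described above.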
Thank you for the video and workflow. I have problems with Pyramid. After 2 seconds the scenes go wild and become unusable. The start from an image2video is good. With CogX I2V I still could not get results like yours. I would be happy with just a bit of movement in the videos, but mostly the output tries to animate everything, and it results in a morphing, blurry mess that looks like MPEG artefacts.
How can I get the models to produce videos where only some objects move and the rest stays still, like in the girl walking in the forest example?
Render times are totally different: CogX takes 33 minutes and Pyramid only 11.5 minutes with the standard example settings.
I don't know if I have update 2 with onediff and nexfort installed. I guess not, and that might be the cause.
I really hope these models get better on speed, I want to use them, but Jesus my 3060 HATES them. Your 6-8 min is really good.. I get 24 min for 5 seconds.. like damn...
@aivideos322 All the new so-called open-source DiT video models that run locally are just prototypes. The Transformer architecture was first used by Google; it's built for big tech servers.
Thanks for the comparison, very helpful!
Can you make a video on how to use CogVideoX Factory? You can train your own video model with it!!
Have you played with the new Tora option? This is great fun: with CogVideoX, you can sketch out the movement using Spline, and the person (or other feature) follows that pattern.
Please make a comparison between CogVideoX 5B I2V and CogVideoX Fun-V1.1-5b-InP.
These are still gonna be fun for hallucinating. Might make some classic memes. LOL Thanks for the vids.
Oh yes, I forgot the Will Smith eating spaghetti one. Will do that one on X.
Thanks, I won't bother testing Pyramid Flow then.
I don't know, but I use a simple video workflow with SVD XL and XL 1.1, and with an RTX 3080 it makes me a video in 2.30 min, and all the videos are better than what you got out of this Cog and .... so ....
Both are still imperfect. Pyramid Flow can handle non-human content, but hands remain a mystery. It’s not a question of too many digits, they just morph horribly.
Yes, so I want to show people not to get hyped up: "Oh! Open source! Running locally, NSFW AI video", etc.
@TheFutureThinker Text to image was pretty bad at first too. I'm hyped to see local models AT ALL, even if they're still very glitchy. It puts AI tools in the hands of the people, not just the limited few with access to very high-end server cards.
What, 68 minutes for a 5 sec. video? Pfff... this is unusable in that state.