CogVideoX vs Pyramid Flow AI Video Model Run Locally - Img2Vid Comparison Which Is Better?

  • Published on 16 Nov 2024

Comments • 36

  • @TheFutureThinker
    @TheFutureThinker  several months ago +6

    CogVideoX vs Pyramid Flow AI Video Model Article : thefuturethinker.org/ai-video-showdown-pyramid-flow-vs-cogvideox-on-comfyui/

    • @synaestesia-bg3ew
      @synaestesia-bg3ew several months ago

      Can you tell us what system and GPU you used locally?

    • @TheFutureThinker
      @TheFutureThinker  several months ago

      ASUS TUF Gaming GeForce RTX™ 4090 OG amzn.to/3C04bHC

  • @megamayo2500
    @megamayo2500 several months ago +9

    CogVideoX is the clear winner. But with Pyramid Flow, I can see what it's trying to achieve. Its model prioritizes complex motion over consistency. CogVideoX has an issue with consistent body proportions and off pan into frame scenarios. It's like you said, these models were trained using the SD3 model. These models should have been trained with Pony for animation, or Flux for real life.

    • @TheFutureThinker
      @TheFutureThinker  several months ago +2

      If Pyramid Flow had been trained with Pony, it might be a totally different story today.

  • @synesthesiaharmonics
    @synesthesiaharmonics several months ago +4

    Thanks for a comprehensive initial comparison, I was just wondering which one I should use to make content!

  • @geoffphillips5293
    @geoffphillips5293 27 days ago

    Getting some good results changing sampler to LCM - 15 steps on Cog - results in 4.5 minutes!

  • @geoffphillips5293
    @geoffphillips5293 27 days ago

    This comment is encouraging about Pyramid Flow: "feifeiobama commented 3 days ago
    We are working on a new model checkpoint trained from scratch (instead of using the SD3 weight initialization). It has shown much improvement in human faces and bodies. Please stay tuned."

  • @electronicmusicartcollective
    @electronicmusicartcollective several months ago

    Thank you so much. I was just about to set up Pyramid, and then you helped me a lot with this video.

  • @ysy69
    @ysy69 several months ago +1

    Very good comparison. thank you. CogVideoX is the winner for now. Glad to learn.

    • @TheFutureThinker
      @TheFutureThinker  several months ago +1

      And with ControlNet it is able to control the action. It is what I was looking for in a DiT video model.

  • @HypnoticVocals
    @HypnoticVocals several months ago

    Is it simple to use and install? What type of hardware do we need?

  • @christiandarkin
    @christiandarkin several months ago +1

    My tests indicate CogVideoX wins most of the time. Plus, you can use LCM and cut creation times down to 2.5 mins per clip. Detailed prompts are important too.

    • @TheFutureThinker
      @TheFutureThinker  several months ago +1

      Yes, the AI image base model used for training does matter a lot.

    • @aivideos322
      @aivideos322 several months ago

      You can use LCM... ? really?

    • @christiandarkin
      @christiandarkin 29 days ago

      @aivideos322 Yes, just found it... just update your ComfyUI nodes and select it as a scheduler. It's as simple as that.

  • @rhadiem
    @rhadiem several months ago

    Thanks for the local video content. I love locally run AI tools that don't require a cloud or subscription. The 5090 with 32GB is going to keep moving the VRAM standards up and up as well, which I'm OK with, if the tools are good enough.

  • @Veselin_Angelov
    @Veselin_Angelov several months ago +3

    By now, I'm beginning to think that Pyramid Flow is a waste of everyone's time.
    I even noticed significant degradation of quality in the Pyramid Flow outputs, especially 10-second videos, where the image turns to garbage in the final frames. I wonder if I'm doing something wrong?

    • @TheFutureThinker
      @TheFutureThinker  several months ago +1

      No, you are not wrong. It happens on my generated videos too. The last 1-3 seconds are always F'ed up.

    • @Veselin_Angelov
      @Veselin_Angelov several months ago

      @TheFutureThinker Have you tried loading Pyramid Flow in fp32 mode? I suspect that the bf16 mode might be the reason for the quality degradation (lower precision). But I can't test it for myself: if the model overflows out of the VRAM and into the shared memory space, things slow down so much that I don't even know if the program is running.
      However...
      I don't know if it's even worth testing at this point.
      I wonder if Kuaishou is going to release a better model instead.

  • @ScoobyToursXL
    @ScoobyToursXL several months ago

    Thank you for the video and workflow. I have problems with Pyramid. After 2 seconds the scenes go wild and become unusable. The start from an image2video is good. With CogX I2V I still could not get results like yours. I would be happy with just a bit of movement in the videos, but mostly the output tries to animate everything, and it results in a morphing, blurry mess that looks like MPEG artifacts.
    How can I get the models to produce videos where only some objects move and the rest stays still, like in the girl-walks-in-the-forest example?
    Render times are totally different: CogX takes 33 minutes and Pyramid only 11.5 minutes with the standard example settings.
    I don't know if I have update 2 with onediff and nexfort installed. I guess not; that might be the cause.

  • @aivideos322
    @aivideos322 several months ago +1

    I really hope these models get better on speed, I want to use them, but Jesus my 3060 HATES them. Your 6-8 min is really good.. I get 24 min for 5 seconds.. like damn...

    • @TheFutureThinker
      @TheFutureThinker  29 days ago

      @aivideos322 All the new DiT video models that are so-called open source and run locally are just prototypes. The Transformer architecture was first used by Google; it's built for big tech servers.

  • @AB-wf8ek
    @AB-wf8ek several months ago

    Thanks for the comparison, very helpful!

  • @HolidayAtHome
    @HolidayAtHome 24 days ago

    Can you make a video on how to use CogVideoX Factory? You can train your own video model with it!!

  • @geoffphillips5293
    @geoffphillips5293 25 days ago

    Have you played with the new Tora option? This is great fun, with CogVideoX, you can sketch out the movement using Spline, and the person (or other feature) follows that pattern.

  • @kayTran-y1v
    @kayTran-y1v several months ago

    please make a comparison between Cogvideox 5B i2v and Cogvideox Fun-V1.1-5b-InP

  • @insurancecasino5790
    @insurancecasino5790 several months ago

    These are still gonna be fun for hallucinating. Might make some classic memes. LOL Thanks for the vids.

    • @TheFutureThinker
      @TheFutureThinker  several months ago +1

      Oh yes forgot the Will Smith eating spaghetti. Will do that one on X

  • @xyzDist79
    @xyzDist79 7 days ago

    Thanks, I won't bother testing Pyramid Flow then.

  • @robertaopd2182
    @robertaopd2182 several months ago +1

    I don't know, but I use a simple video workflow with svd xl and xl1.1, and with an RTX 3080 it makes me a video in 2.30m, and all the videos are better than what you got out of this Cog and .... so ....

  • @wonder111
    @wonder111 several months ago

    Both are still imperfect. Pyramid Flow can handle non-human content, but hands remain a mystery. It’s not a question of too many digits, they just morph horribly.

    • @TheFutureThinker
      @TheFutureThinker  several months ago +1

      Yes, so I want to show people not to get hyped up: "Oh! Open source! Running locally, NSFW AI video," etc...

    • @rhadiem
      @rhadiem several months ago

      @TheFutureThinker Text to image was pretty bad at first too. I'm hyped to see local models AT ALL even if they're still very glitchy. It puts AI tools in the hands of the people, not just those with limited access to the very high end server cards.

  • @AInfectados
    @AInfectados 14 days ago

    What, 68 minutes for a 5 sec. video? Pfff... this is unusable in that state.