Jia-Bin Huang
How I Understand Flow Matching
Flow matching is a new generative modeling method that combines the advantages of Continuous Normalizing Flows (CNFs) and Diffusion Models (DMs).
In this tutorial, I share my understanding of the basics of flow matching and provide an overview of how these ideas evolve over time.
Check out the resources below to learn more about this topic.
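To make the idea concrete, here is a minimal sketch of one conditional flow matching training batch plus a simple sampler. It assumes a standard Gaussian source distribution and the straight-line path x_t = (1 - t) x0 + t x1 between a noise sample x0 and a data sample x1 (the rectified-flow style path); that path choice, and all function names below, are assumptions for illustration, and the velocity network itself is left out. A model v(x_t, t) would be regressed onto the target velocity u_t = x1 - x0, and sampling integrates the ODE dx/dt = v(x, t) from t = 0 to t = 1.

```python
import numpy as np

def sample_cfm_batch(x1, rng):
    """Build one conditional flow matching batch for data samples x1 of shape (n, d)."""
    x0 = rng.standard_normal(x1.shape)        # source samples from N(0, I) (assumed)
    t = rng.uniform(size=(x1.shape[0], 1))    # one time t in [0, 1) per sample
    xt = (1.0 - t) * x0 + t * x1              # point on the straight path x0 -> x1 (assumed path)
    ut = x1 - x0                              # conditional target velocity d(xt)/dt
    return xt, t, ut

def cfm_loss(v_pred, ut):
    """Mean squared error between predicted velocities v(xt, t) and targets ut."""
    return np.mean(np.sum((v_pred - ut) ** 2, axis=1))

def euler_sample(velocity_fn, x0, n_steps=100):
    """Generate samples by integrating dx/dt = v(x, t) from t = 0 to 1 with forward Euler."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = np.full((x.shape[0], 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x

# Usage with a hypothetical trained model (not defined here):
# xt, t, ut = sample_cfm_batch(data_batch, np.random.default_rng(0))
# loss = cfm_loss(model(xt, t), ut)
# samples = euler_sample(model, np.random.default_rng(1).standard_normal((64, d)))
```

Forward Euler keeps the sketch short; any ODE solver can integrate the learned velocity field instead.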
===== Paper/blog survey =====
[Papamakarios et al. 2021] Normalizing flows for probabilistic modeling and inference arxiv.org/abs/1912.02762
[Kobyzev et al. 2020] Normalizing Flows: An Introduction and Review of Current Methods arxiv.org/abs/1908.09257
[Fjelde et al. 2024] An Introduction to Flow Matching mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html
===== Research talks =====
[Yaron Lipman] Flow Matching: Simplifying and Generalizing Diffusion Models
th-cam.com/video/5ZSwYogAxYg/w-d-xo.html
[Michael S Albergo] Building Normalizing Flows with Stochastic Interpolants
th-cam.com/video/cejbXob8rvE/w-d-xo.html
[Alex Tong] Conditional Flow Matching
th-cam.com/video/AfKhr89RfpY/w-d-xo.html
Thumbnail background image credit: unsplash.com/photos/a-close-up-of-a-white-wall-with-wavy-lines-75xPHEQBmvA
Views: 699

Videos

3D Texture Made Easy
649 views, 14 days ago
Introducing TextureDreamer! TextureDreamer transfers textures from a few images to arbitrary 3D shapes. Excited about democratizing 3D content creation! TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion Yu-Ying Yeh, Jia-Bin Huang, Changil Kim, Lei Xiao, Thu Nguyen-Phuoc, Numair Khan, Cheng Zhang, Manmohan Chandraker, Carl S Marshall, Zhao Dong, and Zhengqin Li IEEE...
How We Can Convert Any Videos to 3D
1.7K views, 28 days ago
Videos are windows to another world. But the videos today are *flat*, confined to the original viewpoints. We showcase a method for converting any 2D videos into 3D videos that allow free-view synthesis. Fast View Synthesis of Casual Videos Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, and Feng Liu arXiv preprint 2023 📝 Paper: arxiv.org/abs/...
How Do Computers See Motion? Lucas-Kanade Method Explained
907 views, 2 months ago
How can machines perceive the dynamic world around us? In this video, we discuss an influential Lucas-Kanade tracking method. The core algorithm and its variants are used in a wide variety of computer vision applications. Stay until the end to learn about the inspiring story behind this seminal paper! Reference: - Bruce D Lucas and Takeo Kanade, An iterative image registration technique with an...
What are Good Features to Track? Shi-Tomasi Corner Detector Explained
752 views, 2 months ago
Identifying reliable features for tracking is an important step for many computer vision systems, including video stabilization, object tracking, and simultaneous localization and mapping (SLAM). This video covers the basics of corner detection algorithms. References: Jianbo Shi and Carlo Tomasi, Good Features to Track, CVPR 1994 C Harris, M Stephens, A combined corner and edge detector, Alvey ...
How does OpenAI's Sora work?
48K views, 3 months ago
OpenAI presents Sora, a text-to-video model for generating high-quality video from text prompts. In this video, we explain a high-level overview of how Sora works.
Removing Unwanted Objects in Videos
589 views, 4 months ago
Flow-edge Guided Video Completion Chen Gao, Ayush Saraf, Jia-Bin Huang, and Johannes Kopf ECCV 2020 📝 Paper: www.chengao.vision/FGVC/files/FGVC.pdf 🌐 Website: www.chengao.vision/FGVC/ Abstract: We present a new flow-based video completion algorithm. Previous flow completion methods are often unable to retain the sharpness of motion boundaries. Our method first extracts and completes motion edge...
Compositional Text-to-Image Generation Made Easy
1K views, 4 months ago
Fast View Synthesis of Casual Videos Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, and Feng Liu arXiv 2023 📝 Paper: arxiv.org/abs/2312.02135 🌐 Website: casual-fvs.github.io/ Abstract: Novel view synthesis from an in-the-wild video is difficult due to challenges like scene dynamics and lack of parallax. While existing methods have shown promi...
How I Understand Diffusion Models
19K views, 4 months ago
Diffusion models are powerful generative models that enable many successful applications like image, video, and 3D generation from texts. In this tutorial, I share my understanding of the diffusion model basics, including training, guidance, resolution, and speed. Below are some other great resources to learn more about diffusion models. Slides Here are the slides used in this video Training: b...
3D Human Digitization from a Single Image!
32K views, 6 months ago
Single-Image 3D Human Digitization with Shape-Guided Diffusion Badour AlBahar, Shunsuke Saito, Hung-Yu Tseng, Changil Kim, Johannes Kopf, and Jia-Bin Huang ACM SIGGRAPH Asia 2023 📝 Paper: human-sgd.github.io/ 🌐 Website: human-sgd.github.io/ 💻 Code: human-sgd.github.io/ Abstract: We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a...
AI 3D Generation, explained
8K views, 6 months ago
3D generation (text-to-3D, single-image to 3D) has significantly progressed this year. This video tries to summarize several ideas behind this technology and recent trends. The papers discussed in the video: • DreamFusion dreamfusion3d.github.io/ • Magic3D research.nvidia.com/labs/dir/magic3d/ • Fantasia3D fantasia3d.github.io/ • DreamGaussian dreamgaussian.github.io/ • RealFusion lukemelas.git...
Visualizing Climate Change Impacts
1.1K views, 8 months ago
ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field Yuan Li, Zhi-Hao Lin, David Forsyth, Jia-Bin Huang, and Shenlong Wang International Conference on Computer Vision (ICCV), 2023 📝 Paper: arxiv.org/abs/2211.13226 🌐 Website: climatenerf.github.io/ 💻 Code: github.com/y-u-a-n-l-i/Climate_NeRF 📄 Abstract: Physical simulations produce excellent predictions of weather effects. Neural radi...
Expressive Text-to-Image with Rich Text
6K views, 8 months ago
Expressive Text-to-Image Generation with Rich Text Songwei Ge, Taesung Park, Jun-Yan Zhu, and Jia-Bin Huang International Conference on Computer Vision (ICCV), 2023 📝 Paper: arxiv.org/abs/2304.06720 🌐 Website: rich-text-to-image.github.io/ 💻 Code: github.com/SongweiGe/rich-text-to-image 🤗 Demo: huggingface.co/spaces/songweig/rich-text-to-image 🖥️ A1111 extension: github.com/songweige/sd-webui-r...
Seeing Subtle Motion in 3D
930 views, 9 months ago
3D Motion Magnification: Visualizing Subtle Motions with Time-Varying Radiance Fields Brandon Y. Feng* (University of Maryland College Park), Hadi Alzayer* (University of Maryland College Park), Michael Rubinstein (Google Research), William T. Freeman (Massachusetts Institute of Technology, Google Research), and Jia-Bin Huang (University of Maryland College Park) * Equal contributions Internati...
Immersive 3D Video is Coming
1.3K views, 11 months ago
HyperReel: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling Benjamin Attal, Jia-Bin Huang, Christian Richardt, Michael Zollhöfer, Johannes Kopf, Matthew O'Toole, and Changil Kim IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 (Highlight ⭐⭐⭐) 📝 Paper: arxiv.org/abs/2301.02238 🌐 Website: hyperreel.github.io/ 💻 Code: github.com/facebookresearch/hyperreel 📄 Ab...
Immersive 3D Rendering from Casual Videos
17K views, 11 months ago
Step into the World from a Single Image
2.1K views, 1 year ago
Out-of-focus Photos No More
1.2K views, 1 year ago
Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN
6K views, 2 years ago
Learning to See Through Obstructions with Layered Decomposition
565 views, 2 years ago
Hybrid Neural Fusion for Full-frame Video Stabilization
6K views, 3 years ago
[ECCV 2018] Learning Blind Video Temporal Consistency
473 views, 3 years ago
[ECCV 2020] Flow-edge Guided Video Completion
2.4K views, 3 years ago
Martin Luther King, Jr.'s "I Have A Dream" Speech - 3D Photo Inpainting
176 views, 3 years ago
[CVPR 2020] 3D Photography using Context-aware Layered Depth Inpainting
683 views, 3 years ago
[CVPR 2020] Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline
1.4K views, 3 years ago
[CVPR 2020] Learning to See Through Obstructions
1.2K views, 3 years ago
Virtual Dolly Zoom Effects - 3D Photo Inpainting
20K views, 4 years ago
2.5D Motion Parallax - 3D Photo Inpainting
30K views, 4 years ago
Comparisons with Multi-plane Image based Methods - 3D Photo Inpainting
13K views, 4 years ago
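The Lucas-Kanade and Shi-Tomasi videos listed above both rest on the same 2x2 second-moment matrix of image gradients. Below is an illustrative single-scale, single-iteration sketch (not the code from the videos): `shi_tomasi_score` returns the smallest eigenvalue of that matrix, which Shi and Tomasi use to rank features, and `lucas_kanade_step` solves the linearized brightness-constancy system M d = -b for a patch displacement. The central-difference gradients, square window, and all function names are assumptions made for this example.

```python
import numpy as np

def image_gradients(img):
    """Central-difference image gradients; returns (Ix, Iy) for a 2D array."""
    iy, ix = np.gradient(img.astype(np.float64))  # axis 0 = rows (y), axis 1 = cols (x)
    return ix, iy

def window(r, c, half):
    """Slices for a (2*half+1) x (2*half+1) window centred on pixel (r, c)."""
    return slice(r - half, r + half + 1), slice(c - half, c + half + 1)

def second_moment_matrix(ix, iy, r, c, half=3):
    """M = sum over the window of [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]]."""
    gx, gy = ix[window(r, c, half)], iy[window(r, c, half)]
    return np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                     [np.sum(gx * gy), np.sum(gy * gy)]])

def shi_tomasi_score(m):
    """Smallest eigenvalue of M; large values mark corners that are good to track."""
    return np.linalg.eigvalsh(m)[0]  # eigvalsh returns eigenvalues in ascending order

def lucas_kanade_step(img0, img1, r, c, half=3):
    """One linearised Lucas-Kanade step for the patch centred at (r, c).

    Brightness constancy img1(x) = img0(x - d) gives Ix*dx + Iy*dy = -It per pixel,
    solved in the least-squares sense as M d = -b. Returns d = (dx, dy).
    """
    ix, iy = image_gradients(img0)
    it = img1.astype(np.float64) - img0.astype(np.float64)  # temporal difference
    m = second_moment_matrix(ix, iy, r, c, half)
    w = window(r, c, half)
    b = np.array([np.sum(ix[w] * it[w]), np.sum(iy[w] * it[w])])
    return np.linalg.solve(m, -b)
```

A patch with a small smallest eigenvalue (a flat region or a straight edge) makes M nearly singular, which is exactly why such patches are poor choices for the Lucas-Kanade solve. In practice the step is iterated with image warping and run coarse-to-fine over an image pyramid to handle larger motions.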

Comments

  • @julienblanchon6082 (10 hours ago)

    This is brilliant!

    • @jbhuang0604 (7 hours ago)

      Glad that you enjoyed the video!

  • @jackshi7613 (11 hours ago)

    Excellent video!

    • @jbhuang0604 (8 hours ago)

      Thanks for watching!

  • @DimitrivonRutte (12 hours ago)

    Awesome to see easy-to-understand explanations of current research topics. Keep up the great work!

    • @jbhuang0604 (8 hours ago)

      Glad you liked it!

  • @catherineyang5199 (14 hours ago)

    Thank you for the video! This is the clearest explanation of flow matching on the internet ❤

    • @jbhuang0604 (11 hours ago)

      Thank you so much for your kind words!

  • @r00t257 (16 hours ago)

    Legend comeback 🙇! Your educational video is worth more than gold. 💓🙏

    • @jbhuang0604 (16 hours ago)

      Thanks a lot! Glad you like it!

  • @SurajBorate-bx6hv (1 day ago)

    Thank you for the great step-by-step explanation. Can you share any good resources and insights for implementing diffusion on my own custom images?

  • @pi5549 (10 days ago)

    Any arXiv paper with a video like this goes above any without, on my TO_READ list at least. +1

    • @jbhuang0604 (10 days ago)

      Thanks for your interest! Let us know if you have any questions!

  • @crispinotechgaming (13 days ago)

    Honestly, it'd be really nice to see the open-source community catch up to this scale of operation one day! They are the backbone of AI progress, but they rarely manage to innovate with actual public models to use.

    • @jbhuang0604 (11 days ago)

      I completely agree. Most of these models are closed-source, so it’s mainly showcasing their R&D capability, but as you said, the public doesn’t benefit much from them. I also hope the open-source community can catch up soon.

  • @arashnozari4042 (13 days ago)

    Imagine it in the next 5 years.

    • @jbhuang0604 (11 days ago)

      Yup, the rate of progress is incredible. Can’t imagine what this will look like…

  • @4thlord51 (17 days ago)

    I'm building my own diffusion model. This is the best breakdown and visualization of the mathematics and implementation. Well done.

    • @jbhuang0604 (17 days ago)

      Thank you! This comment just made my day!

  • @Yenrabbit (17 days ago)

    Very cool work, and a top-notch video with the little sound effects etc. :D

    • @jbhuang0604 (17 days ago)

      Thanks! I also enjoyed these little sound effects. Pika pika~!

  • @youtube_showcase (17 days ago)

    Exciting work. Thank you for creating and sharing this explanation video.

    • @jbhuang0604 (17 days ago)

      Yes, we are excited as well. Glad you enjoyed the video!

  • @mcarletti (24 days ago)

    My like comes with the 5th Symphony (9:39) 😸🎶

    • @jbhuang0604 (24 days ago)

      Oh my! Finally someone noticed that! (Spent a lot of time making that lol)

  • @khalilsabri7978 (28 days ago)

    Just one minute into the video, you know it's extremely well done. Thanks for the video!

    • @jbhuang0604 (27 days ago)

      Glad you liked it! Thanks so much for the comment!

  • @importon (28 days ago)

    Very cool! Will you be putting the code on GitHub?

    • @jbhuang0604 (27 days ago)

      We are working on getting approval. The process is complex, but we are hopeful.

  • @TayuYoung (1 month ago)

    Hi Professor, thank you for your explanation. However, I think that at 1:03 in the video, the image upsampling is performed by the 'decoder', not by the diffusion model. The animation here seems to suggest that the diffusion model produces the high-resolution images. Thanks for your time.

    • @jbhuang0604 (1 month ago)

      Sorry for the confusion. I introduced two mechanisms for high-resolution generation: 1) cascaded diffusion models and 2) latent diffusion models. In cascade-based approaches, the upsampling is done via a super-resolution diffusion model. The model in Sora is likely using only a video decoder that upsamples the denoised clean latent to high-resolution images/videos.

    • @TayuYoung (1 month ago)

      @jbhuang0604 Thanks for your explanation. I checked the Imagen paper: they use a text-to-image diffusion model and a super-resolution diffusion model to produce the high-resolution image, which in a latent diffusion model would be the output of the decoder. I used to think that the main difference between cascaded and latent diffusion models was just that one uses a low-resolution image and the other uses a latent representation, with both employing an encoder-diffusion-decoder pipeline. In Imagen, it seems that the diffusion model can also serve in the 'decoder' role. Am I right?

  • @youtube_showcase (1 month ago)

    Amazing work! Thank you for sharing 😀

    • @jbhuang0604 (1 month ago)

      Thank you! Cheers!

  • @TechCindy (1 month ago)

    Amazing work!

    • @jbhuang0604 (1 month ago)

      Thank you!

  • @orisenbazuru (1 month ago)

    Great video! At 1:21, it should be maximizing the similarity between the two distributions, or minimizing the distance between them.

    • @jbhuang0604 (1 month ago)

      Thanks for pointing this out! Yes, you are right! It should be *maximizing* the similarity between the two distributions.

  • @Raymond-zv5gr (1 month ago)

    BRO YOU ARE EPIC

    • @jbhuang0604 (1 month ago)

      Thank you thank you!

  • @Beauty.and.FashionPhotographer (1 month ago)

    I wish this were simpler. You know, a Google Colab notebook and a video that shows what to click, etc.

    • @jbhuang0604 (17 days ago)

      That would definitely be great! We will be working on making this easier to use!

  • @ohjein (1 month ago)

    Very good! But what surprises me is that PIFu is still at the core. 4+ years and no better model has arrived? With all the developments? Anyway, great work.

    • @jbhuang0604 (1 month ago)

      Yes, we were surprised as well. We tried the recent ICON/ECON methods. These methods produce better shape reconstruction for challenging/uncommon poses (e.g., dancing), but ironically produce unnatural shapes for natural poses (like the one shown in the video). For those natural poses, PIFuHD still performs the best!

  • @truonggiangnguyen8844 (1 month ago)

    I have a question: are all the distributions mentioned distributions of continuous variables, since we're using integrals here?

    • @jbhuang0604 (1 month ago)

      Good question! I think there have been some developments in discrete variational autoencoders and diffusion models. Those methods can deal with discrete variables.

  • @user-kq9cu8wy9z (1 month ago)

    The world and my brain after this: 💀

    • @jbhuang0604 (1 month ago)

      Indeed!

  • @curiousobserver2006 (1 month ago)

    Seriously one of the best educational videos I've ever watched.

    • @jbhuang0604 (1 month ago)

      Thank you so much!

  • @nutshell1811 (2 months ago)

    Best video on diffusion!!

    • @jbhuang0604 (2 months ago)

      Great! Glad that it’s helpful!

  • @rtluo1546 (2 months ago)

    This is truly a great tutorial video, so well made. I can't believe it covers so many things in only 17 minutes.

    • @jbhuang0604 (2 months ago)

      Thanks a lot! Happy that you enjoyed the video!

  • @wangy01 (2 months ago)

    Thank you for your great work removing the need for the audience to have much prior knowledge before they can enjoy your video. For example, you mentioned maximum likelihood and immediately explained what it is. It is such a challenge to straighten all of this out in a 17-minute video, but you did a great job. Thank you!

    • @jbhuang0604 (2 months ago)

      Glad that you liked it! Appreciate your kind words! This made my day!

  • @vfhfnvecnfaby5362 (2 months ago)

    I LOVE U!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    • @jbhuang0604 (17 days ago)

      Thank you thank you!

  • @NobleSpartan (2 months ago)

    Your production on these videos is incredible.

    • @jbhuang0604 (2 months ago)

      Thanks so much for your kind words! Glad that you like it!

  • @MrNoipe (2 months ago)

    Awesome! Would also love a video on SLAM!

    • @jbhuang0604 (2 months ago)

      Yup! It’s a complex topic, but I will get there!

  • @MrNoipe (2 months ago)

    The linear algebra portion went by very quickly for someone who hasn't worked with it in several years. I guess this video is more tailored for current researchers?

    • @jbhuang0604 (2 months ago)

      Ah, sorry about that! I probably should slow down a bit on those math derivations. Will do so in future videos!

  • @nikitadrobyshev7953 (2 months ago)

    OK, this is the best video explanation of diffusion models I've seen. Ideal ratio between simplification and depth ☺👏

    • @jbhuang0604 (2 months ago)

      Glad it was helpful! Thank you so much for your kind words!

    • @wangy01 (2 months ago)

      I agree. The author must have carefully chosen the most efficient way of cutting into the complex concept hierarchy, and every single word, to achieve that efficiency.

  • @madhavkumar9942 (2 months ago)

    Great video! Can you please explain how to get the 'sparse' folder used in your project for a video (or a folder containing video frames)?

    • @jbhuang0604 (2 months ago)

      Thanks! It comes from the COLMAP preprocessing.

  • @yuelinxin3684 (2 months ago)

    This looks like a covariance matrix 🤔

    • @jbhuang0604 (2 months ago)

      Yes, the second-moment matrix is a local covariance matrix of the gradient vector field. It captures the local image structure.

  • @mityashabat (2 months ago)

    Great vid! Quick question: it wasn't clear how many of these second-order matrices we're making. One for each patch, right? A single one for the whole image wouldn't provide us with local information. The explanation confused me a bit. Also, if I understand correctly, the intuition is that the second-order matrix helps us compute the curvature of the edge in the patch, and checking the eigenvalues gives us info on that curvature. Is my mind in the right place? 😅

    • @jbhuang0604 (2 months ago)

      Thanks! For every pixel location, we form a second-order matrix. So yes, it's one for each patch. The eigenvalues of the second-order matrix tell us how fast the summed squared error (between the reference patch and the translated patch) goes up. The eigenvectors tell us which direction to move in to get the fastest or the slowest error change. So, to find a corner, we look for patches with a *large smallest eigenvalue*. This means that the error still goes up quickly, even in the direction with the slowest change. That's the criterion for good features to track.

  • @pedroenriquelopezdeteruela6545 (2 months ago)

    Awesome post, Jiang, thank you so much for the great job! Anyway, a small comment/question on your video (not too important, I assume). At minute 5:56 you say (a direct derivation of formula (7) in the paper "Denoising Diffusion Probabilistic Models") that mu^hat_t(x_t, x_0) is on the line joining x_0 and x_t. While this is approximately true for "normal" beta_t schedules, I think the estimated mean as a function of x_0 and x_t need not lie exactly on that line, since the respective multipliers of x_0 and x_t in that equation need not, in general, add up to one. In fact, with "normal" scheduling, this sum seems to move progressively away from 1 as t increases, so although mu_t will obviously remain a simple linear combination of x_t and x_0, it will progressively move away (by a small amount) from this line. Would you agree with this observation? Greetings, and again, congratulations on the video and thank you very much for clarifying the inner workings of diffusion models for us!

    • @jbhuang0604 (2 months ago)

      Thank you so much for your comment! You are right! It won’t be on the line when the multipliers don’t add up to one.

  • @r00t257 (2 months ago)

    You are a god of presenting things in such a concise and intuitive way! Thank you very much, Professor! It really, really helps.

    • @jbhuang0604 (2 months ago)

      Thanks so much!

  • @user-pk4yz7wn3s (2 months ago)

    BRAVO! No one has ever explained the diffusion model in such an easy way with all the details.

    • @jbhuang0604 (2 months ago)

      Thank you so much for your kind words! This makes my day!

  • @rasmuseriksson6805 (2 months ago)

    Thanks for a great video! What are we talking about in terms of space use of the material? All the different samples I find are always extremely short. Or what are the limitations?

  • @dailymind77 (3 months ago)

    Matrix incoming

    • @jbhuang0604 (2 months ago)

      Indeed!!

  • @BoredT-Rex (3 months ago)

    Still have no clue. I don't think I ever will, though.

    • @jbhuang0604 (3 months ago)

      You can do it bruh!

  • @ebrarbasyigit3234 (3 months ago)

    The cat photograph used in the video belongs to Tombili ("chubby" in Turkish), a street cat from Istanbul. Tombili was known for its stylish sitting pose.

    • @jbhuang0604 (3 months ago)

      I love that cat! I think they even made a sculpture of Tombili after the cat died!

    • @ebrarbasyigit3234 (3 months ago)

      @jbhuang0604 Yeah, true, the statue was stolen and found after a month 😄

  • @coyi3884 (3 months ago)

    Like having your dad do your homework. Grow up and get other creating yourself. So many fake artists are incoming with this crap.

    • @jbhuang0604 (3 months ago)

      I don’t think this will entirely replace human creativity. In fact, it will allow more people with limited artistic skills to express themselves visually.

  • @soulkill3579 (3 months ago)

    Sora = THE MATRIX pre-alpha version 0.1

    • @jbhuang0604 (2 months ago)

      The progress is incredible!

  • @SIDOLOMI (3 months ago)

    Too much blah blah; how can I make my video is the point.

    • @jbhuang0604 (3 months ago)

      I hope we can have access soon! It’s super exciting!

  • @TheSonic1685 (3 months ago)

    So in other words, you feed it copyrighted videos and it spits those copyrighted videos out again? So much for "AI"

    • @jbhuang0604 (3 months ago)

      I believe they would need to make sure that the training videos are either synthetically generated (e.g., via Unreal Engine) or licensed.

    • @radioreactivity3561 (3 months ago)

      It won't generate the exact same video found in the training data.

  • @draken5379 (3 months ago)

    It doesn't use keyframes for long video generation. They most likely train by cutting chunks out of the space-time latent and having it predict the missing frames.

    • @jbhuang0604 (3 months ago)

      What I meant is that they likely used a cascaded diffusion method. The “keyframes”, of course, are not RGB images. Sora demonstrated great results on interpolating between two frames/videos, so I guess that’s probably how they handle long video generation.

  • @Mr.MawYT1498 (3 months ago)

    When can we use this?

    • @jbhuang0604 (3 months ago)

      Hopefully soon! But OpenAI definitely needs to put many guardrails in place to avoid misuse of such technology.

  • @thelozer6311 (3 months ago)

    Oh god, this is horrible
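One comment above asks whether the DDPM posterior mean (formula (7) in "Denoising Diffusion Probabilistic Models") really lies on the line joining x_0 and x_t. A short numerical check, assuming the paper's linear beta schedule from 1e-4 to 0.02 over T = 1000 steps, computes the two multipliers c0 and ct in mu_tilde_t = c0 * x_0 + ct * x_t. Their sum is exactly 1 only at t = 1. Algebraically, 1 - (c0 + ct) = (1 - sqrt(alpha_bar_{t-1})) (1 - sqrt(alpha_t)) / (1 + sqrt(alpha_bar_t)), which is nonnegative and, for an increasing beta schedule, grows with t, consistent with the commenter's observation. The function name is hypothetical.

```python
import numpy as np

def ddpm_posterior_coefficients(betas):
    """Return (c0, ct) with mu_tilde_t = c0 * x_0 + ct * x_t (DDPM posterior mean, Eq. 7).

    betas[i] holds beta_t for t = i + 1.
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)                              # alpha_bar_t
    alpha_bars_prev = np.concatenate(([1.0], alpha_bars[:-1]))  # alpha_bar_{t-1}, with alpha_bar_0 = 1
    c0 = np.sqrt(alpha_bars_prev) * betas / (1.0 - alpha_bars)
    ct = np.sqrt(alphas) * (1.0 - alpha_bars_prev) / (1.0 - alpha_bars)
    return c0, ct

# Assumed schedule: linear betas from 1e-4 to 0.02 over T = 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
c0, ct = ddpm_posterior_coefficients(betas)
coef_sum = c0 + ct  # equals 1 only if mu_tilde_t lies on the line through x_0 and x_t

for t in (1, 10, 100, 500, 1000):
    print(f"t={t:4d}  c0={c0[t - 1]:.6f}  ct={ct[t - 1]:.6f}  sum={coef_sum[t - 1]:.6f}")
```

The gap from 1 stays small for most steps, which is why the "on the line" picture is a reasonable approximation in practice.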