3D Gaussian Splatting

  • Published 19 Jan 2025

Comments • 38

  • @radubenea9343 · 1 year ago · +40

    The color of each Gaussian is not constant as noted @ 1:21:00, but varies with view direction. Spherical harmonics are like Fourier decompositions but on a spherical domain, with more basis functions giving more detailed spherical images. This is what lets them model the surface lighting response and is why that shiny TV screen is possible: the "glossy" screen Gaussians change color with view direction faster than more "diffuse" Gaussians, based on their SH coefficients.
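
    A minimal sketch of that view-dependent color (helper names are mine; degree-1 SH shown, the paper goes up to degree 3):

    ```python
    import numpy as np

    # Real SH basis constants for degrees 0 and 1.
    SH_C0 = 0.28209479177387814   # 1 / (2*sqrt(pi))
    SH_C1 = 0.4886025119029199    # sqrt(3) / (2*sqrt(pi))

    def sh_color(coeffs, view_dir):
        """Evaluate a degree-1 SH color for one Gaussian.

        coeffs: (4, 3) SH coefficients (one RGB triple per basis function).
        view_dir: unit vector from the camera to the Gaussian's center.
        Larger high-degree coefficients make the color swing faster with
        viewing angle, which is how "glossy" responses are approximated.
        """
        x, y, z = view_dir
        basis = np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])
        # The +0.5 offset mirrors common implementations that center colors at gray.
        return np.clip(basis @ coeffs + 0.5, 0.0, 1.0)

    # A "diffuse" Gaussian: only the constant degree-0 term is non-zero,
    # so the returned color is the same from every direction.
    diffuse = np.zeros((4, 3))
    diffuse[0] = [0.8, 0.2, 0.2]
    print(sh_color(diffuse, np.array([0.0, 0.0, 1.0])))
    ```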

    • @彭鲲 · 1 year ago

      Hi, I'm still confused by their splat renderer. Is there any literature or blog to learn it from? This work's loss function doesn't contain variables like R, S, or opacity, so how does gradient descent compute gradients for opacity and those variables so they get backpropagated and optimized?

    • @bookerml8502 · 1 year ago

      Hello, why is the color of a Gaussian not constant? I read "EWA Splatting" and the assumption there is that the color is constant.

    • @retsu-h6460 · 1 year ago · +1

      @@彭鲲 Hi, did you find any source for the backprop in 3D Gaussian Splatting?

    • @vizzyb8400 · 10 months ago

      @@彭鲲 The equations are written in the paper on pages 12-14.

    • @yanisfalaki · 6 months ago

      @@retsu-h6460 Hey, did you end up finding anything as well?

  • @radubenea9343 · 1 year ago · +19

    Your comments at 1:46:10 need clarification: rasterization only happens front-to-back, often not reaching the back once saturated, as you correctly noted. The tile rasterization threads still need to know when to stop loading Gaussians, and that's why they're initialized with both the first and last entry, but that's all this last entry is used for. Loading all Gaussians that might affect a tile is faster if it's done in one go at the start, even if the back Gaussians end up contributing little, because of memory latency and shared access by all the pixels in the tile.
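
    A rough sketch of that front-to-back loop for a single pixel (the interface is hypothetical, not the reference CUDA code):

    ```python
    import numpy as np

    def composite_pixel(sorted_gaussians, pixel, eps=1e-4):
        """Front-to-back alpha compositing for one pixel of a tile.

        sorted_gaussians: Gaussians overlapping this tile, sorted near-to-far;
        each exposes alpha(pixel) and color(pixel). T is the transmittance
        left over after the splats processed so far.
        """
        color = np.zeros(3)
        T = 1.0
        for g in sorted_gaussians:            # front to back
            a = g.alpha(pixel)                # opacity * 2D Gaussian falloff
            color += T * a * g.color(pixel)
            T *= 1.0 - a
            if T < eps:                       # saturated: stop early, the
                break                         # back entries can't contribute
        return color, T
    ```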

  • @Scaryder92 · 1 year ago · +5

    Thank you so much, this was amazing! I really hope you keep releasing these great and in-depth videos :)

  • @radubenea9343 · 1 year ago · +19

    1:50:34 but they do move Gaussians around; their positions are optimized parameters. The tile+depth sorting must happen once per rasterized frame anyway, since depth, camera frustum, and tile boundaries are all view-dependent. It's smart to pack the tile info in the high bits and the depth data in the low bits: once the radix sort is done on this composite number, the Gaussians end up sorted first by tile, then by depth. I don't think they use hash tables or any other indirection; the sorting step moves the actual data around based on this tile+depth "key" and groups it in one contiguous memory block. This lets the rasterizer avoid dependent reads, which are super slow: all it needs are the start/end indices of each tile.
    The strategy of cloning a Gaussian vs. just moving it is probably about creating new degrees of freedom close to where they're needed and initializing them with reasonably good guesses. The paper describes the optimization as starting out with a low-quality set of Gaussians that are coarsely optimized using low-res versions of the training dataset, pruned aggressively, and then cloned and optimized further using the full-res images. Presumably this leads to faster convergence, since most Gaussians start out inheriting the parameters fitted to their parents and only have to explore a small volume of the optimization space.
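
    A sketch of the composite-key trick (field widths are illustrative, not the exact layout in the code):

    ```python
    import numpy as np

    def make_keys(tile_ids, depths, depth_bits=np.uint64(32)):
        """Pack (tile, depth) into one integer sort key per splat instance."""
        # Reinterpreting positive float32 bits as uint32 preserves their order,
        # so depth can ride in the low bits without explicit quantization.
        d = depths.astype(np.float32).view(np.uint32).astype(np.uint64)
        return (tile_ids.astype(np.uint64) << depth_bits) | d

    tile_ids = np.array([3, 1, 3, 1])
    depths = np.array([2.5, 0.7, 1.1, 4.0], dtype=np.float32)
    order = np.argsort(make_keys(tile_ids, depths))   # GPU code uses radix sort
    # order groups instances by tile, near-to-far within each tile: [1, 3, 2, 0]
    ```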

    • @hu-po · 1 year ago · +8

      These are some good, high-quality comments, ty for the clarifications!

    • @retsu-h6460 · 1 year ago

      Doesn't gradient descent optimize Σ only via R and S, which give the direction and magnitude of stretch of a 3D Gaussian, while the position is determined by μ?
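
      For reference, the paper builds the covariance as Σ = R S Sᵀ Rᵀ from a quaternion and a per-axis scale, and μ is a separate learnable parameter; a minimal PyTorch sketch (variable names mine):

      ```python
      import torch

      # Learnable per-Gaussian parameters; gradients flow to all of them.
      q  = torch.randn(4, requires_grad=True)   # rotation as a quaternion
      s  = torch.randn(3, requires_grad=True)   # log-scales along ellipsoid axes
      mu = torch.zeros(3, requires_grad=True)   # position (enters via projection)

      def covariance(q, s):
          """Sigma = R S S^T R^T, positive semi-definite by construction."""
          w, x, y, z = q / q.norm()             # normalize so R is a valid rotation
          R = torch.stack([
              torch.stack([1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)]),
              torch.stack([2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)]),
              torch.stack([2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]),
          ])
          S = torch.diag(torch.exp(s))          # exp keeps the scales positive
          M = R @ S
          return M @ M.T

      Sigma = covariance(q, s)
      Sigma.sum().backward()                    # toy loss: grads reach q and s;
      print(q.grad, s.grad)                     # a real render loss reaches mu too
      ```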

  • @radubenea9343 · 1 year ago · +20

    Regarding the "sketchy" discard of the 3rd covariance matrix row and column at 1:09:07, they discard this information after the view transform has been applied, as noted by the apostrophe decorating the capital sigma (easy to miss). After the camera transform, the third dimension presumably only describes how stretched the gaussian is in screen depth, which is probably how they get away with this.

    • @dialectricStudios · 10 months ago · +1

      Is that similar to the idea of PCA to reduce the dimensionality of the matrix to make the optimization easier?

  • @radubenea9343 · 1 year ago · +15

    2:09:30 they're also crushing it on framerate: 100+ fps across the board. If other techniques weren't so slow at scene reconstruction this might be less important, but right now this method seems like the only truly interactive solution.
    The Gaussian representation also looks reasonably easy to edit manually after training, a major advantage of explicit representations in general, and very important once real-world datasets and potential real-world applications are taken into account. It's a shame that the lighting has to be baked in.

  • @davidshavin1998 · 1 year ago

    This is great! Thank you very much! The best part is seeing how professionals read papers: which parts they focus on and how they use the web for help.

  • @THEMATT222 · 10 months ago

    Thanks for the detailed explanation, I really appreciate it :)

  • @legreg · 1 year ago · +12

    That's not what "anisotropic" means in this context. ChatGPT's definition, "varies depending on the direction", is right, but here it refers to the shape of the Gaussian blob. Instead of being a perfect sphere (isotropic), the Gaussian blob is elongated in 1 or 2 dimensions and not necessarily axis-aligned (which is where the covariance terms come into play). They get higher quality by representing the scene with anisotropic Gaussian blobs because those blobs "follow" the shape of the existing geometry better than isotropic spheres, discs, or circular surfels, as described in the previous literature.
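
    A tiny numerical illustration of the distinction (values arbitrary): an isotropic covariance is a scaled identity, while an anisotropic one stretches along a rotated axis so the blob can hug thin geometry:

    ```python
    import numpy as np

    iso = 0.5 * np.eye(2)                     # circular blob: same extent everywhere

    theta = np.deg2rad(30)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    aniso = R @ np.diag([1.0, 0.01]) @ R.T    # long, thin blob rotated 30 degrees
    # Off-diagonal covariance terms appear once the ellipse isn't axis-aligned.
    print(np.round(aniso, 3))
    ```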

    • @hu-po · 1 year ago

      thanks for the clarification and insight!

    • @vizzyb8400 · 10 months ago

      Yeah! This is covered in the ablation section!

  • @raushanjoshi1384 · 6 months ago · +1

    At 18:30 in the video, you were referring to methods that do not require camera positions, and within a year you got to review/explain the DUSt3R paper. Looks like researchers are working well aligned with your suggestions :)

  • @jonatan01i · 1 year ago

    I'm so happy I found your channel!

  • @hanyanglee9018 · 4 months ago

    Best paper reading video I've ever seen.

  • @NeonNewt9 · 1 year ago · +1

    Thanks for covering this! Very helpful.

  • @hanyanglee9018 · 4 months ago

    About the camera positions: a friend told me it's possible to GAN it.

  • @kekitech · 11 months ago

    The AI eye contact filter is freaking me out so much I can't follow the video without putting a window over it to hide it o_o

  • @NeoShameMan · 1 year ago · +1

    Seems like we can bring in neural networks for compression after the fact; my guess is that's what the next paper will do 😂

  • @lion87563 · 11 months ago · +1

    Thank you very much for your explanation. I really like and respect that you explain some phrases using simple words; that helps with understanding the whole paper. I think some scientific terms should always be explained, especially for people whose English is not their first language.

  • @EveBatStudios · 1 year ago

    I really hope this gets picked up and adopted quickly by companies that are training 3D generation on NeRFs. The biggest issue I'm seeing is resolution. I imagine this is what they were talking about coming in the next update of Imagine 3D. Fingers crossed; that would be insane.

  • @galileo3431 · 1 year ago · +1

    Wow! Thank you for that great deep dive! 🙏🏼 I was wondering if it was possible to also retrieve depth maps, once the training is done - do you have any clue on that?

    • @hu-po · 1 year ago · +1

      You could get a depth map based on the positions of the splats relative to the camera view.
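
      A hedged sketch of one way to do that (not something the paper implements): composite each splat's camera-space depth the same way colors are composited, giving an expected-depth map:

      ```python
      import numpy as np

      def pixel_depth(sorted_splats, pixel, eps=1e-4):
          """Expected depth at one pixel, composited front to back.

          sorted_splats: splats sorted near-to-far; each exposes alpha(pixel)
          and z, the depth of its center in camera space.
          """
          depth, T = 0.0, 1.0
          for s in sorted_splats:
              a = s.alpha(pixel)
              depth += T * a * s.z
              T *= 1.0 - a
              if T < eps:
                  break
          w = 1.0 - T                              # accumulated opacity
          return depth / w if w > 0 else np.inf    # inf where nothing was hit
      ```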

  • @朱荣坤 · 6 months ago

    It seems like this is also your first time reading this paper, so some of your points are inaccurate, which caused me a lot of trouble when trying to reproduce it. But all in all, we do need these kinds of videos; maybe prepare better next time.

  • @太郎田中-i2b · 9 months ago

    58:32 overview

  • @lion87563 · 11 months ago

    Also, it would be a good idea if you could read some survey papers, because that would give us a more basic understanding of the main concepts.

  • @w000w00t · 1 year ago · +1

    Any chance you can do a walkthrough of the install? :) It's a bit arcane for me hehe

  • @virtualorganics · 1 year ago

    Not ‘fast’ but *Flash* attention. 28:14