NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (ML Research Paper Explained)

  • Published Jun 4, 2024
  • #nerf #neuralrendering #deeplearning
    View synthesis is a tricky problem, especially when only a sparse set of images is given as input. NeRF embeds an entire scene into the weights of a feedforward neural network, trained by backpropagation through a differentiable volume rendering procedure, and achieves state-of-the-art view synthesis. It models directional dependence and is able to capture fine structural detail, as well as reflection effects and transparency.
    OUTLINE:
    0:00 - Intro & Overview
    4:50 - View Synthesis Task Description
    5:50 - The fundamental difference to classic Deep Learning
    7:00 - NeRF Core Concept
    15:30 - Training the NeRF from sparse views
    20:50 - Radiance Field Volume Rendering
    23:20 - Resulting View Dependence
    24:00 - Positional Encoding
    28:00 - Hierarchical Volume Sampling
    30:15 - Experimental Results
    33:30 - Comments & Conclusion
    Paper: arxiv.org/abs/2003.08934
    Website & Code: www.matthewtancik.com/nerf
    My Video on SIREN: • SIREN: Implicit Neural...
    Abstract:
    We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction (θ,ϕ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
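As a rough illustration of the architecture described in the abstract, here is a minimal sketch in PyTorch (the class name TinyNeRF, the layer widths, and the depth are my own simplifications; the paper's actual MLP is deeper, applies positional encoding to the inputs, uses a skip connection, and the viewing direction is passed here as a 3D unit vector rather than the two angles (θ, ϕ)):

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Illustrative radiance-field MLP: (3D position, viewing direction) -> (RGB, density)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(               # processes the 3D position only
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)    # volume density, independent of viewing direction
        self.rgb_head = nn.Sequential(            # emitted color, conditioned on viewing direction
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir):
        h = self.trunk(xyz)
        sigma = torch.relu(self.sigma_head(h))    # density must be non-negative
        rgb = self.rgb_head(torch.cat([h, view_dir], dim=-1))
        return rgb, sigma
```

To render a pixel, many such 5D queries are made along the camera ray through that pixel, and the outputs are composited with classic volume rendering, which is what makes the whole pipeline differentiable.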
    Authors: Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng
    Links:
    TabNine Code Completion (Referral): bit.ly/tabnine-yannick
    TH-cam: / yannickilcher
    Twitter: / ykilcher
    Discord: / discord
    BitChute: www.bitchute.com/channel/yann...
    Minds: www.minds.com/ykilcher
    Parler: parler.com/profile/YannicKilcher
    LinkedIn: / yannic-kilcher-488534136
    BiliBili: space.bilibili.com/1824646584
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

Comments • 145

  • @YannicKilcher
    @YannicKilcher  3 years ago +23

    OUTLINE:
    0:00 - Intro & Overview
    4:50 - View Synthesis Task Description
    5:50 - The fundamental difference to classic Deep Learning
    7:00 - NeRF Core Concept
    15:30 - Training the NeRF from sparse views
    20:50 - Radiance Field Volume Rendering
    23:20 - Resulting View Dependence
    24:00 - Positional Encoding
    28:00 - Hierarchical Volume Sampling
    30:15 - Experimental Results
    33:30 - Comments & Conclusion

    • @G12GilbertProduction
      @G12GilbertProduction 3 years ago

      Captions, dear bud'! Caaaaaaaaptions!

    • @jimj2683
      @jimj2683 1 year ago

      Imagine this, but using an AI model that is trained on vast amounts of 3d data from the real world. It would be able to fill in the gaps with all the experience it has much more accurately.

    • @THEMATT222
      @THEMATT222 8 months ago

      Noice 👍

  • @nitisharora41
    @nitisharora41 3 days ago

    Thanks for creating such a detailed video on NeRF

  • @thierrymilard1544
    @thierrymilard1544 3 years ago +30

    First time I really truly understand NeRF. Wonderfully simple explanation. Thanks a lot!

  • @TheDukeGreat
    @TheDukeGreat 3 years ago +57

    Wait, did Yannic just release a review of a paper I already read? So proud of myself :D

    • @NikolajKuntner
      @NikolajKuntner 3 years ago

      Incidentally, same.

    • @Kolopuster
      @Kolopuster 3 years ago +5

      It's even my thesis subject :O

    • @DMexa
      @DMexa 1 year ago

      So proud of Yannic bro, he is sharing this awesome knowledge!

  • @Jianju69
    @Jianju69 1 year ago +8

    This type of pre-digestion for a complex technical paper is very expedient. Thank you.

  • @laurentvit3117
    @laurentvit3117 6 days ago

    Duuuude i'm learning NERF, and this video is a jewel, thank you!

  • @gravkint8376
    @gravkint8376 1 year ago +1

    Gotta present this paper for a seminar at uni so this video makes it so much easier. Thank you so much for this!

  • @user-gq7yn3li9g
    @user-gq7yn3li9g 1 year ago +1

    Man, you've got many clear notes explaining papers. I got tons of help from your videos.

  • @howdynamic6529
    @howdynamic6529 1 year ago +1

    Thank you for the clear-cut and thorough explanation! I was able to follow and that is definitely saying something because I come from a different world, model-based controls :)

  • @adriansalazar8303
    @adriansalazar8303 1 year ago +3

    One of the best NeRF explanations available. Thank you so much, it helped a lot.

  • @aayushlamichhane
    @aayushlamichhane 1 year ago +3

    Awesome explanation! Please don't stop making these.

  • @ieatnoodls
    @ieatnoodls 2 years ago

    Thanks to your markings and visualization I can understand a lot more than I could on my own :D

  • @michaellellouch3682
    @michaellellouch3682 2 years ago

    Superbly explained, thanks!

  • @peter5470
    @peter5470 4 months ago

    My guy, this has to be the best tutorial on NeRF I've seen, finally understood everything

  • @NoobMLDude
    @NoobMLDude 1 year ago

    Thanks for the Great explanation. Finally understand the central ideas behind NeRF.

  • @shempincognito4401
    @shempincognito4401 2 years ago

    Awesome explanation! Thanks for the video.

  • @user-py3cp5sk1e
    @user-py3cp5sk1e 2 years ago

    A very detailed explanation, thanks to you!

  • @user-kl6wk9yi2f
    @user-kl6wk9yi2f 2 years ago

    This video helps a fresher like me a lot in understanding NeRF, thanks!

  • @hbedrix
    @hbedrix 1 year ago

    awesome video! Really appreciate you doing this!

  • @kameelamareen
    @kameelamareen 7 months ago +1

    Beautiful and Super Intuitive video ! Thanks :3

  • @tnmygrwl
    @tnmygrwl 3 years ago +2

    Had been waiting for this for a while now. 🔥

  • @willd1mindmind639
    @willd1mindmind639 3 years ago +3

    It more closely represents what happens in the brain, where the neural networks represent a coherent, high-fidelity representation of real-world signal information. However, that kind of detailed scene representation normally has a lot of temporal decay, with the "learning" being a set of generalized learning elements extracted from such input info. For example, you could "learn" a generalized coordinate space (up, down, left, right, near, far), depth perception, perspective, surface information (convex, concave, etc.), shape information, etc. But that would be another set of networks for specific tasks with less temporal decay and more generalization parameters to allow higher-order understanding such as object classification, logical relationships between objects and so forth.

  • @user-oi6uu8sq4k
    @user-oi6uu8sq4k 6 months ago +1

    Pretty clear and great thanks to you!!

  • @Dave_Lee
    @Dave_Lee 1 year ago

    Great video. Thanks Yannic!

  • @thomsontg1730
    @thomsontg1730 2 years ago

    Great explanation, I really enjoyed watching it.

  • @Snuson
    @Snuson 8 months ago

    Loved the video. Learned a lot. Thanks

  • @dsp4392
    @dsp4392 2 years ago +1

    Excellent explanation. Realtime 3D Street View should be right around the corner now.

  • @juang.8799
    @juang.8799 9 months ago

    Thanks for the explanation!!

  • @user-cz5cp4wf6j
    @user-cz5cp4wf6j 2 years ago

    Wonderful videos! Thanks for sharing~

  • @trejohnson7677
    @trejohnson7677 3 years ago +2

    The "overfitting" is one of the core principles in Functional Programming/Dataflow Programming. Very awesome to see, wil have to check whether or not it was a locally unique idea, or if it is directly pulling from the aforementioned knowledgebases.

    • @heejuneAhn
      @heejuneAhn 1 year ago

      Can we say memorizing instead of "overfitting"? It sounds more intuitive to me.

  • @muhammadaliyu3076
    @muhammadaliyu3076 3 years ago +5

    UC Berkeley - I salute this university when it comes to A.I. research. In most big papers, you will definitely see one or more scholars from it.

  • @vslaykovsky
    @vslaykovsky 3 years ago +15

    "Two papers down the line" we'll probably see a paper that also infers positions and directions of photos.

  • @ferranrigual
    @ferranrigual 5 months ago

    Amazing video, thanks a lot.

  • @bilalbayrakdar7100
    @bilalbayrakdar7100 1 year ago

    bro you are killin' it, pretty damn good explanation thanks

  • @IoannisNousias
    @IoannisNousias 3 years ago +3

    D-NeRF
    Great explanation as always Yannic! Will you be doing a follow up on their next paper (D-NeRF), which handles dynamic scenes?

  • @siyandong2564
    @siyandong2564 2 years ago +1

    Nice explanation!

  • @SanduniPremaratne
    @SanduniPremaratne 2 years ago +2

    How are the (x,y,z) coordinates obtained for the input data?
    I assume a pose estimation method was used to get the two angles?

  • @LouisChiaki
    @LouisChiaki 3 years ago +30

    I feel this approach has probably been used in physics and 3D image reconstruction for a long time with the Fourier decomposition technique (which is renamed positional encoding here). The main point is that it is one model per object, so I feel like it is a curve-fitting problem. Though using gradient descent and a neural-network-like framework probably makes it much easier to model.

    • @paulothink
      @paulothink 2 years ago +2

      Would that be analogous to the DCT technique in video codecs, and that hopefully this could shed some light into a potentially better video codec? 👀

    • @Stopinvadingmyhardware
      @Stopinvadingmyhardware 11 months ago

      DTFT?

    • @tomfahey2823
      @tomfahey2823 9 months ago

      The "1 model per object" is an interesting, if not surprising, evolution in itself, as it can be seen as a further step in the direction of neural computing (as opposed to algorithmic computing), where memory/data and computation are encoded in the same structure (the neural network), in a manner not dissimilar to our own brains.

  • @Milan_Openfeint
    @Milan_Openfeint 3 years ago +1

    Nice, Image Based Rendering strikes back after 25 years.

  • @jaysethii
    @jaysethii 6 months ago

    Phenomenal video!

  • @qwerty123443wifi
    @qwerty123443wifi 3 years ago

    Awesome! Was hoping you'd do a video on this one.

  • @usama57926
    @usama57926 1 year ago

    This is mind blowing

  • @ethanjiang4091
    @ethanjiang4091 2 years ago

    I watched the video on the same topic before but got lost. Now I get it after watching your video.

  • @user-dh4ud1dr8u
    @user-dh4ud1dr8u 4 months ago

    Thank u😮😮😮😮😮 amazing description

  • @alpers.2123
    @alpers.2123 3 years ago +23

    Dear fellow scholars...

  • @thecheekychinaman6713
    @thecheekychinaman6713 4 months ago

    Crazy to think that this came out 2 years ago, advancement in the field is crazy

  • @michaelwangCH
    @michaelwangCH 3 years ago

    Cool effect. I saw this on Two Minute Papers. Training a NN from different perspectives of the same object - it's hard to get the right data.

  • @starship9874
    @starship9874 3 years ago +1

    Hey will you ever do a video explaining Knowledge Graphs / Entity embeddings? For example by talking about the "Kepler, a unified model for KE and PLM" paper

  • @AdmMusicc
    @AdmMusicc 7 months ago

    This is an amazing explanation! I have a doubt though. You talked about the major question of training images not having information about "density". How are we even computing the loss in that case for each image? You said we compare what we see with what the model outputs. But how does the model give different density information for a particular pixel if we don't have that kind of information in the input? How will having a differentiable function that can backtrack all the way to the input space be any helpful if we don't have any reference or ground truth for the densities in the training images?

  • @isbestlizard
    @isbestlizard 1 year ago

    You could stack lots of objects, so long as you know the transformation from object to world coordinates and give each object a bounding volume in world space for the ray tracer to bother calculating. If you had a supercomputer, you could render worlds with thousands of overlapping and moving objects :D

  • @daanhoek1818
    @daanhoek1818 1 year ago +4

    Really cool. I love getting into this stuff. I'm a compsci student in my first year, but considering switching and going for AI. Such an interesting field.
    What a time to be alive! ;)

    • @minjunesong6667
      @minjunesong6667 1 year ago +1

      I'm also a first year student, feeling same here. Which school do u go to?

    • @daanhoek1818
      @daanhoek1818 1 year ago +1

      @@minjunesong6667 The university of Amsterdam

  • @vaishnavikhindkar9444
    @vaishnavikhindkar9444 1 year ago +1

    Great video. Can you please make one on LeRF (Language embedded Radiance Fields)?

  • @ceovizzio
    @ceovizzio 2 years ago +1

    Great explanation, Yannic! I'd like to know if this technique could be used for 3D modelling.

    • @pretzelboi64
      @pretzelboi64 1 year ago

      Yes, you can construct a triangle mesh from NeRF density data

  • @dr.mikeybee
    @dr.mikeybee 3 years ago

    How compact are minimally accurate models? How many parameters?

  • @firecloud77
    @firecloud77 2 years ago

    When will this become available for image/video software?

  • @6710345
    @6710345 3 years ago +10

    Yannic, would you ever review your own paper? 🤔

  • @bona8561
    @bona8561 2 years ago +3

    Hi Yannic, I found this video very helpful. Could you do a follow up on instant NERF by Nvidia?

  • @sarvagyagupta1744
    @sarvagyagupta1744 3 years ago +1

    Hey Yannic. I've been waiting for you to talk about this. Thanks. One question though. The viewing angle, is it like the latitude and longitude angles? We need two values because we want to know how that point looks from both the horizontal and vertical angle, right?

    • @quickdudley
      @quickdudley 3 years ago

      I'd been assuming it was pan and tilt. The full algorithm would need to know the roll of the camera but I don't think that would influence any lighting effects.

    • @ashastra123
      @ashastra123 2 years ago

      It's spherical coordinates (minus the radius, for obvious reasons)

    • @sarvagyagupta1744
      @sarvagyagupta1744 2 years ago

      @@ashastra123 But then spherical coordinates have two angles, one w.r.t. the y-axis and the other w.r.t. the x-axis. So are we using the same nomenclature here?

    • @ghostoftsushimaps4150
      @ghostoftsushimaps4150 2 years ago

      Good question. I've also been assuming one angle measures how much left-right and the other how much up-down on the surface of a sphere, so I'm also assuming the viewing angles are like lat/long angles.
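For what it's worth, a tiny sketch of the convention being discussed (numpy; assuming θ is the polar angle from the z-axis and ϕ the azimuth; in practice many implementations simply pass a normalized 3D direction vector to the network):

```python
import numpy as np

def view_direction(theta, phi):
    """Unit viewing direction from the two angles (assumed convention:
    theta = polar angle from the z-axis, phi = azimuth in the x-y plane)."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

print(np.linalg.norm(view_direction(0.3, 1.2)))  # ~1.0: it is a unit vector
```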

  • @truy7399
    @truy7399 10 months ago

    I was searching for Nerf guns; this is better than what I asked for.

  • @Bellenchia
    @Bellenchia 3 years ago +1

    I heard about Neural Radiance Fields on the TWIML podcast earlier this year, and never connected that it was the same paper Károly (and now Yannic) talked about.
    It's funny how we associate a paper with a photo or graphic a lot of the time.

    • @Bellenchia
      @Bellenchia 3 years ago

      Fellow visual learners feel free to @ me

    • @Bellenchia
      @Bellenchia 3 years ago

      Also want to mention that you did a much better job explaining this than Pavan Turaga did on the episode in question, so well done Yannic. The episode I'm talking about is called Trends in Computer Vision or something along those lines for those interested.

  • @dr.mikeybee
    @dr.mikeybee 3 years ago

    View synthesis shows the power of interpolation!

  • @mort.
    @mort. 8 months ago

    Is this an in-depth breakdown of what photogrammetry is, or is this different?

  • @jonatan01i
    @jonatan01i 3 years ago +1

    I've started the "I invented everything" video yesterday and paused to continue today, but it's private now :(

  • @agnivsharma9163
    @agnivsharma9163 2 years ago +3

    Can anyone tell me how we get the density parameter during training, since we don't have the full 3D scene?

    • @jeteon
      @jeteon 2 years ago +1

      The density is something the network makes up. It only exists so that you can use it to say what each pixel from a new viewpoint should look like. If you ask the network to generate a picture you already have then the loss from comparing them gives the network enough information to find a density.
      The density it finds doesn't mean anything outside of the specific algorithm that generates images from the network outputs. Just think of the network as generating 4 numbers and then coupled to some other function h(a, b, c, d) that we use to generate pictures from those 4 numbers. We can name the 2nd number "red" but the network doesn't "care", it's just an output, same as what they chose to call "density".

    • @shawkontzu642
      @shawkontzu642 2 years ago +1

      The output for training is not the (color, density) array but the rendered images. After the network predicts (color, density) for the sample points, this info is rendered into images using the volume rendering technique, so the loss is the error between rendered images and training images rather than on the (color, density) array itself.

    • @agnivsharma9163
      @agnivsharma9163 2 years ago

      Thank you so much for both the replies. Now, that you have explained it, it makes much more sense to do it like that. It also helped me clarify a few doubts which I had with follow-up NERF based papers. Huge help!

    • @jeteon
      @jeteon 2 years ago

      @@shawkontzu642 Yes, I agree. The (color, density) is an intermediate output that gets fed into h(a, b, c, d) whose outputs are rendered images. The h function doesn't need to be trained though. It is just the volume rendering technique.
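To make the replies above concrete, here is a minimal PyTorch sketch (my own variable names) of that compositing step: the network's per-sample (color, density) outputs along a ray are reduced to a single pixel color, and only that pixel is compared against the training photograph, so the density is only ever supervised indirectly.

```python
import torch

def render_ray(rgb, sigma, deltas):
    """Composite per-sample (color, density) along one ray into a single pixel color,
    following the paper's quadrature: w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    alpha = 1.0 - torch.exp(-sigma * deltas)               # per-segment opacity
    trans = torch.cumprod(                                  # T_i: light surviving up to sample i
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)              # expected color of the ray

# Only the rendered pixel is supervised; density itself is never a training target:
# loss = ((render_ray(rgb, sigma, deltas) - true_pixel) ** 2).sum()
```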

  • @brod515
    @brod515 1 month ago

    where does the scene come from?

  • @AThagoras
    @AThagoras 3 years ago +3

    I must be not understanding something. How do you get the density from 2d images?

    • @arushirai3776
      @arushirai3776 3 years ago +1

      I don't understand either

    • @jeteon
      @jeteon 2 years ago

      The 2D images give you multiple perspectives of the same point in space just from different angles. If you combine that information (in this case using the neural network) then you can get a good idea of whether or not there is something at a particular point. Density is not how much mass there is in the volume around a point but rather how much stuff there is at that point that interacts with light.
      Think of it like what people naturally do when they pick something up and look at it from different angles. Each of those angles is like the pictures and the internal idea you would form of what the 3D object looks like is the model that gets trained. By the time you've analysed an object this way, you also can make pretty good guesses about which parts of the object reflect light, how much light, from which angles and what colour to expect. That's basically the model. The density is how much that point dominates the light you get from that point and could be something like 0 to 1 being from completely invisible to completely opaque.
      Also, if you just look at the pictures you train on, your brain can build this model so that you have a good sense of whether or not a purple spot makes sense on the yellow tractor.

    • @thanvietduc1997
      @thanvietduc1997 2 years ago

      You ask the neural netwok for density information, not the images. The pixels (RGB value) in those images serve as the target for the neural network to train on.

    • @AThagoras
      @AThagoras 2 years ago

      @@thanvietduc1997 OK. that makes sense. Thanks.

  • @alanliu7148
    @alanliu7148 2 years ago

    What do you think is the next step after NeRF?

  • @piotr780
    @piotr780 1 year ago

    So are there really two networks (coarse and fine), or is this some kind of trick?

  • @marknadal9622
    @marknadal9622 2 years ago

    Help! How do they determine depth density from a photo? Wouldn't you need prior trained data to know how far away an object is, from a single photo?

    • @YannicKilcher
      @YannicKilcher  2 years ago +2

      Yes, search for monocular depth estimation

    • @marknadal9622
      @marknadal9622 2 years ago

      @@YannicKilcher Thank you!

  • @masterodst1
    @masterodst1 2 years ago

    It'd be cool if someone combined this with volumetric or lightfield displays.

  • @ilhamwicaksono5802
    @ilhamwicaksono5802 1 year ago

    THE BEST

  • @jonatan01i
    @jonatan01i 3 years ago

    This is better than magic.

  • @dhawals9176
    @dhawals9176 3 years ago

    Ancient huh! nice way to put it.

  • @sebastianreyes8025
    @sebastianreyes8025 1 year ago

    I noticed many of the scenes were from UC Berkeley, kinda trippy. The engineering school there gave me a bit of PTSD ngl.

  • @paulcassidy4559
    @paulcassidy4559 3 years ago

    Hopefully this comment isn't in bad taste - but the changes in lighting patterns on the right-hand side at 2:51 reminded me a lot of how light behaves while under the influence of psychedelics. Yes, I'll see myself out...

  • @herp_derpingson
    @herp_derpingson 3 years ago +1

    Reminds me of that SIREN paper.

  • @usama57926
    @usama57926 1 year ago

    But can this be used real time?

  • @CristianGarcia
    @CristianGarcia 3 years ago +1

    I was a bit confused; I thought this paper had already been reviewed, but it was actually the SIREN paper.

  • @sofia.eris.bauhaus
    @sofia.eris.bauhaus 3 years ago +2

    i don't think this should be called "overfitting". as far as i'm concerned, overfitting means learning the input data (or at least big chunks of it) as one big pattern itself, instead of finding the patterns within the data and generalizing from it. this system may be able to reproduce the input data faithfully (i haven't compared them 🤷) but it clearly learned to generalize the spatial patterns of the scene.

    • @mgostIH
      @mgostIH 3 years ago +1

      It doesn't really generalize anything outside the data it has seen, its job is to just learn *really* well the points we care about, but anything outside that range isn't important.
      Think of it like if you were to train a network on a function f(x) and you are interested on the domain [-1,1]. Overfitting on this domain would mean that the network is extremely precise inside this interval but does something random outside of it, while generalizing means that we also care about having a good estimate of the function outside the domain.
      Here our domain is the parts where we can send rays to, it doesn't really matter what the model thinks is outside the box we never sampled on.

    • @laurenpinschannels
      @laurenpinschannels 3 years ago +2

      in this context, overfitting might be replaced with undercompression

    • @sofia.eris.bauhaus
      @sofia.eris.bauhaus 3 years ago

      @@mgostIH yeah, and a network that is trained on birds will probably never generate a good squirrel. i don't think neural nets tend to be good at producing things unlike anything they have ever seen before.

    • @jeteon
      @jeteon 2 years ago

      🤔 That's actually a good point. If it "overfit" it wouldn't be able to interpolate novel viewpoints, just the pictures it was trained on.
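A toy numerical illustration of the point made in this thread (plain numpy, my own example, not from the paper): a model fit only on [-1, 1] can be very accurate inside that interval and still be useless outside it, which is the sense in which a NeRF "overfits" to the rays it was trained on.

```python
import numpy as np

# Fit a degree-9 polynomial to sin(3x), but only on the sampled domain [-1, 1].
x_train = np.linspace(-1.0, 1.0, 50)
coeffs = np.polyfit(x_train, np.sin(3 * x_train), deg=9)

print(np.polyval(coeffs, 0.5) - np.sin(3 * 0.5))   # small error inside the sampled interval
print(np.polyval(coeffs, 3.0) - np.sin(3 * 3.0))   # huge error far outside it
```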

  • @paulcurry8383
    @paulcurry8383 2 years ago

    Why is this “overfitting”? Wouldn’t overfitting in this case be if the network snaps the rays to the nearest data point with that angle and doesn’t interpolate?

  • @JeSuisUnKikoolol
    @JeSuisUnKikoolol 3 years ago +2

    I would be very surprised to see a similar technology used to render objects inside games. According to the paper, sampling takes 30 seconds on a high end GPU. As games often run at 60 fps, this would only be viable with a speed up of x1800 and it's assuming we only have to render a single object (so realistically speaking we could add another factor of x100).
    This does not mean it is not possible with more research and better hardware but if we compare this to the traditional way of rendering in games, I'm not really sure there is an advantage.
    It's not even something we could not do as we already have photogrammetry to generate meshes from images.
    For non biased rendering ("photorealistic") I could see some use but the learning time is way too high for the moment. One application could be to render a few frames of an animation and use the model to "interpolate" between the frames.

    • @ksy8585
      @ksy8585 2 years ago

      Now it has reached a 1000x speedup in both training and inference. What a speed of progress. There is more chance of using this technology where you take a few pictures of an object in the real world and reconstruct it as a 3D image by training a neural network. Then you can manipulate the image or synthesize 2D images from a novel viewpoint, lighting, time-step (if a video), and so on.

    • @JeSuisUnKikoolol
      @JeSuisUnKikoolol 2 years ago

      @@ksy8585 Very impressive. Do you have a link ?

  • @WhenThoughtsConnect
    @WhenThoughtsConnect 3 years ago

    take pics from all angles of apple, maximize the score to label apple

  • @Bellenchia
    @Bellenchia 3 years ago +1

    I think the fact that it takes 1-2 days on a V100 is the biggest gotcha

    • @jeroenput258
      @jeroenput258 3 years ago

      Another gotcha: the view dependent part only comes into play in the last layer. It really doesn't do much but optimize some common geometry.

    • @Bellenchia
      @Bellenchia 3 years ago

      @@jeroenput258 maybe this is akin to GPT-3's large store of knowledge, where the top layers all store information about the image and the last layer basically just constructs an appropriate image for a given view using the shapes and textures it learned

    • @jeroenput258
      @jeroenput258 3 years ago

      Perhaps, but I don't know enough about GPT-3 to answer that. What I do find odd is that when you move the ray direction parameter up just one layer the whole thing falls apart. It's really strange Nerf even works imo.

  • @jasonvolk4146
    @jasonvolk4146 3 years ago +4

    Reminds me of that scene from Enemy Of The State th-cam.com/video/3EwZQddc3kY/w-d-xo.html -- made over 20 years ago!

  • @ankurkumarsrivastava6958
    @ankurkumarsrivastava6958 1 year ago

    Code?

  • @R0m0uT
    @R0m0uT 1 year ago

    This sounds as if the presentation could be done entirely in a raymarching shader on the GPU, as I suspect the evaluation of the model can be implemented as a shader.

  • @NeoShameMan
    @NeoShameMan 3 years ago +1

    I'm studying light fields; the premise makes it not that impressive to me. Program a Lytro-like renderer and you'll know what I mean.

  • @dvfh3073
    @dvfh3073 3 years ago +2

    5:50

  • @VERY_TALL_MAN
    @VERY_TALL_MAN 1 year ago

    It’s NeRF or Nothin’ 😎

  • @pratik245
    @pratik245 2 years ago

    Deep Tesla

  • @GustavBoye-cs9vz
    @GustavBoye-cs9vz 2 days ago

    7:05 - 7:45 So we use the same neural network for multiple different scenes? - That's smart because then we don't need to retrain it every time.

  • @rezarawassizadeh4601
    @rezarawassizadeh4601 11 months ago

    I think saying that each scene is associated with one single neural network (the NN is overfitted to that scene) is not correct.

  • @yunusemrekarpuz668
    @yunusemrekarpuz668 1 year ago

    It's like the end of photogrammetry.

    • @notram249
      @notram249 11 months ago

      NeRF is a step forward in photogrammetry.

  • @fintech1378
    @fintech1378 4 months ago

    Python code?

  • @Anjum48
    @Anjum48 3 years ago

    Obligatory "we're living in a simulation" comment

  • @govindnarasimman6819
    @govindnarasimman6819 1 year ago

    Finally something without CNNs. Bravo guys.

  • @muzammilaziz9979
    @muzammilaziz9979 3 years ago

    Yannick "not so lightspeed" Kilcher

  • @Ohmriginal722
    @Ohmriginal722 3 years ago

    It's NeRF or NOTHING

  • @stephennfernandes
    @stephennfernandes 3 years ago

    Now I know why Andrej Karpathy was tweeting about this paper. Tesla will bring this into their FSD, and it will do wonders.

    • @malymilanovic8298
      @malymilanovic8298 3 years ago +1

      I don't follow; where will they use it? What do they need to render?

    • @stephennfernandes
      @stephennfernandes 3 years ago

      @@malymilanovic8298 If, because of multiple cars, only part of a car is visible from a far distance, it would be used to render the whole car, inferring how the car is placed behind another car or object. These assumptions would then be used to make better driving decisions at that instant.

    • @malymilanovic8298
      @malymilanovic8298 3 years ago

      ​@@stephennfernandes ​ Interesting. Wouldn't that imply that they need to train their net online, during driving and for every car that might get occluded? This seems computationally intensive. They could meta train and then use few SGD steps, that could help but I still do not see why they would want to model the car like that.

    • @stephennfernandes
      @stephennfernandes 3 years ago

      @@malymilanovic8298 At Tesla Autonomy Day, Elon and his team said that the Tesla FSD computer is quite powerful, so compute isn't the issue. Secondly, the training happening on the FSD chips is in shadow mode and is preliminary. Only some annotations and model decision predictions are captured and sent to the Tesla cloud, where the model is trained after proper testing and checking. Yes, doing this would bring in tons of complexity. But imagine a cluster of cars standing right in front of and next to each other: from a single-sided 2D perspective we cannot have a strong assumption of their positions, since we only have annotations in 2D. Having NeRF would give us a broad perspective on how the entire cluster of cars would look in 3D space... Karpathy once said it's hard to make full annotations and predictions of objects around a Tesla, but NeRF would really help.

  • @trejohnson7677
    @trejohnson7677 3 years ago

    Also, these guys weren't smart enough to come up with it by themselves either. That's why there's a whole team. Collaboration has to factor into modern codebases in some form or fashion.