@@magen6233 I don't know, I have not read the whole code yet. From what I read, I understand that synchronization is done with m_training_stream (for training) and m_inference_stream (for rendering); these are CUDA streams and are used for running kernels asynchronously. The whole magic happens in the Testbed::NerfTracer::trace() function and train_nerf. I think they are copying something, but for sure not every frame (the update_nerf_transforms function copies something every training step).
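As a rough illustration of that two-stream pattern, here is a minimal, hypothetical PyTorch sketch; the real project does this with fully-fused CUDA kernels in C++, and the model and shapes below are stand-ins, not the paper's code:

```python
import torch
import torch.nn.functional as F

# Stand-in "model": the real project trains a small fused MLP in CUDA.
model = torch.nn.Linear(64, 3).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

training_stream = torch.cuda.Stream()   # analogous to m_training_stream
inference_stream = torch.cuda.Stream()  # analogous to m_inference_stream

batch = torch.randn(4096, 64, device="cuda")
target = torch.randn(4096, 3, device="cuda")

with torch.cuda.stream(training_stream):
    # One training step, enqueued asynchronously on its own stream.
    optimizer.zero_grad()
    loss = F.mse_loss(model(batch), target)
    loss.backward()
    optimizer.step()

# Let the render stream wait for the weight update on-device,
# without blocking the host.
inference_stream.wait_stream(training_stream)

with torch.cuda.stream(inference_stream):
    # Rendering-style work on the second stream; "rays" is a stand-in.
    rays = torch.randn(20_000, 64, device="cuda")
    with torch.no_grad():
        rgb = model(rays)

# The host synchronizes before reading results -- the kind of
# synchronization point the comment above describes.
training_stream.synchronize()
inference_stream.synchronize()
print(loss.item(), rgb.shape)
```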
How can I get started implementing this in my own workflow? ... I just figured it out: click on the "read the paper" link and it will take you to the GitHub page...
Sir, your hype is both awesome and fully appropriate in equal amounts. I learned about NeRFs only a few days ago. You don't exaggerate when you say that this was science fiction only four years ago. Back then, I intuited the possibility of AI photogrammetry - in the same way Star Trek intuits warp drive and the holodeck. And now it is here. The tech straight out of my dream. What a time to be alive!
Photogrammetry, self-driving vehicles, and, an interesting and maybe not-so-feasible one, pre-rendering a complex scene and simplifying it to be displayed on lower-end hardware using this technique. From what I read, it produces an SDF, which is super awesome because they're cheap to render and can offer a lot of mathematical meaning, e.g. for self-driving vehicles.
@@Hexcede Yes, it creates 3D models, but how you would integrate them into existing software is the bigger challenge. 3D models used in applications need to be highly optimised as well (topology, different maps and stuff). I get that this has amazing and varied applications, but I fail to see how it can be seamlessly integrated into existing software, say Blender or RealityCapture, Unity, etc.
@@SYBIOTE Well, obviously it would need other processes added to it. Say you wanted to use this to create CGI characters: you'd probably start with a person in minimal clothes, then add the clothes on top after. If you want it for map making, then you'd remove everything you want to be interactive and model those objects separately, so the level, the walls, floor, etc. is created using this technique and you don't have to cut things out of the model. Combine this with techniques for removing the specific lighting conditions so you can use in-engine lighting, which we've already seen in other papers. These things are all literally just research papers. How they get applied to software that is end-user friendly, even for professionals using complex software, is years down the line, and will likely require companies like Weta to create them.
It’s incredible things have come this far. I’ve had fun playing around with Stable Diffusion, but I know that’s only a prelude to what we’ll see in the coming months and years.
Good introduction! BTW, I am quite interested in the scene of the restaurant "Le Petit Coin" at 5:03. Does anyone know where it is, or does it not exist anymore? Any information would be appreciated.
OK, maybe I will end up studying neural graphics. Every time I come back to this channel I get excited, and I want to read about these papers and study them at length for a while.
Here's an idea for a project (I don't have the means to do it myself): train an AI on 360° photos, but have the AI try to fill in a missing hemisphere, or an even larger region of the sphere, at random angles; vary the size of the sphere region that needs to be filled, and once it can fill a very large percentage of the sphere, you might be able to have the AI correctly guess what's behind the camera in photos taken with non-360° cameras! :D
I hope we can get a piece of demo software or even a retail product for this. It would be so cool to go snap a bunch of pics and throw them on a gaming PC to turn into 3D objects and environments.
Maybe a dumb question, but how can we compare computation times when one machine can do the same computation faster than another? Shouldn't there be a special metric that takes into account the computational capabilities of the computer as well as the time it takes to compute the simulation?
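One way to sketch the metric the comment asks about: multiply wall-clock time by the device's peak throughput to get an approximate work budget that is roughly comparable across machines. This is a simplification (it ignores memory bandwidth and actual utilization); the TFLOPS figures are published FP32 peak numbers used for illustration:

```python
def normalized_cost(seconds: float, peak_tflops: float) -> float:
    """Approximate floating-point work budget, in teraFLOPs."""
    return seconds * peak_tflops

# Example: a 5 s result on an RTX 3090 (~36 TFLOPS FP32) vs. a 20 s
# result on a GTX 1080 (~9 TFLOPS FP32) imply a similar work budget.
print(normalized_cost(5.0, 36.0))   # 180
print(normalized_cost(20.0, 9.0))   # 180 -> roughly comparable
```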
Hi Károly Zsolnai-Fehér, you should do a longer-format video that goes into more detail of how these results are achieved! I would love to learn more about what technologies and algorithms went into these achievements.
The 3D world this method creates really reminds me of the GTA V world. You could take stuff from that AI and just put it in the game, probably without much further editing. That's impressive.
It makes me think of consciousness. How people say that everything is one giant whole, a rectangle mesh, that the self, the i, the A.I. divides into a multiplicity.
In the future, combining this with the doodles-to-photorealistic-scenes work, and ending up with realistic 3D environments built from quick thumbnail sketches, would be amazing!
Imagine movie making with this, or stage performances replayed after the fact in VR. It sounds like if you have enough cameras, like the original Matrix setup, you'd be able to process possibly 1 second of film in about 1-2 minutes... that's amazing! Having that kind of viewing-angle-independent data sounds like the dream of a VR holodeck-style experience is closer than we think!
Great video! It would be amazing if you had a second channel titled "Twenty Minute Papers," where you go more in depth on topics that interest you.
You are too kind. Thank you so much! 🙏
Yeah, I have been thinking the same lately, that I would love to dive deeper on some topics. @Two Minute Papers, I think this is something you should seriously consider as a second channel - big fan of your work :)
just saying, I'd subscribe to that.
@@TwoMinutePapers Karoly, we expected you to tell us whether you'd do it!
@@guillermojperea6355 A second channel with in-depth analysis of papers, which is a whole new huge project with hours of work for each episode... yeah, let's decide in a few seconds and announce it in a comment reply 😂
Btw check the channel's playlists, he has courses!
This is going to make photogrammetry so much easier
I hope so, imagine being able to take a few images of a place and instantly be able to walk around virtually in this space. You could couple it to image search services and just type a location to find third party images, and be there.
Google Earth could be greatly enhanced with this method.
@@ET_AYY_LMAO Imagine if VR headsets used this to finally make AR truly able to interact with the environment. It's so fast that just a quick walkthrough would make a 3D map. Plus, you could have it run in the background to add any new positional info as you move.
I really hope so!! I hate how long it always takes to process my inputs
In fact with a few cameras something like this could make a 3d video phone call finally work
This is incredible. It's not real-time raytracing, it's an AI literally just eyeballing it. And it's MORE real-time than the regular methods. Can't wait for this kind of rendering to make it into games and simulations, it would outperform anything ever seen before.
Introducing AI vision to settings in FPS games 😂
Imagine how much this will streamline workflows for 3D graphics designers! You’ll have updates to adjustments in SECONDS
It also raises the question: if an AI can generate graphics on demand in this contextual sense, how do you know that what you see is actually what is represented before your eyes, compared to something that's shown to another entity? Not taking any position, just wondering: what if?
@@lorimillim9131 Yeah, that has pretty frightening implications for media/the legal world.
@@lorimillim9131 I assume, like we saw with deepfakes, that a counter-AI trained specifically on the pitfalls of the technology in question gets rolled out at about the same time to counteract misinfo.
A lot of people are mentioning computer graphics as an interesting application, but are missing the bigger picture: This is using neural networks, and it's replicating a 3D environment from limited input, which is VERY similar to what our brain does (dreaming, for example)... This is amazing for neuroscience.
Nah, it's amazing for rule 34 artists, imagine the possibilities. ( ͡° ͜ʖ ͡°)
This is an incredibly important point. On that note.
Let's also not forget that this means you can hyper-accelerate drug design from first principles based on receptor shape. There's already algorithms that can design molecules with a certain shape and algorithms that can search for a synthesis pathway.
So you could theoretically feed the machine a series of images of a receptor and get a recipe for a drug targeting it out the other end.
It’s not just in our dreams, our brain creates a 3D environment while we are conscious too. Our eyes are 2D sensors. It’s our brains that combine that information into a 3D experience.
100%. fMRIs can already get enough data to discern whether someone is thinking about a building/person/animal etc. Until now the resolution was bad. If AI fills in the extra data reliably, we are not far from mind reading/projection :P
" which is VERY similar to what our brain does"
- Not really. Our brains probably don't need to "reconstruct" a 3d environment. It's already perceived to be a 3d environment, no need for "reconstruction"
This channel will live to see the day when the pace of progress in this field will exceed the speed of publishing and paper discovery.
The format may have to switch to a statistical approach that samples results from multiple simultaneously published papers to depict the state of the art. aka "paper transport"
Then Nvidia will publish work on a hardware accelerated paper transport resolver that produces "perf"s at a rate faster than this channel. The papers will move so fast we won't be able to hold onto them
This is the paper singularity
There are already neural networks that can learn by reading papers. And other neural networks that can compose videos. Not much more is needed.
This is funny but could also end up being true WHY NOT
Very Noice 👍
Two Minute Papers: a better paper every 2 minutes 😲
In a few years, when this can run at 60 Hz+, all you need is a few cameras in a space and you'll be able to use VR to insert yourself into that location :O
That will be completely bonkers!
As I understood it, it takes a couple of seconds to create the "render" of the scene, but once it's done it does in fact run at 60 fps. So I guess you could just have a loading screen or something before you get to see anything.
Imagine ditching the equipment too and having the AI embodied?
could even be next month lol
This is NOT creating a 3D environment, or even a single 3D object. It is creating a smooth track of 2D images. It's a very cool technique, but if you're imagining a game where you can do anything more than ride a roller coaster or something like that (and not change the direction of your camera), then this won't be applicable to that game.
Yes, with one remark: it will be done with ONE camera.
Imagine rendering only 10 frames of a 100-frame animation and then feeding them into this new AI. You'd finish your render 10x faster. That's amazing!
Is it possible to apply this to video footage? It would be mind-blowing to have this in a VR video player. Current VR video players play a video projected inside a sphere, which is not real 3D, because once you start moving your head in 6DOF it breaks the immersion. But having a dynamic 3D mesh used for the video projection instead of a sphere would be mind-blowing!
There are so many applications of this, in VFX too.
If that‘s the case, we‘ll soon be able to walk around in movie sets and possibly assume the role of a character, just like the book Ready Player One predicted.
@@cbuchner1 Bruh, imagine being able to see boobs from other directions like that 🤯
@@Danuxsy And it's gonna be the most profitable application of it.
I'd argue that a machine being able to interpolate between photos is still far from understanding the actual geometry and synthesizing the scene from ANY point of view, as VR requires.
I am eager for a practical application to become available. As an architectural historian, it would make documenting architectural heritage much easier, with simpler equipment!
But there are better laser scanners now that can scan whole buildings.
I have seen an episode about that on Nat Geo.
@@Adhil_parammel Lasers are cool, but getting a near-perfect render with a phone camera and 2 minutes is objectively better, especially if the site is hard to get to or not very secure. Even if the laser scan is higher quality, the lower cost and higher accessibility of this technique would still be mighty useful.
Careful. Remember that what isn't in the source photos has to be made up. Great for many applications, but not so great for archival and historical research. Is that the actual gargoyle, or was it invented from training data from all periods of history?
@@virutech32 Also, one could use a small lightweight drone to take photos of places where laser scanning is impossible.
@@stub42 It's a valid argument, but the same is true for laser scanning, which requires human clean-up. After photo scanning with this net there would also be a validation process.
There are a lot of amazing two-minute papers, but only a few standalone programs are available for this (like the ones from Topaz Labs). When will this technique be easily available to everyone?
It can take anywhere from several years to ten.
No matter how great the performance is, if it is a specialized single function, there is little motivation to offer it as an easy-to-use application.
If this feature can be further developed and used for all kinds of surveying on construction sites, people's productivity will be visibly improved.
@@B0A0A Why can't society simply Kickstart a developer for this?
@@brexitgreens Nvidia doesn't need Kickstarter; they will license the tech to game devs and earn royalties.
The method is already available on GitHub. For commercial use, the authors say that Nvidia should be contacted. So it is a matter then of going through a sales process to generate interest between stakeholders, agreeing on pricing, and going through legal. Only then would a commercial agreement begin. And if the goal was a graphical interface you'd have to give ample time to develop this, bug test it, and run perception tests to ensure that it is user-friendly. It altogether takes a while and can explain the lag between the newest results and easy-to-use graphical programs like Topaz Labs.
I work on a sales team for a commercial scientific image processing software. I suggest new ideas to our R&D regularly. Maybe 2%-5% of ideas are accepted and end up in the product. For these features (which are mostly low-effort high-reward due to cost considerations), it often takes 2-3 years. And that's when we already have an application and team to build the new feature in to.
@@mustardofdoom Thanks for the detailed explanation, it really puts things into perspective
For the program that showed more detail when zooming in: was it an AI that filled in missing detail or was it an AI that made the high def image less “costly” by simplifying the image when zoomed out?
The AI created the detail from that one single grainy, almost black and white image. It is so good at creating the next layer that it can procedurally generate ever increasing depth of image. Pretty neat stuff.
From my understanding (and I don't think he explained it very clearly in this case), an AI seems to take a large image and train another network on it in order to reduce the size of the image, since the new network can then fill in the details based on its training. So each image would contain its own network, or be paired with one specific to it. As long as the end result is smaller, it's a win. Though, like I said, the explanation is lacking, and it would have been great to see the original file size vs. the resulting file size.
@@jessiejanson1528 Well, that's disappointing in a way. But it could be used for compressing data immensely.
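For the curious, this is roughly what "the network is the compressed image" means in code. A minimal, hypothetical PyTorch sketch of fitting a coordinate network to one image; the paper's actual gigapixel task adds a multiresolution hash encoding and fused CUDA kernels on top of a tiny MLP like this:

```python
import torch
import torch.nn as nn

# A tiny coordinate network: maps a pixel position (x, y) in [0, 1]^2
# to an RGB color. After training, the weights *are* the image.
class ImageField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, xy):
        return self.net(xy)

H = W = 256
image = torch.rand(H, W, 3)  # stand-in for a real photo, values in [0, 1]
ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                        torch.linspace(0, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
targets = image.reshape(-1, 3)

model = ImageField()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    idx = torch.randint(0, coords.shape[0], (8192,))  # random pixel batch
    loss = nn.functional.mse_loss(model(coords[idx]), targets[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

# Decode at any resolution by querying new coordinates. This is why
# zooming keeps producing pixels -- but detail beyond the training
# samples is the network's guess, not recovered ground truth.
```

Whether this actually compresses depends on whether the weight count ends up smaller than the pixel data, which is the trade-off the thread is discussing.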
Absolutely extraordinary.
My first thought was that "Now we know what those old CSI shows were using to zoom into their videos to find important details" Haha
Yet also, does this mean from a couple of photos we can now create 3D models which could plausibly be printed? Effectively advanced Photogrammetry?
As cool as this tech is, you'd have to be very careful if you wanted to use it like this.
The AI doesn't know the truth, it's just guessing one possibility of what could be true, based on the limited amount of information it was given.
@@EVPointMaster I assume you mean the CSI idea. Yes that is a joke.
With regards to the 3D printing functionality I assume you'd still need to do some work to get a useful mesh. However it looks really close to something that would be useful.
@@EVPointMaster You'd like to think that enhanced footage was never used in criminal cases, but just look at the Kyle Rittenhouse case: the prosecutor submitted a frame that was an interpolated and upscaled image from a video, in order to try to make a very specific claim about Rittenhouse doing a specific thing, for one frame. It was allowed as evidence, and even enhanced it was a blurry mess that showed nothing specific, but the prosecutor was allowed to claim it showed him aiming a gun at someone.
What caught my attention is the neural representation thing. Can it be used for image compression? I imagine there's lots of room for improvement over current methods like JPEG, which doesn't really understand the image.
Great idea! It would also be easy to generate training data. What is the state of the art on this?
Nvidia Maxine exists as neural video compression.
I believe this is what they are aiming for: extreme compression for photos and video while preserving the important details. Say a video of a footballer kicking a ball inside a stadium. The important details, such as the facial expression and the actual movement, are preserved. Non-important details, such as the field, the spectators, and the roaring sound, can be compressed. During playback, these non-important details are then procedurally generated. This is somewhat analogous to how we store information in our brains.
Can you imagine being able to apply this to tabletop gaming maps and have the AI fill in the game world as you zoom in!
And then turn it into a photo realistic image. What a time to be alive!
What exactly would be interesting about that?
@@Ginsu131 A lot of time can go into making the details of a map in tabletop RPGs. It would be awesome to have an AI fill in that detail, as it would speed up production and take a lot of mental energy off the GMs.
Imagine what cheap GPUs will be able to do in the future
nothing because new cheap GPUs won't exist anymore, they're all going to keep costing $700+
@@SnrubSource 3090's will be cheap
@@ChuckSploder Trust me, prices will stay the same, and newer GPUs will only get more expensive.
Mining bitcoins I suppose.
My GT 1030 can run Tetris! At 10 FPS... I meant 10 seconds per frame.
Nvidia GPUs of 2021 are heavily optimized for matrix multiplication, with a mode for sparse matrices. It is mostly used for upscaling, but it also works as a great noise filter (for audio, and as a ray-tracing denoiser); ray tracing is also useful for more realistic audio.
The more general applications of this are pretty much anything involving linear algebra: wherever you multiply 2 or more matrices, most likely rootSolvingInverse*projection.
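As a small illustration of what that looks like from user code, a hypothetical PyTorch sketch. TF32 and the generic sparse-dense product shown here are real PyTorch APIs; the hardware's dedicated "sparse mode" (2:4 structured sparsity) needs specially pruned weights and is not shown:

```python
import torch

# TF32 lets Ampere-generation tensor cores accelerate ordinary FP32
# matrix multiplications (requires a CUDA-capable GPU).
torch.backends.cuda.matmul.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # eligible for tensor-core execution under TF32

# Generic sparse-dense product, for matrices with many zeros.
s = torch.relu(a).to_sparse()  # relu zeroes roughly half the entries
d = torch.sparse.mm(s, b)      # sparse @ dense
print(c.shape, d.shape)
```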
Sorry, I haven't dived into the paper because I'm a total noob in AI. Any idea which graphics card they are using in this paper? I'm about to get a new rig with an RTX 3070, but I put it on hold because the progress in AI for photogrammetry is too fast.
@@danielng1765 An RTX 3090.
@@danielng1765 The RTX 30xx series cards are for private households; Nvidia also makes graphics cards for datacenters (commonly used to train AI models, or for things such as sorting long priority lists for the internet). The server-rack architecture is similar, but the bandwidth and parallelization are much higher, and the price scales to "millions per unit".
@@danielng1765 For the common high-tier gamer (or indie gamedev and AI-code learner), the PlayStation 5 has a GPU that compares pretty well to the RTX 2070. That is 2020 technology, with significantly slower memory access and significantly worse motion blur than the 30xx cards of 2021. An RTX 2070 is currently still relatively good value (damn all the cryptocurrency scammers/thieves) to put in a new PC that costs up to 1100 USD new. An RTX 3070 is significantly better (>2.5x a 2070), and commonly found in new PCs that cost over 1700 USD.
The "Ti" suffix makes a significant difference and is not to be overlooked (it commonly means roughly 24% faster memory, +20% power consumption, and +50% more CUDA cores). In general it seems to appeal more for higher-resolution displays (up to 4K).
@@ollllj Thanks for the advice. I initially decided to get the RTX 3070 due to the available performance comparisons based on Agisoft Metashape; the RTX 3070 has a good cost/performance balance compared to the others on their standard samples. I shall check the 3070 Ti as well if results are available.
I wonder how long until we will have interactive movies? Where they just film from a few angles and then at home you feed that footage into VR and can move around within the scene as it unfolds.
I'm not completely clear on what the gigapixel-image one is actually doing - taking a gigapixel image, building a model, and then keeping only the model (which is smaller? how much smaller?)
When I was about 17, in 2011, I tried making my own version of what is essentially NeRF, to turn buildings into 3D models really quickly using just photos.
How did the results look in the end? I would love to see that.
I remember Microsoft had a program that did something similar for popular tourist destinations, forgot the name
@@DonC876 It ended up looking a lot like how Google Maps does their 3D models, just a lot more manual and a lot more buggy, but it did kind of work.
@@StolenPw do you still have that program, and could you make a video of it?
@@URB4NR3CON Photosynth
This is insane. I've watched many of your videos, but this is the only one so far that *really* seems like straight up magic.
Astounding - I can see 3D scanning apps/software pretty soon becoming “trivial” on phones etc. ... which is itself astounding, never mind all the other stuff!
I can't wait for research like this to end up in game engines, truly incredible!
Making a game in the future: "Hello Siri, create me a game that plays like Doom 12, but with Disney characters that look like musicians currently in the top 20 charts, and this should all happen in Tokyo at daytime, raining. The end boss should be a giant paper."
That's what OpenAI Codex and GitHub Copilot are doing (not at this level, but they are quite good).
Papers so fast, they won't even let ya hold them, damn. As always really informative and fun video sir.
It would be interesting if there were a follow-up showing us where these technologies eventually become available in actual commercial products.
But I know this isn't the scope of the channel.
I can already tell you:
- Video games production will just be... wtf-level of workflow improvement ;
- Google Earth, which already has a bazillion pictures, will either be banned or transformed into a life-like simulation ;
- Military analysis and transport will make use of the above (Google Earth) ;
- Urbanism, real-estate construction and sales of real-estate ;
- Back to Google Earth, the applications are just crazy: imagine a traffic application where you can get a life-like visualization of the traffic, making it seem so much more reliable to the viewer than a red line on the road?
- I'm going nuts thinking about all the crazy ways this will affect us in our day lives.
... aaaaah, just the VR-Google Earth life-like simulation is utopian/dystopian enough.
@@themore-you-know I'm working with Google Earth Studio. Combine that with Blender and video editing and you can create a realistic 3D scene. The problem is close-ups... the textures/models are so broken. th-cam.com/video/UrvKsuDSaNE/w-d-xo.html
Is it really rendering a genuine 3D model from those images, like photogrammetry does, or does it make some sort of detailed map of viewing angles and lighting of this object?
So I am curious: can the AI transition from past context knowledge to current context knowledge, and retrain on the fly? For instance, if you see a painting of a woman, you contextualize it as 'old painting of woman with no background'. Then, as you zoom in, switch the context to the current frame and recontextualize it as 'old painting of woman's head'. So portrait -> head -> face -> nose and eye -> eye -> pupil -> retina -> optic nerve -> light-sensing cells -> cell membrane -> DNA
At some point you are only providing that last context to the AI to create the new image from pre-trained understanding. i.e. you reach a point where you are not caring about the reference material anymore.
I got that from this explanation. The layer that it produces is good enough to use as input. So, yeah at some level it is creating images based on pixels that it generated.
I imagine it's trained on images that exist, and learns what detail should exist in such an image, and then fills it in. So you can zoom in on a city because it knows what buildings, cars, roads, and people look like, but it won't understand that zooming in on a person's face would reveal skin cells, unless it was given high-resolution photos of skin that transition to the microscopic level to reveal cells.
Also, if it's given context that it's a painting, at some point you would want it to assume it's zooming in on oil, not assume that the artist was able to paint individual cells, because it's a painting of a face.
What a time to be alive, indeed! And your documentation of these accomplishments will be preserved forever. Thanks Doc 😊
I'm curious about how accurate the models are. Photogrammetry has not had good accuracy in the past, I wonder if this changes that.
Where can I use it for interpolation between images?
2 months from now it'll be finished training a week before we even start.
NVIDIA has been going crazy with these AI papers
Great video! Thanks for all the work! However... is it possible for people outside the research team to test it? Are all of these available to us mortals? Can we test this, or will these only go to Nvidia? Sorry for the dumb questions...
The code is on GitHub! Dunno how hard it would be to actually get it running, and the GitHub page says you need an Nvidia graphics card, but it is available to the public if you know what you're doing!
@@thegeekclub8810 I got it running just now. I'm on a GTX 1080 and it runs quite slow, but I can train the fox one and look around in low res. Works pretty well.
Setting it up is quite impossible if you don't know what you're doing. But you can always try.
I tried training one of my own datasets but it throws an error. Still working on that.
edit: The GTX 1080 is probably much slower because it is not an RTX card. The GeForce RTX line of cards has tensor cores, which are more optimized for this kind of job, and my 1080 has none.
@@daanhoek1818 can you please share the link?
@@thegeekclub8810 Thanks for the information!
Thank you for making Two Minute Papers. Your excitement and enthusiasm is infectious! And I always find myself getting more and more enthused as I watch your videos and see the pace of progress. It's nice to see a channel that just... Makes me feel that the future can be bright.
I remember hearing how, in 2019, AI was going to enter another "winter"... the same thing was said in 2020 and 2021... but this really is just the dawning of it! :p That gigapixel compression, in particular, will be adapted to "memorizing" the input-output map of software, so that you can just use a look-up table instead of computing values in code. It'll let us fix bugs by flipping a bit in the look-up table, without needing to find a way to "correct the software"!
Google Street View with this would be mindblowing.
If it can also convert it to clean quad poly it will be amazing
Yes, retopology and UV mapping are the most annoying parts of 3D design. We are close though 🔮
This is unbelievable. Phenomenal work, can't wait to see the applications! Especially in photogrammetry.
This is astounding! It's so fast you could train it on the user's hardware, no need to include a trained model ;)
How would I be able to use this technology? Is there any program that lets me take photos and get 3D models really fast like in this video? Thank you.
The time scaling in just one year is impressive ("O"). I'm expecting a generalized model with a kind of 3D segmentation, to change material parameters or add physics, in the future.
...In any case, this is the first step for "synthetic" rendering. Amazing ♥️
Musicians would call what you keep doing with your voice a "mordent".
But seriously, I've been impressed by the pace of AI predictive algorithms in recent years, and it keeps getting better.
Seems super cool! Don't fully understand what the technique is doing exactly (in terms of which inputs and outputs), though.
I find this with most of his videos
It interpolates between given photos of a scene.
Basically, they developed a multiresolution input encoding that simplifies the task and allows it to be highly parallelized, taking full advantage of the GPU. They apply it to different techniques (NeRF, Neural Gigapixel Image, Neural SDF, and Neural Volume).
From the paper website:
*_Instant Neural Graphics Primitives with a Multiresolution Hash Encoding_*
_We demonstrate near-instant training of neural graphics primitives on a single GPU for multiple tasks. In gigapixel image we represent an image by a neural network. SDF learns a signed distance function in 3D space whose zero level-set represents a 2D surface. NeRF [Mildenhall et al. 2020] uses 2D images and their camera poses to reconstruct a volumetric radiance-and-density field that is visualized using ray marching. Lastly, neural volume learns a denoised radiance and density field directly from a volumetric path tracer. In all tasks, our encoding and its efficient implementation provide clear benefits: instant training, high quality, and simplicity. Our encoding is task-agnostic: we use the same implementation and hyperparameters across all tasks and only vary the hash table size which trades off quality and performance._
_Abstract:_
_Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations. A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs. We leverage this parallelism by implementing the whole system using fully-fused CUDA kernels with a focus on minimizing wasted bandwidth and compute operations. We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of 1920x1080._
@@Pixelarter Thanks! Are the inputs to all of these tasks the same (2D images)? 'Cause the tasks sound quite different, so it's cool that their encoding works for all of them.
@@tobiascornille No. Some are 3D, some are 2D. From what I can glean, the hash encoding they developed just encodes the position of the input at different resolutions, concatenates the results, applies some transform, and feeds that as input along with the regular information.
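For readers curious what that looks like in practice, here is a minimal, hypothetical 2D PyTorch sketch of the multiresolution hash encoding idea. It uses a nearest-cell lookup for brevity, whereas the paper interpolates among the surrounding grid corners and runs everything as fused CUDA kernels; the sizes are illustrative, not the paper's settings:

```python
import torch
import torch.nn as nn

# Each resolution level has a small trainable feature table, indexed by
# hashing the quantized position; features from all levels are
# concatenated and fed to a tiny MLP, trainable end-to-end.
class HashEncoding2D(nn.Module):
    def __init__(self, levels=8, table_size=2**14, features=2,
                 base_res=16, growth=1.5):
        super().__init__()
        self.resolutions = [int(base_res * growth**i) for i in range(levels)]
        self.tables = nn.ParameterList(
            nn.Parameter(1e-4 * torch.randn(table_size, features))
            for _ in range(levels))
        self.table_size = table_size

    def spatial_hash(self, ij):
        # XOR of coordinates scaled by a large prime, modulo table size,
        # in the spirit of the paper's hash function.
        return (ij[..., 0] ^ (ij[..., 1] * 2654435761)) % self.table_size

    def forward(self, xy):  # xy in [0, 1]^2, shape [N, 2]
        feats = [table[self.spatial_hash((xy * res).long())]
                 for res, table in zip(self.resolutions, self.tables)]
        return torch.cat(feats, dim=-1)  # shape [N, levels * features]

encoder = HashEncoding2D()
mlp = nn.Sequential(nn.Linear(8 * 2, 64), nn.ReLU(), nn.Linear(64, 3))
rgb = mlp(encoder(torch.rand(4096, 2)))  # optimize tables + MLP via SGD
```

The coarse levels have fewer grid cells than table entries (no collisions), while the fine levels collide freely and rely on training, plus the other levels, to disambiguate; per the abstract, only the table size is tuned per task.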
I wonder if this could be used to make 3D scanning through photogrammetry incredibly fast, since it can reconstruct the geometry from so few pictures SO fast.
This has me excited for the future of video games and other simulations. I am obviously enthralled by the idea of a "Full Dive" video game like Sword Art Online. Seeing things like this, Unreal Engine 5 with Lumen and Nanite, AI in general, as well as what Gaben and Elon have been doing with neural interfaces, has me excited for a future where we can be fully immersed in whatever scenario we want.
It's definitely a pipe dream of sorts. But I can imagine a future where we have insanely detailed, low cost simulations, as well as the ability to dive into these worlds with all of our senses. It is a driving factor for me to learn more about ML, AI, and video games.
same here. Imagine how good GTA Earth could be!
Which Two Minute Papers video does the clip at 0:30 come from? I remember seeing it in his videos before. I was interested in looking into it further. Thanks to whoever can help!
Hey, so I'm kinda new to this whole AI environment, but this looks amazing! Is this public? Like, can I upload some pictures and have the program make a 3D object out of them?
There is a link in the description to it.
Crazy. Can’t wait to see applications of this come out.
We're at the point where it has all become magic to me. One program does all this?? How???
Deep learning abstraction... the first layers do the heavy lifting, you just plug your application in on top.
The same way your brain does, evolution.
Yeah, they give descriptions of a scene or draw primitive zones (this blob is water, this is land, these are trees, this zone is sky, etc.) and the AI paints a picture. That is magic indeed. One-upping that is telling an AI to do a task or write a program, and it does it in a matter of minutes or hours.
It's actually mindblowing how useful this will be for static environments.
It's interesting how you have completely moved away from any attempt at explaining the papers you present. I suppose that makes sense for the 2-minute format; I always scroll to the results section first anyway. But a tiny bit more depth and context wouldn't hurt. I was assuming this paper's content must be all about how to leverage a whole datacenter full of GPUs in parallel, but it's even more mind-blowing to see their abstract mention a single GPU... now that's a bit of detail that would really add to the presentation of this work.
Wow this is INCREDIBLE! I mean, there's just no comparison between the 2-month old and 1-month old papers. Unbelievable how crisp and smooth everything comes out to be.
So could we use something like this to create a real-time driving simulator? The AI could use the input data from something like Google Maps Street View and turn it into an interactive 3D environment.
Out of all the AI shown in this channel that has blown my mind, this is probably one of the most impressive.
If it only takes 5 seconds to render a still object, and that same rendering speed is applied to each frame of an actor's performance filmed at 60 FPS from multiple angles, then each minute of footage should take about 5 hours to render. If you're a small game dev studio, that means you can basically feed your dedicated workstation a few minutes of footage and leave it running overnight, and you'll get the final animated asset rendered in a day or two. What a boon this would be for Myst-like adventure games!
EDIT: Actually, not even a day or two! If it takes 2 seconds per frame, then even five minutes of footage would be rendered in just 10 hours! You could leave it running overnight, and it would be done the very next morning!
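(If anyone wants to sanity-check those numbers, here's the back-of-the-envelope math in Python. The per-frame time is the assumption from the comment above, not a benchmark:)

```python
fps = 60                    # capture frame rate
seconds_per_frame = 2       # assumed processing time per frame
footage_minutes = 5
frames = fps * footage_minutes * 60        # 18,000 frames
hours = frames * seconds_per_frame / 3600
print(f"{frames} frames -> {hours:.1f} hours")  # 18000 frames -> 10.0 hours
```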
It takes 5 seconds to train the NN, and evaluation (rendering) is real-time. According to the paper, this animation (5:02) can run at 133 fps in 1080p on an RTX 3090.
it would be quite heavy to replace your model on every frame of an animation, though.
@@magen6233 Idk, I haven't read the whole code yet. From what I've read, I understand that synchronization is done with m_training_stream (for training) and m_inference_stream (for rendering); these are CUDA streams, used for running kernels asynchronously. The whole magic happens in the Testbed::NerfTracer::trace() function and train_nerf. I think they are copying something, but definitely not every frame (the update_nerf_transforms function copies something every training step).
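For anyone unfamiliar with the pattern, here's a minimal PyTorch sketch of what running training and rendering on two separate CUDA streams looks like. This is my own illustration of the general idea only, not the instant-ngp code (which uses hand-written fused kernels); the model and batch sizes are stand-ins.

```python
import torch

device = torch.device("cuda")
train_stream = torch.cuda.Stream(device=device)
render_stream = torch.cuda.Stream(device=device)

model = torch.nn.Linear(32, 4).to(device)   # stand-in for the tiny NeRF MLP
opt = torch.optim.Adam(model.parameters())

for step in range(100):
    # queue a training step on one stream...
    with torch.cuda.stream(train_stream):
        batch = torch.randn(4096, 32, device=device)
        loss = model(batch).square().mean()  # stand-in loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    # ...and a rendering pass on the other; the GPU can overlap the two
    with torch.cuda.stream(render_stream):
        with torch.no_grad():
            rays = torch.randn(32400, 32, device=device)  # stand-in ray batch
            _ = model(rays)
    # synchronize before the next step so rendering never reads weights mid-update
    torch.cuda.synchronize()
```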
How can I get started implementing this in my own workflow?... I just figured it out: click on the "read the paper" link and it will take you to the GitHub page.
Sir, your hype is both awesome and fully appropriate in equal amounts. I learned about NeRFs only a few days ago. You don't exaggerate when you say that this was science fiction only four years ago. Back then, I intuited the possibility of AI photogrammetry - in the same way Star Trek intuits warp drive and the holodeck. And now it is here. The tech straight out of my dream. What a time to be alive!
What a time to be alive.
This is just amazing, but other than image compression, I fail to imagine how this technique can be integrated into existing software.
Dude, you can create 3D models from photos alone.
I'd love to use this on Google Earth. Imagine you can slowly walk down the street instead of zipping down a few meters at a time
Photogrammetry, self-driving vehicles, and, an interesting and maybe not-so-feasible one, pre-rendering a complex scene and simplifying it to be displayed on lower-end hardware using this technique.
From what I read, it produces an SDF, which is super awesome because they're cheap to render and can offer a lot of mathematical meaning, e.g. for self-driving vehicles.
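To give a feel for why SDFs are cheap to render: sphere tracing only ever evaluates the distance function along each ray. A toy Python example, with an analytic sphere standing in for a learned SDF network:

```python
import numpy as np

def sdf_sphere(p, radius=1.0):
    """Signed distance to a unit sphere at the origin (a learned SDF would be queried here)."""
    return np.linalg.norm(p) - radius

def sphere_trace(origin, direction, max_steps=64, eps=1e-4):
    """March along the ray, stepping by the SDF value each iteration."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf_sphere(origin + t * direction)
        if d < eps:
            return t   # close enough to the surface: it's a hit
        t += d         # the SDF guarantees this step can't overshoot the surface
    return None        # no hit within the step budget

hit = sphere_trace(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]))
print(hit)  # ~2.0: the camera starts 3 units from a unit sphere
```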
@@Hexcede Yes, it creates 3D models, but how you would integrate them into existing software is the bigger challenge. 3D models to be used in applications need to be highly optimized as well (topology, different maps, and so on).
I get that this has amazing and varied applications, but I fail to see how it can be seamlessly integrated into existing software, say Blender, RealityCapture, Unity, etc.
@@SYBIOTE Well, obviously it would need other processes added to it. Say you wanted to use this to create CGI characters: you'd probably start with a person in minimal clothing, then add the clothes on top afterwards.
If you want it for map making, you'd remove everything you want to be interactive and model those objects separately, so the level (the walls, floor, etc.) is created using this technique and you don't have to cut things out of the model. Combine this with techniques for removing the specific lighting conditions so you can use in-engine lighting, which we've already seen in other papers.
These things are all literally just research papers. How they get applied to end-user-friendly software, even for professionals using complex tools, is years down the line, and will likely require companies like Weta to create them.
We'll soon be able to walk around inside our favorite movie scene sets. The implications of this are insane.
It's becoming easier to accept that we live in a simulation.
USE THIS ON THE PATTERSON BIGFOOT VIDEO! (The classic one from 1967)
Can't wait to see that reconstruction applied to the Google Street View or similar
Yes, I also immediately thought of that when I saw the video.
Like GTA in the real world.
I'm wondering about the storage difference. Images are huge (there are a LOT of them), but this is probably way bigger. Or is it...?
@@mikiqex maybe the neural network could generate the intermediates in real time someday?
I'm a bit confused, what were the inputs for the objects being generated here? Was it a similar set of images to those shown at 0:21?
This channel lately seems more about black magic than science.
We are entering Dr Strange realms here 😂..
With that scene in Paris I had an epiphany about this being used in Google Maps 🤯
Imagine how much better it could become.
It’s incredible things have come this far. I’ve had fun playing around with Stable Diffusion, but I know that’s only a prelude to what we’ll see in the coming months and years.
Incredible stuff to say the least. Imagine having a few pictures of your old house that turn into a full VR space in a few seconds...
Good introduction! BTW, I am quite interested in the restaurant scene "Le Petit Coin" at 5:03. Does anyone know where it is, or does it not exist anymore? Any information would be appreciated.
Man do I love everything you put out! Thanks for the time and effort you put into these videos!
WHAT!!!!! This is amazing. What a time to be alive
Imagine importing all the Google Maps drive-by photos into this AI.
Wow, this shows deep learning is great for large data if we train it properly!
This is incredible!!! The potential in VR and AR alone...
Amazing work. I cannot wait for a smartphone implementation.
OK, maybe I will end up studying neural graphics. Every time I come back to this channel I get excited, and I want to read about these papers and study them at length for a while.
I'm looking forward to near-instantaneous camera-based photogrammetry.
I wonder what the next big thing in AI after deep learning will be. The pace is already brutally insane. How fast will it be with even better methods?
Already have it downloaded! Going to use this!!!
I couldn't hold on to my papers for this one. Incredible!
I love it whenever he says "yes"
This makes neural rendering in games so much better
Here's an idea for a project (I don't have the means to do it myself): train an AI on 360° photos, but have the AI try to fill in a missing hemisphere, or an even larger region of the sphere, at random angles. Vary the size of the region that needs to be filled, and once it can fill in a very large percentage of the sphere, you might be able to have the AI correctly guess what's behind the camera in photos taken with non-360° cameras! :D
So realistic avatars for VR gaming using your phone? Damn, the meta universe will be salivating over this paper.
This is truly incredible. Unbelievable! What a time to be alive!
I hope we can get a piece of demo software or even a retail product with this. Would be so cool to go and snap a bunch of pics, throw them on a gaming pc to make into 3d objects and environments.
Wow. Absolutely amazing.
I could barely hold on to my papers!
Your passion is absolutely contagious. This is amazing!
Wait, so this is like a ray-marching solution, or something of that caliber? Absolutely brilliant, just wow.
This learns the geometry of an object about as fast as I can recall the geometry of an object I've seen. Crazy.
Great video as always!
Every time the stuff you review amazes me. It is, indeed, a great time to be alive.
Maybe a dumb question, but how can we compare computation times if we can use a machine that does the same computation faster? Shouldn't there be a special metric that takes into account the computational capabilities of the computer as well as the time it takes to compute the simulation?
Hi Károly Zsolnai-Fehér, you should do a longer-format video that goes into more detail of how these results are achieved! I would love to learn more about what technologies and algorithms went into these achievements.
2 seconds??!! It's here, it's now, let's digitize the entire world! :D
The 3D world this method creates really reminds me of the GTA V world. You could take stuff from that AI and just put it in the game, probably without much further editing. That's impressive.
It makes me think of consciousness. How people say that everything is one giant whole, a rectangular mesh, that the self, the I, the A.I., divides into a multiplicity.
In the future, combining this with those doodle-to-photorealistic-scene papers and ending up building realistic 3D environments from quick thumbnail sketches would be amazing!
Imagine movie making with this, or stage performances replayed after the fact in VR. It sounds like if you have enough cameras, like the original Matrix setup, you'd be able to process maybe 1 second of film in about 1-2 minutes... that's amazing! Having that kind of viewer-angle-independent data sounds like the dream of a VR holodeck-style experience is closer than we think!
Wow this is so cool. What a time to be alive.