Thanks to all the amazing viewers who have been engaging with this video! One question that popped up is whether Nanite’s software rasterizer runs on the CPU or GPU.
The answer: Nanite’s software rasterizer is GPU-based. 🎮
It uses compute shaders to handle small triangles (clusters with edges smaller than 32 pixels) and dynamically bypasses the traditional hardware rasterization pipeline. This GPU-based approach ensures performance scalability and keeps the rendering pipeline efficient, even with highly detailed scenes.
This design minimizes CPU-GPU overhead and leverages the massive parallelism of modern GPUs, which is a core aspect of Unreal Engine 5’s real-time rendering magic.
Thanks again for the support, and keep the feedback coming! If you've got more questions, feel free to ask, and I'll dig up the answers. 😊
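To make the cluster-size decision above concrete, here is a minimal CPU-side C++ sketch. Everything in it (the type names, treating the 32-pixel figure as a hard cutoff) is illustrative, not Nanite's actual code or heuristics.

```cpp
#include <vector>

// A cluster as seen by the rasterizer selector; in Nanite a cluster is a
// group of up to 128 triangles, reduced here to the one field we need.
struct Cluster {
    float screenEdgePixels; // longest projected edge of the cluster bounds
};

enum class RasterPath { Software, Hardware };

// Tiny clusters go to the compute-shader (software) rasterizer; large ones
// go through the regular hardware vertex/raster pipeline.
RasterPath choosePath(const Cluster& c) {
    constexpr float kSoftwareThreshold = 32.0f; // ~32 px, per the talk
    return (c.screenEdgePixels < kSoftwareThreshold)
               ? RasterPath::Software
               : RasterPath::Hardware;
}

int main() {
    std::vector<Cluster> clusters = {{8.0f}, {120.0f}};
    int software = 0, hardware = 0;
    for (const auto& c : clusters)
        (choosePath(c) == RasterPath::Software ? software : hardware)++;
    return 0; // 'software' and 'hardware' would size the two raster passes
}
```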
I chose to go with voxels + vertex colors + auto LODs for Godot; I use a quad per pixel with no UVs and no textures. From what I've seen in benchmarks, shaders and textures are actually much more expensive than poly counts. So I will see in practice how well that works.
Looking forward to seeing it in action. I've never used voxelized assets, apart from in parametric modeling for 3D printing, and I never researched how materials behave with voxelized geometry. Is it really that much of an issue?
Thank you for your time spent here, friend! You both taught me some new things, like the clockwork logic of screen space reflections, and helped reinforce information I had already digested.
I'm glad you found value in it. Was there anything about screen space reflections here, or was it part of some of the techniques?
Thanks for the work you put into this!
5:10 Now that Blender version is a blast from the past
Yeah, that slide does not showcase what I am talking about properly, but it does look older, thanks for pointing it out :D
Thanks a lot for the overview! What's your opinion on using nanite to render those 30 million poly statues, versus rendering them with traditional lods, view space culling and gpu instancing?
Another question: how do you control how dense the "merged" mesh nanite uses for the one geometry draw call? Is there a theoretic maximum depending on the rendering resolution, and is it guaranteed to be handled by the GPU?
@@mushudamaschin2608 realistically, that one contrived example would probably run just fine on modern GPUs. Do note that instancing and LODs are mutually exclusive: LODs create different geometry, while instancing only works for rendering the same geometry over and over. However, you could realistically instance the models at a couple of LOD levels, or batch them all together, since they share the same material and would presumably be static. Especially if you were aggressive with LODs, because realistically you don't even need a million-polygon statue close up. That kind of detail you should probably bake into the model; it makes the model much cheaper to render, and baking the data into the model's texture maps isn't going to decrease quality all that much. In any case, it would probably run fine even on a traditional renderer.
That isn't to say Nanite is worthless; however, I'm pretty sure it would be better used for VFX and cutscenes than actual games. Nanite's high fixed cost isn't a big downside when the alternative is a 10x performance drop compared to Nanite.
As for the merged mesh, there's probably a bias setting somewhere.
Finally, yes, it should be handled by the GPU, as any GPU that couldn't handle that calculation by necessity also could not handle Nanite. Nanite, from what I understand, relies on compute shaders, which is also how it's handled on the GPU.
Thank you mushudamaschin2608 for the clarification.
I think that instancing helps lower the complexity of the scene when it comes to locating the clusters in the tree or the DAG. For example, all those 33-million-polygon statues were placed as instances at the exit in the Lumen in the Land of Nanite demo, as the creators discussed. But as stated, when you have large occluders with a huge number of polygons, like photogrammetry or movie-set asset quality, sometimes you just want to use them without manual prep work like normal baking, and that's where Nanite should shine.
Someone also raised the question: are there any games that use Nanite, apart from Fortnite?
I originally watched this video to see how our reality could potentially be polygons.. but ended up learning about things I'll never use 😅
Yeah, sometimes the things that we think would be useless, seem to help us gain an interesting understanding of our surroundings later on ;-)
Great video!
I wish you had talked a bit about VSM and how elements outside the frustum can still cast a visible shadow in the scene.
I wanted to tackle the aspect of lighting and shadows in a different video, but it is a great topic and I can't wait to research and share what I've come up with. Thank you for the suggestion and for watching :D
Wow, this is all really well explained
Thank you very much, there is a saying - If you can't explain it simply, you don't understand it well enough. Even though it lasts for 28 minutes, hopefully it is understandable 😄
Awesome work man, thanks for sharing the knowledge.
Thank you for watching, I'm glad you found value in it :D
Great video, thank you! A lot of science involved behind a game engine.
Sure is, kinda like not knowing what happens under the hood of a car, yet it still goes if you know how to use it :D
Your explanation is fantastic and I am eagerly awaiting your future explain videos
Thank you for the support. The next logical one would be about lighting, but it will definitely take some time to explore and prepare. Until then, maybe you can check out the video about virtual texturing, which was basically the concept behind virtual geometry like Nanite, or the one about optimizing the pipeline?
Nice overview, man! From time to time I rewatch that Siggraph talk trying to understand it a bit better... It's a really complex talk that requires a lot of background knowledge to grasp...
Just one observation: the software rasterizer runs on the GPU, not the CPU. It differs from the hardware rasterizer by utilizing compute shaders instead of the dedicated raster stage of the graphics pipeline.
Thank you for providing feedback. Yeah, I came back to the lecture every once in a while, hoping that 1x speed and really committed listening would help me understand. But I was lacking the terminology and the experience I found later on. Even now I can watch the lecture and not get everything; there are so many nuances.
As for the software rasterizer running on the GPU instead of the CPU, it seems I got my facts wrong, or I assumed that if one path is hardware, the other must be software on the CPU, based on prior information tying "software" to the CPU.
Thank you very much for pointing it out, much appreciated.
Amazing presentation! It really sets the stage with the standard techniques along with the details of Nanite, very lovely.
I don't use Unreal, but I am curious about the technology, so I have a question. Does Nanite allow tweaking the number of polygons per cluster, or how "deep" the tree of clusters goes? Or is the technology more static in the editor?
Just curious how it would impact performance.
Glad you shared use cases; again, great work.
Thank you very much, I'm glad you found it insightful, even as someone who, as you said, is not a UE user. I think you can set a target for the error, and the clusters form based on that, but I believe they always strive for 128 triangles per cluster, following the logic of the 128-pixel pages used for virtual textures. The goal, in this case, is for Nanite to render triangles at roughly the size of pixels, I think.
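Since the reply above leans on a screen-space error metric, here is a hedged C++ sketch of how such a selection could work: pick the coarsest cluster level whose simplification error, projected to the screen, stays under about one pixel. The function names and the error model are mine, for illustration only.

```cpp
#include <cmath>

// Convert a world-space simplification error at a given distance into
// pixels, for a camera with vertical FOV 'fovY' (radians) rendering at
// 'screenHeight' pixels.
float projectedErrorPixels(float worldError, float distance,
                           float fovY, float screenHeight) {
    float visibleHeight = 2.0f * distance * std::tan(fovY * 0.5f);
    return worldError * (screenHeight / visibleHeight);
}

// levelErrors[0] is the finest level (smallest error); higher indices are
// coarser. Return the coarsest level whose projected error is sub-pixel.
int selectLevel(const float* levelErrors, int numLevels, float distance,
                float fovY, float screenHeight) {
    for (int level = numLevels - 1; level > 0; --level)
        if (projectedErrorPixels(levelErrors[level], distance,
                                 fovY, screenHeight) < 1.0f)
            return level;
    return 0; // nothing coarse is good enough: use the finest level
}
```

The design point is that the cut through the LOD tree (or DAG) follows the camera automatically: double the distance and the projected error halves, so a coarser level becomes acceptable.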
Such a great explanation.
Thank you, I really tried to explain it to myself, and I'm very picky about all the nuances... At least up to a certain point :D
real good explanation.
Thank you, that means a lot, I was looking for a way to make something that was complex to me, simple and relatable, glad it came through 🙂
That information is really important. Thanks for making it
Thank you for pointing it out 😄 I also believe that these things are important, not only because you gain the knowledge, but because you also gain a perspective on where it all came from and where it can possibly go.
I still don’t quite understand the occlusion culling bit - specifically:
The Z buffer of the previous frame is used to do a first culling pass, makes sense.. but then you end up rendering the Z buffer of the current frame to do a second pass of culling.. aren’t you just rendering the scene at that point? Like to generate the Z buffer of the current frame, wouldn’t you basically have to push every single triangle to the screen? Only to then cull a bunch of them and AGAIN render all those triangles?
@@liquos I can actually explain that.
So on the GPU you aren't required to use a full pixel shader and the GPU has a fast path for depth assuming you don't do anything funky with the pixels. (Like discarding them or manually setting depth). It's usually referred to as Early Z.
Because of this, much less state gets changed, if any at all. The actual drawing to a framebuffer is extremely fast.
It's shading, blending, post-processing, texturing, GPU state changes. All of those eat up performance. A depth pre pass does none of those.
It is only for opaque geometry, alternatively geometry that has transparency enabled for say grass billboards. (But this actually oftentimes permanently disables early Z optimization until the buffers get swapped, so you would do this in a separate pass afterwards as it's less efficient.)
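Here is a minimal OpenGL sketch of the depth pre-pass being described, assuming a valid GL context; drawScene() is a stand-in for whatever issues the opaque draw calls.

```cpp
#include <GL/gl.h>

void drawScene() { /* issue opaque draw calls here */ }

void renderWithDepthPrepass() {
    // Pass 1: depth only. Color writes are off and no shading is needed
    // beyond computing positions, so the GPU's fast depth path (Early-Z)
    // does nearly all the work.
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    drawScene();

    // Pass 2: full shading. Only fragments whose depth exactly matches
    // pass 1 survive, so hidden surfaces are never shaded.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE); // depth buffer is already final
    glDepthFunc(GL_EQUAL);
    drawScene();
}
```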
If I understood correctly, you need a certain amount of time to create the Z buffer and a certain amount of cache memory for it. If you wait for the GPU to create the Z buffer you are creating overhead, meaning there is idle time between the CPU and the GPU.
So, in order to solve this, we take the previous frame's Z buffer and use it to test all the bounding volumes of the meshes that were determined to be occluders. Since the occlusion query about what is an occluder can take a lot of time if you test everything, this cuts down on that portion.
When you take the previous frame's Z buffer, but at a lower resolution and the current bounding volumes and compare it, you can create a rough estimate of the current frame's Z buffer and then just check the new meshes or the meshes that became visible in the current frame, and test their bounding boxes for occlusion.
There is a short text here, medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501
and here
www.nickdarnell.com/hierarchical-z-buffer-occlusion-culling/
I think I got a good understanding of it, but would like if someone can clarify or correct me if I am wrong. :D
Sorry for the longer post
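For anyone who wants the gist of those two links in code form, here is a CPU-flavored C++ sketch of the hierarchical-Z test (in practice this runs in a compute shader, and the structures here are illustrative). Convention assumed: depth grows with distance, and each Hi-Z texel stores the farthest depth beneath it.

```cpp
#include <algorithm>
#include <vector>

struct HiZMip {
    int width = 0, height = 0;
    std::vector<float> maxDepth; // per texel: MAX depth of covered pixels
};

struct ScreenRect {
    int x0, y0, x1, y1; // projected bounding-box rectangle, in texels
    float minDepth;     // nearest depth of the object's bounds
};

// The object might be visible if any covered texel's stored (farthest)
// depth is not nearer than the object's nearest point. Pick a mip level
// where the rect covers only a handful of texels, so this loop stays tiny.
bool possiblyVisible(const HiZMip& mip, const ScreenRect& r) {
    for (int y = std::max(r.y0, 0); y <= std::min(r.y1, mip.height - 1); ++y)
        for (int x = std::max(r.x0, 0); x <= std::min(r.x1, mip.width - 1); ++x)
            if (r.minDepth <= mip.maxDepth[y * mip.width + x])
                return true; // not provably occluded: draw it
    return false;            // every texel is nearer: safe to cull
}
```

Testing against last frame's pyramid can misjudge, which is why the second pass re-tests what was culled against the current frame's pyramid and draws anything that turned out visible after all.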
Thank you for another great video, much clearer now after introducing all the major concepts 😉. One question though - is the occlusion volume used for object based occlusion and bounding volume used for image based occlusion - like Z buffer?
Great observation. That is my understanding as well. In one case, we use an occlusion volume, which is snugly fit inside the boundaries of our mesh, so we can always be sure that we are not occluding more than necessary. In the other case, when we are using bounding volumes, we want to check the entire mesh to know if even one part is visible or occluded, so we can overlap it with the Z buffer. Thank you for bringing it up 😀
This was a very good summary video.
There is also far plane clipping (culling) not done by Unreal - with exclusions for permanent objects. I'm trying to get this exact thing to work myself in another engine.
Yeah, there was also portal culling which is really an interesting thing, even though it can require manual setup. I found a lot of understanding in the series from thebennybox here th-cam.com/video/8xgb-ZcZV9s/w-d-xo.htmlsi=yWWAG79ulUeRl8ry
I'm not deep into Unreal, but it's cool to see comments about Threat Interactive. Let's all work together to find the best solutions.
I agree, threat interactive seems to have stirred the pot with this Nanite application issue and bringing everyone to discuss it is definitely a win, regardless of what the outcome is
Imagine showing this tech to Michelangelo
He'd probably say, OK so you can digitize sculptures now, let's do that, I'm better with a chisel than with a mouse 😀
Michael Angelo :D
I hate to be that guy .. but it's spelled Michelangelo Buonarroti (thanks for fixing it)
@NicolaFerruzzi It's way different in my country. His name is "Michał Anioł" in polish, which translates to Michael Angel, hence the mistake.
@@markitekta2766 I think he would be amazed, because it's a moving image. Legend says he angrily struck his Moses with a hammer, saying "now move," after finishing it. He wasn't just trying to make a lifelike sculpture, he was trying to make life, and he was pissed that even achieving perfection couldn't bring the sculpture to life.
i guarantee that this will be used in classes. if you're one of those students, hey! you better be reading this _after_ the video is over.
I made it to be seen and used for learning in classes, so the editing is not as polished as it could be, with all the bullet points taking up a huge portion of the screen. I read the comment, and hopefully other people will as well :D
I don't understand why people obsess over David when the Egyptians were sculpting figures 5 times bigger thousands of years before. There were statues called Colossi, some were made of bronze. It seems to be some strange bias towards modern European culture. Do you really think ancient people were incapable of sculpting or observing veins and muscles? It's so silly.
Thank you for pointing it out and sharing your opinion :D I just needed an example for the intro, and I believe any example would have sufficed. I found this one more relatable to me and the audience I thought I had, but if you have some examples of ancient sculptures, please share them; I'd like to learn more about it :D
They're big, yea. But that's about all we can see. They're so damaged from time that any detail they could've had is gone. Ancient greek statues have stayed in much better condition and so are more appealing for most people
i wonder if four dimensional beings consider our 3D geometry to be flat textures
That is a great observation. There is a book called Flatland: A Romance of Many Dimensions, by the English schoolmaster Edwin Abbott Abbott, that explores this very notion of lower-dimensional creatures being visited by a higher-dimensional creature.
It is understandable and relatable, because it is a square getting to know a sphere. You can read it or watch a short video on YouTube about this topic; it kinda puts everything into a different scope of thinking.
The only good thing about Nanite is that it saves time, but it shouldn't be like that.
Also, for stuff like terrain, you really could just develop a retopology tool that remeshes it once and creates a full quad mesh that is later easy to adjust, either automatically or by hand.
Hard to tell. I am a 3d artist not a game dev. What you try to do automatically, I just do by hand.
@@wydua Thanks for sharing your insights. As they say - good things take time, so if manual labor and adjustment with tedious tweaking is what gets great performance down the line then it is worth it, right 🙂
@@markitekta2766 Yeah.
It honestly really bugs me that these days games are just made as fast as possible because it's cheaper.
It seems they forgot that you can't rush art.
@@wydua Yeah, we live in times where everything is needed as soon as possible, but when you take your time, you can produce something wonderful :D
@@markitekta2766 :D
Good content. Perhaps just one wish: that more optimization had been shown, with before/after comparisons.
If I understood correctly, you are looking for specific numbers before and after use; that is a good suggestion, I'll try to post something soon enough ;-)
@@markitekta2766 Well, maybe you could try it on a small project. I'm making my own game, and it's a tough choice how to proceed: use Nanite or LODs. If the geometry is simple and flat, there seems to be no point converting it to Nanite. But foliage, as practice has shown, is worth making Nanite; it gives a good performance boost. In general, what I want to say is that it's unclear how to do it right, there is so much to take into account, and without examples it sometimes comes down to pure intuition :)
I read the title as "why"
:DD I guess that would have gotten even more comments here. Some would say why not, others would go for "because of realistic depictions," while some would say we don't need to, since we can use less performance-heavy approaches. A diversity of opinions based on reason and fact is always welcome and helps development prosper :D
What's going on in blender at th-cam.com/video/RRKCqmctxLs/w-d-xo.html ?
I've spent weeks trying to create a similar dynamic LOD system in blender and would love to see how somebody else has implemented it.
I just found the video to prove a point for what I was saying, but I'd like to see if anyone can offer more information about this
Some comments on this video are really making me mad; people really don't know what they are talking about, they don't know how the tech works and where it is useful.
Guys, if you're a game developer who cares about making the most performant real-time game ever (as needed for esports titles), you of course wouldn't have billions of polygons in your scene, and hence wouldn't benefit from Nanite; Nanite really isn't made for that. It is made for higher detail while still keeping the game real-time (around 16 ms per frame). Nanite really is a big thing in graphics programming; don't discredit it because some devs don't use it properly. (In fact, I would say UE5 is really not well optimized in the first place; they compile 1000+ pipelines for no reason. If you really, really care about performance, you would be writing your own custom renderer.)
Thanks for sharing, I really appreciate your opinion. As a friendly suggestion, perhaps you shouldn't take other people's opinions as something that must be corrected or heavily debated. They have a different perspective on the topic, which helps all of us broaden our horizons. We can only show which paths we think are correct, and, among the many, each chooses the ones to take. Hopefully they all reach the same place in the end. Having such a great comment section really brings a smile to my face, as I see so many perspectives I never thought of, like the one you are making ;-)
Interesting. I'm just not sure why all this occlusion culling still can't properly avoid overdraw.
@@cube2fox because occlusion culling is hard and it's not perfect. The hidden surface determination problem is a bit of a rough problem to solve.
If my understanding is correct, when you get significantly far from overlapping geometry, the distance between the triangles or clusters becomes so small that it can produce an artifact, like the one that occurs when two triangles overlap in any general modeling setting.
Nonetheless, I'd like to hear other opinions regarding this question :D
@@markitekta2766 this seems like the best way of explaining it; cram enough triangles into the scene and you can either select relaxed culling and have significant overdraw, or strict culling and have artifacting, as triangles are removed to save time but there are so *many* triangles that you eventually delete important ones through the culling.
Nanite makes sense on paper to sell GPUs, but ultimately it's just a nerf overall.
@@ThePlayerOfGames Got it, thank you for making it clearer :D
I'm working on a project where the depth buffer is used to find polygons at a Z distance greater than some threshold x, generate a very simple LOD from multiple objects at once (to reduce draw calls), and then combine the materials into a "grouped" mip map.
Imagine a scene with a castle in the foreground and trees & mountains in the background. My code when done would combine the mountains and trees into a reduced LOD and then merge the materials into a single material. I've been struggling with this second part.
I've been thinking about some of these questions for years. On personal projects I have gone back to UE4 because UE5 has been terrible for optimization.
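Since the material merge is the part described above as a struggle, here is a rough C++ sketch of the simplest version: pack each source texture into one atlas tile and remap the mesh UVs into that tile. The grid packing and all names are mine, for illustration; a real pipeline would pad tiles against mip bleeding and size them per texture. One known limitation: UVs that tile outside [0,1] can't be atlased this way without rebaking.

```cpp
#include <vector>

struct UV   { float u, v; };
struct Tile { float offsetU, offsetV, scaleU, scaleV; };

// Lay out 'count' equal tiles in the smallest n x n grid that fits them.
std::vector<Tile> gridPack(int count) {
    int n = 1;
    while (n * n < count) ++n;
    std::vector<Tile> tiles;
    const float s = 1.0f / n;
    for (int i = 0; i < count; ++i)
        tiles.push_back({(i % n) * s, (i / n) * s, s, s});
    return tiles;
}

// Remap a mesh's UVs from the [0,1] range of its old texture into its
// assigned atlas tile, so one material can serve the whole merged LOD.
void remapUVs(std::vector<UV>& uvs, const Tile& t) {
    for (auto& uv : uvs) {
        uv.u = t.offsetU + uv.u * t.scaleU;
        uv.v = t.offsetV + uv.v * t.scaleV;
    }
}
```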
So... HLOD?
Thanks themeangene for sharing. Yeah, I thought of HLOD right off the bat, which creates atlas materials. I was asking the same thing about Nanite: if it combines clusters and triangles, why not materials? Especially since we have virtual textures that can aid in only showing the visible parts. But perhaps preparing this atlas and the memory for it would impact performance, which is why they stuck with the traditional pipeline for materials.
Except both nanite and lumen look horrid in production with numerous artefacts, jittering and smeared images on movements. I'm not saying it's not a great tech and great progress, but without their myriad of issues fixed they are pretty much pre-alpha and should not be used in production. Source: every single production use-case of lumen and nanite out there.
Thank you for sharing, I was not aware of that. When I saw the Nanite video 3-4 years ago, I had no idea how it worked or why. I think I have a better understanding now, but not about its use in practice. If anyone has any examples, it would bring a new note to the discussion section, perhaps?
We need full ray tracing: GPUs with hundreds of thousands of RT cores.
Number of RT cores is hardly the problem.
That is an interesting observation. Currently we have, if I'm not mistaken, over 10,000 cores in a GPU chip, each capable of running around 3 billion operations every second. But still, if the pipeline is not optimized, a simple scene can cause latency or display issues.
I always return to Jeff Goldblum's line from Jurassic Park:
"Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should."
@@markitekta2766 I mean, for large scene (10+ billion triangles) raytracing with 100 million rays should be much faster than entire scene rasterization.
@@DefleMask I can see the logic in this, but I guess in the end we still have to rasterize it for display purposes, so perhaps they are trying to kill two birds with one stone, even though it can be slow at times?
but Nanite performs much slower, wasn't it proven by Threat Interactive? Performance-wise, Nanite is useless
If I understood correctly, it depends on when you use it. Under a certain polygon count it has diminishing returns, meaning you lose more time preparing everything and rendering than with the traditional pipeline. The Lumen in the Land of Nanite demo wouldn't work with a traditional pipeline; it had a room with 500 statues, each with 33 million polygons. But for a simple architectural building or an urban site with little detail, Nanite would not produce better results.
If you have additional resources, please share them here, I'd like to find out more about the topic :D
Don't listen to that moron. Threat Interactive is the farthest thing from an authoritative source on the subject.
@@markitekta2766 It's not about polycount, it's about overdraw.
Threat Interactive showed a 6 million poly mesh running 2x faster without LODs or Nanite. It gets to the point where all the Nanite details just become noise which needs TAA...which just blurs the detail anyway.
Nanite is just a tool for devs who don't want to bake details.
@@Igivearatsass7 Thanks for the clarification. The things I presented here are based on theory; if there are practical examples that show otherwise, perhaps we should just apply a scientific approach. But 16 billion polygons cannot go through a classical pipeline, right? Also, overdraw happens if you kitbash a scene or use aggregate geometry; otherwise, the culling process removes the unwanted clusters?
Threat Interactive is not a good source for technical details; he has no background in computer graphics and has been making 1:1 comparisons of things that aren't 1:1 comparable.
I know this because I am very active in a graphics programming Discord server, and we have seen him beg for any numbers that might support his claims.
Nanite overall scales better than non-Nanite rendering. If you try to render a few hundred thousand meshes, Nanite won't perform as well; Nanite shines when you want to render millions of meshes.
With a proper 2014-era workflow of projection and baking, you can put all those minute details into a TANGENT SPACE normal map.
You don't need Nanite drawing subpixel triangles.
Doing it that way takes more time, drawing every triangle by itself, lmao. """"optimization""""
I think there was a talk, perhaps about Lumen, when they said that since one pixel can only display one triangle, the exploration of subpixel triangles is not the way to go. But I agree that if you take the time to optimize your scene with manual work and not automated approaches, you can gain more than letting the system handle it for you ;-)
@@markitekta2766 Also because PCs today are a lot faster, and you can get away with overdraw most of the time.
People should work mid-poly nowadays, with global GI and PBR, and you will see 100% resolution rendering with some multisampled anti-aliasing: a crystal-clear picture on your screen.
Instead, let's put in boxes with billions of triangles just because, then lower the resolution because somehow the game runs at 2 fps, upscale the frame with AI, and "fix" all the artifacts with some vaseline, I mean TAA (which is also a cancer on performance; >1 ms is still a lot, you know how many things you can do with that much time?)
(rhetorical question, you seem to understand this pretty well, brother)
@@AvalancheGameArt Valid points, thank you for putting it out there ;-)
Did you ever breathe while doing this video?? I haven't seen you pause once, holy 😂😂😂
😂 I sighed a lot, actually, because there is so much to say at appropriate times, but I edited it out, thnx for bringing my attention to it :D
Ok, now show us UE5 game that runs well and doesn't look like Fortnite.
If you are asking me specifically, I'll see what I can do, but if you are asking the community, I'd like to hear about it as well.
My game, for example.
Silent Hill 2 runs well apart from the occlusion culling bug.
Black Myth: Wukong
It looks really good,
so it justifies being demanding.
The Talos Principle 2. Unlike the infamous Wukong, it wasn't noted for such catastrophic performance troubles.
Or just optimize the game so that users can play on a potato?
Ai
Do you mean AI used for upscaling or for something else?
Got it. Will sculpt each tree leaf with geo instead of using alpha cards. Thanks! 𓁹‿𓁹
Yeah, the scene really lights up due to overdraw when you use Nanite on aggregate geometry such as leaves. Impostors are also an interesting approach, as seen in this video at 2:10 th-cam.com/video/hf27qsQPRLQ/w-d-xo.html Also, they talk about leaves in this video at 13:04 th-cam.com/video/eoxYceDfKEM/w-d-xo.html maybe you can find some suggestions there?
@@markitekta2766 Oh! I was doing a bit of trolling! Definitely would just go the regular industry way to go about it hahah
Thanks for the extra videos, super interesting stuff!
Keep up your awesome work :D
@@Cless_Aurion Oh, sorry, it went over my head. But yeah, as people have said here, Nanite is just a tool, so if something ain't broke, don't fix it, especially with this tool :D