0:30 Frustum Culling
0:52 Occlusion Culling
1:28 Distance Based Fog
1:42 Instancing
2:06 Batching
2:19 Dynamic Terrain Tessellation
3:03 Image Based Lighting
3:25 Light Probes
4:20 Light Mapping
4:37 Photon Mapping
4:57 Voxel Based Global Illumination
5:14 SSAO
5:27 Deferred Shading
5:49 Light Prepass
6:12 Acceleration Structures
6:33 Tiled Rendering
6:55 Clusters (Forward+)
7:18 Screen Space Reflection
7:29 Precomputed Radiance Transfer
7:50 Stencil Shadow Volumes
8:03 Shadow Atlas
8:22 Cascaded Shadow Maps
8:42 Variance Shadow Mapping
9:04 Mipmapping
9:15 Texture Channel Packing
9:46 Bindless Resources
10:08 Mega Textures
10:28 Resource Streaming
10:43 Sparse Virtual Textures
10:57 Optimizing Models
11:15 LOD
11:55 Caching
12:11 Minimizing State Changes
12:27 Branchless Shaders
12:54 Signed Distance Fields
13:05 Compute Shaders
13:32 Async Compute
14:02 Temporal Reprojection
14:27 FXAA
14:41 Hierarchical Z-Buffer
14:54 Depth Peeling
15:06 Bitwise transparency & Alpha Stripping
15:37 Logarithmic & Reverse Depth
16:01 Depth Prepass
Much appreciated! Well deserved sleep.
@@oskar_schramm Consider copy-pasting this list into the video's description so that YouTube would automatically form them into chapters; it's a handy feature lol
It's like an atlas of all the graphics optimizations.
@@635574 Exactly. I was about to say, that's a FANTASTIC overview of potential optimizations
came in expecting to hear mostly things I already knew, but this was quite extensive! nice video
Clicked on it out of curiosity expecting some rudimentary, well-trodden stuff, like all the videos saying "why 'game name x' is slow and how we can fix it" that only go over highly basic things like culling, but you covered a ton of various techniques. It's refreshing to see discussion of some nuances once in a while.
Like when I shipped a game on consoles that targeted strict 60 FPS and occlusion culling actually worsened the overall performance, so we didn't include it : )
Maybe one slight comment: branches aren't really a problem nowadays. Sort of. By themselves, branches have come a long way on GPUs and can be scheduled much more elegantly by the GPU scheduler, unlocking many more optimization opportunities and techniques along the way.
The main problem with branches is still data dependency, as the algorithms that users implement with branches usually depend heavily on data that cannot be fetched quickly. This can further worsen performance with data whose loading depends on branch outcomes if you're not careful enough. It creates an additional need to suspend threads and hide their latency, and GPUs can only do so much of that before they hit massive pipeline stalls.
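A minimal C++ sketch of the branch-vs-select point (the function names are made up for illustration): when both sides of a data-dependent branch are cheap ALU work, the branch can be replaced by a select, removing the dependent control flow entirely.

```cpp
#include <algorithm>

// Hypothetical shading helper with a per-pixel, data-dependent branch.
float shade_branchy(float ndotl, float shadow) {
    if (ndotl > 0.0f)                  // outcome depends on fetched data
        return ndotl * shadow;
    return 0.0f;
}

// Same result as a select: max() compiles to a conditional move,
// so there is no control flow left to diverge or stall on.
float shade_branchless(float ndotl, float shadow) {
    return std::max(ndotl, 0.0f) * shadow;
}
```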
Half of them I didn't know or understand: very concise but very useful! Great.
I think your clock needs new batteries.
The man has stopped time to record the video 😮
Modern optimizations in AAA games: check the dlss and nanite boxes in unreal engine 5
AAA games already use most of these techniques.
@@oskar_schramm DLSS is cool and all but Nanite is actually useless, harmful even
Nanite alone uses a bunch of these automatically.
Nanite can give you a worse frame rate because it overdraws if you don't know how to use it, and if your model topology is complex it's pretty hard to optimize and it'll kill the frame rate. So for a big game it's frame-killing; it's all the other optimizations that end up carrying Nanite.
I think this explains it pretty well. Although we have to realize that Nanite isn't inherently a harmful thing, or it would not have been made.
It solves a lot of problems, but just as with any other advanced tech, it's important to use it correctly, and if not, yes, it can harm performance.
The same is true of, e.g., z-prepass or occlusion culling, which is implemented by 90% of the industry but, if used incorrectly, can also hurt performance.
Acceleration structures almost always have an overhead before they give results.
It’s not just a checkbox for frames.
Great video!
I think we can mention shader program optimizations too:
- cycles
- branching
- per vertex/per fragment calculation trade off
- texture samplers and bindless textures
Also hardware-specific optimizations, like for mobile (tiled architecture) we have
- hidden surface removal
- optimized MAD, dot, saturate shader instructions
- cheap MSAA on write only frame buffers
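As a small illustration of the MAD/saturate point (sketched in C++ here rather than actual shader code, so the helpers are assumptions): writing math as a fused multiply-add plus a 0..1 clamp tends to map to single instructions, or free instruction modifiers, on mobile GPUs.

```cpp
#include <algorithm>
#include <cmath>

// Shader-style "saturate": on many GPUs this is a free instruction
// modifier, so prefer clamp(x, 0, 1) over branching on the range.
inline float saturate(float x) { return std::clamp(x, 0.0f, 1.0f); }

// ndotl * 0.5 + 0.5 expressed so it maps to one multiply-add (MAD/FMA)
// instead of a separate multiply and add.
inline float half_lambert(float ndotl) {
    return saturate(std::fma(ndotl, 0.5f, 0.5f));
}
```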
Glad you liked it. Absolutely great additions, thanks!
It's just too hard to list them all in one video :D
I was thinking of making a video in the future specifically about shader optimization, so I'll keep this in mind for that!
Wow, you deserve more subscribers. This video is very high quality and educational, no bs. Hats off to you sir
Thanks for the kind words, and glad you enjoyed it!
well, there is some bs in there, "stay till the end to find out this one simple trick YouTubers don't want you to know"-sort of bs.
Liked, Subscribed, saved the video :D Great compilation of these techniques.
Finally voxels getting some love ❤
As someone learning graphics programming this is a treasure trove of information, thanks for the video
I’m really happy you are finding value in it! Thanks for engaging, and hope you have a blast in the graphics realm!
If only he could explain it better.
@@No1001-w8m Do you have an example? If you mean the depth of each technique, I can make follow-up videos for that, but if I were to do 50 techniques in depth in 1 video, that would be 1 hour+
Note, sparse virtual textures and megatextures are the same thing.
also, shader branching is complicated. Branches are actually very cheap, what's expensive is divergence across a wave. That is - if neighbouring pixels all take the same branch - it's cheap, if they take different branches - it's expensive.
Thanks for letting me know, yeah I realized the megatexture part a week or two after the video. About branching, you mentioned pixels, but I guess the same is true for any code that uses the CUDA cores? So if a thread in a warp takes a different path than the others, they cannot run in lockstep, right? And fragment shaders are run in a 2x2 thread block; is that 4 threads, or a warp sharing that 2x2 block, if you know?
@@oskar_schramm The concept of a warp (or whatever it's called) roughly exists because of hardware mapping. In hardware a bunch of cores share a large piece of cache, so no matter if it's CUDA, pure compute shaders or fragment shaders, they share a group larger than 2x2 (4 threads). Actual sizes will vary depending on architecture; even between generations of the same architecture things change sometimes.
But yeah, you got it exactly right - if there is divergence both paths will be executed more or less, so the timing will be worse than max(x,y), it will be x+y roughly.
You typically want as little divergence as possible anyway, because divergence slows everything down, not just branches, your texture caches get hammered as well.
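A toy C++ model of that cost behaviour (the wave size and cycle counts are made up): if any lane in a lockstep wave takes a branch, the whole wave walks that path with the other lanes masked off, so a divergent wave pays for both sides rather than the worse of the two.

```cpp
#include <array>
#include <cstdio>

constexpr int kWaveSize = 32;

// If ANY lane takes path A and ANY lane takes path B, the wave
// executes both: cost is roughly cost_a + cost_b, not max of the two.
int wave_cost(const std::array<bool, kWaveSize>& takes_a, int cost_a, int cost_b) {
    bool any_a = false, any_b = false;
    for (bool a : takes_a) (a ? any_a : any_b) = true;
    return (any_a ? cost_a : 0) + (any_b ? cost_b : 0);
}

int main() {
    std::array<bool, kWaveSize> uniform{};    // every lane takes path B
    std::array<bool, kWaveSize> divergent{};  // one stray lane takes path A
    divergent[0] = true;
    std::printf("coherent wave:  %d cycles\n", wave_cost(uniform, 100, 10));   // 10
    std::printf("divergent wave: %d cycles\n", wave_cost(divergent, 100, 10)); // 110
}
```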
A good recent example would be ray tracing: pretty much every RTX API today will sort rays before dispatch to reduce divergence, because even with the cost of sorting you tend to get a ~20% performance boost due to better data locality. If you're interested, there's a lot of literature out there on the topic; "Ray Tracing Gems" has some introduction to it. The actual sorting is done based on direction and origin: rays that originate close together are more likely to hit the same data, and rays that go in the same direction share the same property.
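A minimal sketch of that sorting idea (the key layout, function names and grid cell size are my assumptions, not any particular API's scheme): quantize ray origins into a coarse grid, pack the direction's octant on top, and sort by the key so coherent rays land next to each other in the dispatch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; };

// 10-bit grid coordinate from a world position (coarse cells).
static uint32_t quant(float v, float cell) {
    return uint32_t(int32_t(std::floor(v / cell))) & 0x3FFu;
}

// Hypothetical coherence key: direction octant in the top bits,
// origin cell below, so similar rays sort together.
static uint32_t coherence_key(const Ray& r, float cell) {
    uint32_t octant = (r.dx > 0 ? 1u : 0u) | (r.dy > 0 ? 2u : 0u) | (r.dz > 0 ? 4u : 0u);
    return (octant << 29) | (quant(r.ox, cell) << 19)
         | (quant(r.oy, cell) << 9) | (quant(r.oz, cell) >> 1);
}

void sort_for_dispatch(std::vector<Ray>& rays, float cell = 4.0f) {
    std::sort(rays.begin(), rays.end(), [cell](const Ray& a, const Ray& b) {
        return coherence_key(a, cell) < coherence_key(b, cell);
    });
}
```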
Regarding performance, I would recommend looking into the following topics as well:
1. Temporal and Spatial integration (TAA is an example of that, another would be straight-up upscaling)
2. Meshlet shading. You brought up nanite, which used meshlets (they call them clusters)
3. Variable-rate shading
4. Texture compression. This is something that most platforms will handle for you, but if we're talking about graphics engineering - it's a very valuable technique. Hardware supports texture compression out of the box, and it's actually faster than non-compressed access, as there's less data per pixel.
5. Texture streaming. The idea is to only load the MIPs that are currently needed and skip the full-resolution levels. This makes load times much faster and helps eliminate a lot of FPS spikes when new textures need to load, as we rarely need the full 4k or 8k texture since we're looking at things from far away (a small sketch of the level selection follows after this list).
6. Frame graphs. These are standard today as well; there's a good talk from EA, I think from over a decade ago. Both Unity and Unreal use frame graphs, and so do most modern engines. The reason is resource utilization and pipeline flexibility. You can use waaay less memory by reusing render targets between passes, and your code becomes more modular. There's an extra perf benefit as you're more likely to have the render target in your caches already when starting a pass.
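A rough C++ sketch of the MIP selection from point 5 (the function name and the coverage model are assumptions): estimate how many pixels the object actually covers on screen, and only stream the level that resolution needs.

```cpp
#include <algorithm>
#include <cmath>

// Pick the finest MIP worth streaming for an object, given the
// texture's base size and the object's on-screen footprint in pixels.
int desired_mip(float tex_size, float screen_coverage_px, int mip_count) {
    // A 4096 texture seen at ~256 px can drop log2(4096/256) = 4 levels.
    float ratio = tex_size / std::max(screen_coverage_px, 1.0f);
    int mip = int(std::floor(std::log2(std::max(ratio, 1.0f))));
    return std::clamp(mip, 0, mip_count - 1);
}
// desired_mip(4096, 256, 13) == 4 -> stream MIPs 4..12 only; the four
// largest levels (most of the memory) are never loaded.
```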
There's a lot more, I suggest checking out "Advances in Real-time Graphics" section of SIGGRAPH, they have a separate course every year with industry presentations from companies like Epic and EA, lots of truly amazing stuff and it's all in open access.
It's interesting how a game like Battlefield 1 from DICE can be so pretty and yet so optimized that it can run on very simple hardware, without stutters, graphical glitches or performance issues like frame drops or frametime problems.
Then you go and look at CS2, which is yet to be optimized, and whereas Battlefield 1 can easily run with 62 players in a match, CS2 struggles with 10 of them in a match.
Not to mention that CS2's performance has only gotten worse with subsequent updates. So sad.
Valve has all the time they want/need to optimize, yet they don't, but DICE was working within a time limit and yet implemented so much with way more effort.
Edit: All games have turned to DLSS and FSR for help, when they don't actually help overall performance, because you're only optimizing resolution performance for the GPU while the CPU keeps chugging like crazy. Every single game nowadays is not GPU limited but CPU limited, with GPU performance features that don't help if you're already CPU limited because the game isn't optimized via distant LODs, occlusion culling and way more...
Nice video! I thought I knew pretty much everything about game development, but I'd never even heard of most of the things in your video.
These are lovely technical optimization techniques, but we also have to take texture usage into account, like using trim sheets, and model stuff in a modular way to be repetitive but also full of variation and customizability, to allow instancing, and all culling types.
Absolutely, great addition. There is just too much to cover in 1 video; will probably make a follow-up
Thank you. Perfect and short explanation.
Hey, glad you appreciate the fast paced format
7:59 - "But it cannot do soft shadows". What a horrible lie. The Dark Mod did it, you can allow smooth shadows both with stencil shadows mode or with shadow mapping.
You’re right that The Dark Mod achieved soft shadows, but it’s important to clarify: stencil volume shadows alone create hard edges due to their binary nature. The Dark Mod used extra techniques like penumbra widening and post-processing blurs to mimic soft shadows. So, while stencil volumes can be enhanced, they don’t natively support soft shadows without these added tricks. Thanks for mentioning it though, will try to be more specific in the future.
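For contrast, here is why the shadow-mapping route softens naturally: a small C++ sketch of percentage-closer filtering, where sample_shadow_depth() is a stub standing in for a real shadow-map lookup. Averaging several depth comparisons makes the edge fade instead of flipping hard between lit and unlit, which is the softness stencil volumes can't give you natively.

```cpp
// Stand-in for sampling the shadow map at a UV (stub: flat occluder).
static float sample_shadow_depth(float u, float v) { (void)u; (void)v; return 0.5f; }

// 3x3 percentage-closer filtering: each tap is a binary lit/unlit
// test, but their average gives a smooth 0..1 penumbra.
float pcf_shadow(float u, float v, float receiver_depth, float texel) {
    float lit = 0.0f;
    for (int y = -1; y <= 1; ++y)
        for (int x = -1; x <= 1; ++x)
            if (receiver_depth <= sample_shadow_depth(u + x * texel, v + y * texel))
                lit += 1.0f;
    return lit / 9.0f;   // 0 = fully shadowed, 1 = fully lit
}
```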
Deferred shading is usually slower than forward shading. You need large g-buffers in order to do it, and transferring all that memory is very slow.
@@michawhite7613 Not necessarily.
Memory bandwidth is not everything, especially in PC games.
You are correct that forward can be better, but it's usually in the case of multiple other optimization techniques supporting it, like clusters and z-prepass.
Deferred would not be an industry standard if it was worse.
@@oskar_schramm Making a game around forward shading will give you better FPS than deferred shading. The only time deferred is better is if you're using more complex (and less optimised) lighting. The industry standard for mobile games is forward.
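To put a rough number on the bandwidth side of this debate (the layout below is a made-up but plausible G-buffer, not any specific engine's): a quick back-of-envelope of G-buffer traffic at 1080p.

```cpp
#include <cstdio>

int main() {
    const double w = 1920, h = 1080;
    // Assumed G-buffer: four RGBA16F targets (8 bytes each) + 32-bit depth.
    const double bytes_per_pixel = 4 * 8 + 4;
    const double mb = w * h * bytes_per_pixel / (1024.0 * 1024.0);
    // Written in the geometry pass, read back in the lighting pass,
    // every frame -- before a single light has been shaded.
    std::printf("~%.0f MB of G-buffer written + read per frame\n", mb);
}
```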
invaluable video for those learning graphics programming
thank you! This gives me a lot to explore
Glad you enjoyed it. And finding things to explore more was exactly what I was going for :)
I feel like the Frustum Culling animation was wrong (or oversimplified), even though the spoken text was correct. As I understand it, culling is not just about what gets rendered, but also about what gets sent to the GPU. If the game uses meshes (most games), then an intersecting mesh will still get sent to the GPU and processed. If the game uses meshlets (Alan Wake 2 and UE5 games with Nanite), meshlets that aren't inside the frustum and don't intersect it get culled.
Yes you are right, the animation was not representative.
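For reference, the per-object test the narration describes, as a small C++ sketch: each object's bounding box is tested against the six frustum planes, and the object is kept or dropped whole, never cut.

```cpp
#include <array>

struct Plane { float nx, ny, nz, d; };   // inside when n·p + d >= 0
struct AABB  { float min[3], max[3]; };

// Classic "positive vertex" test: for each plane, check the box corner
// furthest along the plane normal. Intersecting objects pass and are
// sent to the GPU in full -- frustum culling never splits a mesh.
bool inside_frustum(const AABB& box, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        float x = p.nx >= 0 ? box.max[0] : box.min[0];
        float y = p.ny >= 0 ? box.max[1] : box.min[1];
        float z = p.nz >= 0 ? box.max[2] : box.min[2];
        if (p.nx * x + p.ny * y + p.nz * z + p.d < 0)
            return false;   // fully outside one plane: cull whole object
    }
    return true;            // inside or intersecting: draw whole object
}
```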
Great video!
Screen space reflections are extremely distracting imho.
You mean reflections as a whole or specifically SSR? Because with a proper reflection implementation you shouldn’t see the difference.
Obviously you will see a lot of artifacts either way with screen-space solutions, but games can always do SSR with e.g. reflection probes as a fallback to handle the artifacts.
SSR is a pretty ugly solution to the problem altogether, but it works in some cases and it’s faster than other solutions, so that’s why we use it.
Agreed. I'd rather just have cube maps, because the abnormal cut-off with SSR is too ugly.
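A tiny sketch of the SSR-to-probe fallback mentioned above (the names are assumed for illustration): the SSR pass outputs a confidence that fades near screen edges or on a missed ray, and the final reflection blends toward the probe instead of cutting off hard.

```cpp
#include <algorithm>

struct Color { float r, g, b; };

// Blend SSR toward the reflection probe as SSR confidence drops,
// so reflections fade out smoothly instead of popping at screen edges.
Color resolve_reflection(Color ssr, Color probe, float ssr_confidence) {
    float t = std::clamp(ssr_confidence, 0.0f, 1.0f);
    return { probe.r + (ssr.r - probe.r) * t,
             probe.g + (ssr.g - probe.g) * t,
             probe.b + (ssr.b - probe.b) * t };
}
```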
just impressive, thank you very much
Thanks! Glad you liked it.
This is a brilliant video, very useful. But that clock in the background not moving distracted me the entire time.
Thanks, glad you liked it! Haha, yeah I just hate the tick of it, so plugged it out. Now it's just a not very appealing wall decor, but it’s always high noon here 🤠
@@oskar_schramm based McCree ref
he managed to squeeze 17mins into less than 1 minute. true optimisation
Very nice video. I only knew half of these!
Guh, I hate screen space reflections. Always so weird when you look at water and stuff in the reflection just disappears as it goes off the screen
Yeah, SSR is well known for being hated, both by players and by developers, but it does serve its purpose, and with a proper reflection implementation it should smoothly transition to a fallback like reflection probes to limit artifacts. Also, it's one of the simplest real-time reflection techniques we have that isn't ray tracing.
I think SSR is great. The problem comes from expectations. If you look at a mirror-like surface, such as water, you can see a lot of detail in reflections, and you expect them to be accurate. If you look at brushed metal, you won't see much detail at all. You expect the detail, and when it's incorrect or disappears - that's frustrating.
Why I love SSR: it gives at least a portion of the screen much more realism. Reflections are part of the real world; we do a lot of hacks and hand-wavy approximations, and SSR is much more faithful to reality. Things like Lumen in Unreal use SSR and fill areas that don't have sufficient information with other techniques, such as ray tracing and environment maps.
"Real-time graphics" has a limitation, it's in the name: "real-time". We have to make compromises. In film they spend ~1h per frame on very powerful hardware; for games that's not an option, so we can't use the same techniques. We have to cheat. And the goal is to produce a pretty picture first, and realism second. So SSR is a good compromise in that direction: it makes the picture prettier by adding extra color variation to the frame.
@@AlexGoldring
I'm not talking about brushed metal, I'm talking about a body of water for example.
You look at it, see the cliff above it reflected; you pan the camera down a bit more and the cliff vanishes as it's too far off screen, which feels very weird. (And no, it's not the different angle causing no reflections; the sky behind the cliff is not reflected, is all.)
Idk, I get there are limitations, but I feel like if you frequently see such jarring reflection changes when moving around big bodies of water, and there are a good number of such bodies, the jarringness of the change can be worse than having no reflection to begin with.
I wonder how many (if any) are being done automagically already in Unreal engine when you first start it up with a template?
Oh yeah 100% unreal uses many of these.
I can’t say for a fact which ones they use, but what I can say is that there isn’t a 3D game engine that doesn’t do frustum culling.
Deferred shading, instancing and or batching, shadow atlasing and LODs are some other things they very likely use by default.
wish I could tell A.I. to generate a github video game demo showing each of these optimizations with examples
wdym with GitHub video game?
@@glitchdev github project containing video game files
@@ytubeanon ah I see
Yeah, I regret not having demos/more practical examples in my videos; will see what I can do in the future.
well done
Thanks :)
so why exactly do i NEED TO KNOW this?
Shave off the transparent bits of meshes on alpha-clipped ones.
Yes, discarding when reading alpha 0 from an alpha texture is a great optimization. Usually this is e.g. on foliage, so a great thing one could do there as well is ’checkerboard discarding’ based on distance from the camera. Just be sure not to mess up the fragment shader's parallelism with this, as discarding is prone to branching
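A small sketch of that checkerboard idea in C++ (the names and the fade threshold are assumptions; in a real fragment shader this predicate would guard a discard): past some distance, drop every other pixel in a screen-aligned checker pattern so distant foliage thins out cheaply.

```cpp
// Decide whether a foliage fragment should be discarded.
// px/py are integer screen coordinates of the fragment.
bool should_discard(int px, int py, float dist, float fade_start, float alpha) {
    if (alpha <= 0.0f)
        return true;                          // fully transparent texel
    bool checker = ((px + py) & 1) != 0;      // screen-aligned checkerboard
    return dist > fade_start && checker;      // thin out distant fragments
}
```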
so helpful
Glad it is serving its purpose :)
you should rename the video from this to "list of optimizations used in 3d engines" :|
5:11 this is Lumen isn't it? Except Lumen updates so slowly it looks like fairy lights and fireflies rather than torches, pointers, or fluorescent light flicker
I think you are correct that it's using voxel-based GI, but it's not only VBGI.
I think it's a group of techniques woven together, getting a fancy name like Lumen. If I recall correctly, Lumen has these components:
Software Raytracing, SSR, SDF, and the fallback is some kind of voxel GI for acceleration, but I could be wrong
I can tell from the techniques I myself understand that you are not trying to explain anything, and frankly I doubt you understand most of the things you're listing in this video. If so, what is the point? You're listing technical terms with a vague and sometimes even incorrect description, adding no value.
The very first item is already a perfect example: Frustum culling does not cut objects in half! The visualization you show at 0:39 is simply wrong. Frustum culling would remove other rectangles NOT PICTURED. It wouldn't do anything to the rectangles shown. They are all at least partially inside the frustum, so they would be rendered in full.
You even explicitly say "WE CHECK AGAINST THE BOUNDING BOX OF EACH OBJECT", contradicting what's on screen. If you only check the bounding box, this necessarily means that any overlap will lead to full rendering of the object. You're culling entire objects, not vertices. Text correct, video incorrect.
Just one example. It would be exhausting to go through the whole video like this.
Thanks for mentioning this, and I’m sad to hear that you think it doesn’t add any value. I agree with you, the visuals were not always representing the tech correctly for various reasons.
So yes you're right about the frustum culling bit, but I tend to think of this video more as a glossary than a tutorial. It's a great checklist for any project that is experiencing graphical bottlenecks. You have to look into each technique anyway, so there's not a lot of harm done here.
And this is why you will never make a game if you decide to optimize before you prove your prototype is good. Also why graphics programming is its own specialisation.
Bro presenting standard and very basic optimisation techniques like they're genius -_- I hate 2024
Well-researched and well-presented video, and yet so few likes?
Thanks for the kind words!
As long as someone learns from it and/or finds it interesting, I'm happy!
0:50 this is a bad explanation of frustum culling. In the video it shows that object parts which are not in the frustum get cut off, while in reality almost none of the engines work that way. And the video title is misleading. It sounds like it's one technique nobody knows about, when in the video there are at least 20 of them, and every mid-level dev knows about them
The frustum part I totally agree with. I don't think the title is misleading: Optimization(s), plural. And not all videos are catered towards mid level, or entry level, or senior. I try to do what I think is missing in the space.
First!!! Lol