Shader Performance Measurement - Shader Graph Basics - Episode 21

  • Published on Jan 6, 2025

Comments • 73

  • @BenCloward
    @BenCloward  3 years ago +52

    After making this video, I remembered another reason why instruction count is unreliable. Most game engines use both forward rendering and deferred rendering. Shaders that use deferred rendering (mainly opaque objects) have significantly fewer instructions because the instruction count doesn't include any lighting calculations. So you might have shaders in the 100-400 instruction range. Shaders that use forward rendering (transparent objects) DO have lighting calculations built into them, so you might get instruction count numbers in the thousands instead. However, it's possible that a forward rendered shader with thousands of instructions and a deferred rendered shader with hundreds could take similar amounts of time to render.

  • @andrey730
    @andrey730 2 years ago +2

    This shader playground is just mesmerizing.
    I keep hearing from different shader tutorials "this operation is expensive and this one is not" and finally I have a tool to grasp the actual costs for each operation. Thanks for sharing.

  • @PrismaticaDev
    @PrismaticaDev 3 years ago +19

    Fantastic video as always - this is something that I end up telling at least 1 person each day, but now I can show them this video as proof! I really love that spreadsheet as well, it really goes to show how different instructions can be in terms of performance.

    • @BenCloward
      @BenCloward  3 years ago +1

      Yes, totally! This was something that really surprised me when I started investigating it and made the spreadsheet. I had no idea that there was such a giant spread in performance between the fastest and slowest functions.
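
A minimal sketch of the point made in this thread, written in Python rather than shader code: two shaders with the same instruction count can differ widely in total cycles. The `CYCLES` table, the operation names, and both example shaders are hypothetical placeholders, not values taken from the spreadsheet.

```python
# Hypothetical per-instruction cycle costs; placeholders, NOT spreadsheet values.
CYCLES = {"add": 1, "mul": 1, "sin": 4, "pow": 16}

def instruction_count(shader):
    """Number of instructions, the metric the thread calls unreliable."""
    return len(shader)

def cycle_cost(shader):
    """Sum of the per-instruction cycle costs."""
    return sum(CYCLES[op] for op in shader)

# Two made-up shaders with the same number of instructions.
cheap = ["add", "mul", "add", "mul"]
costly = ["sin", "pow", "sin", "pow"]

print(instruction_count(cheap), cycle_cost(cheap))    # 4 4
print(instruction_count(costly), cycle_cost(costly))  # 4 40
```

Both lists report four instructions, yet under these assumed costs one takes ten times as many cycles as the other.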

  • @Hristos_Muka
    @Hristos_Muka 2 months ago +1

    Extremely useful information. Thank you Ben, you are absolutely phenomenal!

  • @gazeth
    @gazeth 3 years ago +1

    As a Unity user I was a bit miffed that Unity doesn't have a shader instruction count and Unreal does. Having watched this first part on performance, I understand better now - the count in Unreal is simply misleading! I'm sure it does have its uses for advanced users though.
    Fascinating subject and brilliantly explained. Thank you very much Ben.

  • @alecek
    @alecek 3 years ago +5

    7:30 you got me there, I was like : What? No, no, no.
    But actually it will be cool to learn more in depth about it.

    • @BenCloward
      @BenCloward  3 years ago +1

      Hah! Nice to hear my little joke caught someone off guard. :) I'm usually pretty straight-faced in these.

    • @alecek
      @alecek 3 years ago +1

      @@BenCloward Can't wait for next week Ben! Thank you for your time and effort to educate us.

  • @MegaWzl
    @MegaWzl 3 years ago +2

    Thanks for providing the list of cycles per function. That is gonna come in *very* handy. I always had the vague suspicion that instruction count is not the end-all be-all, so thanks for explaining in such a detailed breakdown why that is! Looking forward to next week's continuation on the topic.

    • @BenCloward
      @BenCloward  3 years ago

      Glad it was helpful!

  • @musicdudem6673
    @musicdudem6673 2 years ago +3

    These are the types of videos that absolutely everyone needs. Thank you!

  • @JamesKellyWickerman123
    @JamesKellyWickerman123 3 years ago +1

    Thanks for such an informative video! I've always known instruction count was an unreliable measure but I've never seen such a clear breakdown of WHY. Thank you! :D

    • @BenCloward
      @BenCloward  3 years ago

      Glad it was helpful!

  • @IPpainting
    @IPpainting 3 years ago +3

    This is incredible knowledge I wouldn't have figured out otherwise. Thank you very much Ben!

  • @zsenkrad
    @zsenkrad 3 years ago +1

    your cycle doc is pure gold thanks so much!!

  • @johncorbitt4112
    @johncorbitt4112 3 years ago +3

    Your tutorials are so appreciated Ben.

  • @Noowai
    @Noowai 3 years ago +1

    I was hyped for a second when you said you were going to teach us HLSL code! Either way, now I know what language I want to learn next

  • @ryanolford1416
    @ryanolford1416 3 years ago +1

    Best videos, please don't stop I have learnt so much from you. Thank you Ben.

  • @fruitp1227
    @fruitp1227 2 years ago +1

    Thank you Ben! It's very helpful for beginners.

  • @JhebadiaSprunklefunk
    @JhebadiaSprunklefunk 3 years ago +1

    Great video Ben! I knew the what but not the details of the why. And I appreciate you taking the time to make the cycles spreadsheet for us. I'm looking forward to the next one :)

  • @diomiziofattorelli6609
    @diomiziofattorelli6609 3 years ago +1

    Thank you for sharing with us these high level lessons!!! With that spreadsheet I can finally see why it is better to use the saturate node instead of clamp! :)
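
For readers unfamiliar with the two nodes, here is a small Python sketch (not shader code) of what they compute. The function names mirror the node names, but the implementations are my own illustration. It only shows that saturate gives the same values as a clamp to the [0, 1] range; the cost difference the commenter mentions comes from the spreadsheet and is not modeled here.

```python
def clamp(x, lo, hi):
    """Limit x to the range [lo, hi]."""
    return max(lo, min(x, hi))

def saturate(x):
    """Limit x to the fixed range [0, 1]."""
    return max(0.0, min(x, 1.0))

# The two agree whenever clamp is asked for the [0, 1] range.
for value in (-0.2, 0.25, 1.5):
    print(value, saturate(value), clamp(value, 0.0, 1.0))
```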

  • @UnrealBucket
    @UnrealBucket 3 years ago +1

    ok, you got me with the "Shader Playground" joke :D Good video Ben!

  • @pxelguyplays
    @pxelguyplays 3 years ago +1

    This is gold here! Thank you Ben!

  • @cyber797
    @cyber797 3 years ago +1

    Incredibly informational video Ben, Thanks! Really looking forward to part 2!

  • @uchihai_a_h4871
    @uchihai_a_h4871 2 years ago +1

    Dude, take all my 💰.... your content is invaluable. Make a course and I will definitely buy it twice👌

  • @jukai4091
    @jukai4091 3 years ago +1

    great video i learned much today. cant w8 for the next vid.

  • @YASIR.K
    @YASIR.K 3 years ago +1

    I am so glad to subscribe to your encyclopedia channel.
    Thank you.

  • @johnykallisto6488
    @johnykallisto6488 2 years ago +1

    The best way to measure performance is testing on the target device: Android, iOS, or PC with DirectX, etc.

  • @eggZ663
    @eggZ663 3 years ago +2

    7:45 Got me there!!!

  • @th3lazyguy
    @th3lazyguy 3 years ago +1

    I was waiting for this... ❤️

  • @Vanderer11
    @Vanderer11 3 years ago +1

    Today's Sorcery lesson *checked*, time for Minecraft.
    I feel like a primitive now for still using the Shader Complexity view mode in UE to measure performance.
    Thank you for your work :)

    • @nxgentech
      @nxgentech 3 years ago

      To be fair, the Shader Complexity view mode is still a very useful tool to detect problems at a glance. If you turn it on and see white areas where you inadvertently placed layers of translucent materials, that weren't really noticeable in normal view mode then it was certainly worth it. As long as you are aware of the limitation of each tool, at the end of the day performance optimization has to be tackled using various tools and techniques as it is indeed a complex problem and can't simply be reduced to just a single metric.

    • @Vanderer11
      @Vanderer11 3 years ago

      @@nxgentech Indeed, fortunately not every project needs so much love.

  • @miguelm.a7462
    @miguelm.a7462 3 years ago

    Thank you so much. That is why I have always been told to put AO, roughness, and metallic each in one of the RGB channels (I knew it was for optimization). The trick I use to improve performance is to use different materials per LOD, with fewer instructions in a far LOD, for example by taking out the normal map.

    • @BenCloward
      @BenCloward  3 years ago +1

      Yes, LODing your materials is a great way to save performance! Thanks for posting.
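
The channel-packing idea from the comment above can be sketched in Python (not shader code). The `Texture` class, the 1x1 texel data, and the sample counter are all hypothetical stand-ins for illustration. The sketch assumes the saving comes from fetching three values with one texture sample instead of three.

```python
class Texture:
    """Tiny stand-in for a texture that counts how often it is sampled."""

    def __init__(self, texels):
        self.texels = texels
        self.samples = 0

    def sample(self, x, y):
        self.samples += 1
        return self.texels[y][x]

# Three separate single-channel maps (hypothetical 1x1 data).
ao_map = Texture([[0.8]])
roughness_map = Texture([[0.5]])
metallic_map = Texture([[1.0]])

# The same data packed into the R, G, B channels of one texture.
packed_map = Texture([[(0.8, 0.5, 1.0)]])

separate = (ao_map.sample(0, 0), roughness_map.sample(0, 0), metallic_map.sample(0, 0))
packed = packed_map.sample(0, 0)

separate_samples = ao_map.samples + roughness_map.samples + metallic_map.samples
print(separate == packed, separate_samples, packed_map.samples)  # True 3 1
```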

  • @ryanolford1416
    @ryanolford1416 3 years ago +1

    Wish I could like it more than once

  • @trunghoangnguyen660
    @trunghoangnguyen660 3 years ago +1

    thanks for your video

  • @SenEmChannel
    @SenEmChannel 3 years ago

    I have a question. Should I use a few complex shaders applied to many objects, or simple shaders that each apply to only a few objects?
    For example, should I use one material with 3-axis world projection for all my chairs, or should each chair have its own material? Consider that I have 50 chairs in the scene.

  • @Zumito
    @Zumito 3 years ago +1

    Could you include the add and subtract operations in the instruction table?

  • @johnykallisto6488
    @johnykallisto6488 2 years ago +1

    Thanks 🙏

  • @OyunZinciri
    @OyunZinciri 3 years ago

    First of all, thanks for this trailer. I made a structure where the river and the lake are together. In the river, the object stays above the water, while in the lake it does not stay above the water. What is the reason?

  • @MikheevAndrey
    @MikheevAndrey 3 years ago +1

    Great video, thanx! =)

  • @f6y7t5
    @f6y7t5 1 year ago

    Perhaps it's different on the GPU, but loops aren't necessarily turned into a repeated number of instructions. The CPU does a lot of arithmetic in the ALU, but also does branches, jumps, and boolean operations. This means you can do variable-length loops in assembly.

    • @BenCloward
      @BenCloward  1 year ago

      GPU compilers work very differently. GPUs are optimized to do large batches of the same thing all at once. GPUs will often calculate both sides of a branch and then take which ever side is actually chosen - to avoid having to do something different from one pixel to the next. For loops are unrolled - so when you get an instruction count back from the compiler on a for loop, it will tell you the maximum number of instructions that will be used. If there's a case where the maximum number won't always actually be used, you can't tell that by looking at the instruction count.
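
A rough Python sketch of the last point in the reply above: a single reported instruction count for a loop reflects the maximum number of iterations, even when fewer are actually needed. `MAX_STEPS` and `BODY_INSTRUCTIONS` are hypothetical numbers chosen for illustration, and the sketch assumes a loop whose body is counted once per possible iteration.

```python
MAX_STEPS = 16         # hypothetical loop bound known to the compiler
BODY_INSTRUCTIONS = 5  # hypothetical instructions per loop iteration

def reported_instruction_count():
    """Count as reported for the loop: every possible iteration is included."""
    return MAX_STEPS * BODY_INSTRUCTIONS

def needed_instruction_count(steps_needed):
    """Instructions for the iterations a given pixel actually needs."""
    return min(steps_needed, MAX_STEPS) * BODY_INSTRUCTIONS

print(reported_instruction_count())  # 80
print(needed_instruction_count(4))   # 20
```

The reported number stays at 80 no matter how many steps a given pixel needs, which is why it cannot reveal the cheaper cases.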

  • @ИванНовожилов-э9з
    @ИванНовожилов-э9з 3 years ago +1

    great video

  • @GARIKDoroshchuk
    @GARIKDoroshchuk 3 years ago

    Thank you so much for this video. I want to ask you about precision in Unity Shader Graph. What exactly does this parameter do? Does it simply round float values for quicker compiling? It would be great if you could cover this topic in a bit more depth than the standard documentation in the next video.

  • @JoeShmoe-ii1th
    @JoeShmoe-ii1th 5 months ago

    Can you please show us how to make the raindrops-on-surface shader? I saw that you have the rain-on-lens one instead.

    • @BenCloward
      @BenCloward  5 months ago +1

      Here's a playlist with all of my rain tutorials: th-cam.com/video/fYGOZYST-oQ/w-d-xo.html

    • @JoeShmoe-ii1th
      @JoeShmoe-ii1th 5 months ago +1

      @@BenCloward Hello Ben, the one I would like, called Rain Wetness, seems to be available for UE4 rather than Unity URP.

    • @BenCloward
      @BenCloward  5 months ago

      @@JoeShmoe-ii1th It's exactly the same in Shader Graph as it is in the Unreal Material Editor - so you should be able to follow that UE4 tutorial and make it in Unity. Alternately, Unity just released a brand new Shader Graph sample called Production Ready Shaders (th-cam.com/video/iV79HBv6co4/w-d-xo.html) that contains all of these in Unity.

    • @JoeShmoe-ii1th
      @JoeShmoe-ii1th 5 months ago

      @@BenCloward Nice thank you man.!!!

    • @JoeShmoe-ii1th
      @JoeShmoe-ii1th 5 months ago

      @@BenCloward Is it possible to then combine the following 2 shaders, the rain wetness and clear coat paint? If so, which video do you recommend I watch to learn how to combine shaders?

  • @Pixellions
    @Pixellions 3 years ago

    Why don't Unreal or Unity calculate the cycles (like the instruction count) for the most-used GPUs (or GPU architectures, I don't know) to give an approximation of the real cost of the shader? Or are GPUs that different from one another?

    • @BenCloward
      @BenCloward  3 years ago +3

      Good question. That would be really nice information to have. But it can't be calculated by Unreal or Unity because that data gets created after the shader has already been passed outside of the engine. This is why I went over the order that things happen. The engine creates the graph and then creates the HLSL code from it, but after that, it's up to the API (DirectX, OpenGL, etc) to generate the assembly language, and then after that, it's up to the driver to generate the commands for the specific hardware. We can see how many cycles are required in this case because AMD has provided a compiler that provides that information, but most of the hardware vendors don't do that. They want to keep the inner workings of their hardware a secret. So as I mentioned in the video, this cycle information is only accurate on AMD hardware.

    • @Pixellions
      @Pixellions 3 years ago +1

      @@BenCloward Thank you for the clarification! As always, you are doing very interesting and informative videos.

  • @cosmotect
    @cosmotect 3 years ago

    Totally down to learn hlsl!!!

  • @jonascarvalhodearaujo8038
    @jonascarvalhodearaujo8038 1 year ago

    I once heard that the "clip" function is heavy on performance, but in the table it costs 0 cycles to run. Is it heavy on performance for some other reason, or is this information just wrong and clip isn't costly? '-'

    • @BenCloward
      @BenCloward  1 year ago +1

      In my experience, there is a very small cost for clip, but if you're able to use it in the right spot, it has the potential to save significantly more performance than it costs. The best place to use it is as close to the start of the shader as possible - as soon as you know whether or not a pixel will actually contribute something useful. If you can throw away a pixel at the beginning of the shader before the full pixel shader runs, you can save quite a bit. I've never heard that clip was heavy on performance, but maybe there is one specific platform where that's the case? I know it's not true on PC and consoles. Maybe mobile?

    • @jonascarvalhodearaujo8038
      @jonascarvalhodearaujo8038 1 year ago

      @@BenCloward Thanks for the answer, Ben!
      I did some quick research because I was curious about the platform point you brought up. Opinions seem to vary, and there are a lot of variables to take into account. People mentioned that whether clip is worth using can depend on how old the GPUs are and on which kind of shader you are writing (whether it writes to the Z-buffer or not), so there doesn't seem to be a direct answer on whether to avoid clip. It depends on what you're doing haha

    • @BenCloward
      @BenCloward  1 year ago

      @@jonascarvalhodearaujo8038 Yes, and unfortunately that is almost always the answer you'll get when you ask if something has a performance cost or not. "It depends." The only real way to know is to try it out on your target hardware and see.

  • @davidclark1775
    @davidclark1775 3 years ago +1

    A bit worried about your first point. I believe, from my cuda experience, that the reason the loop is shown as unwound, is that the GPU will always take the time needed for all passes through the loop. Therefore, the unwound loop count is actually more accurate. Branching and the like is executed not by skipping instructions (goto or return) but by discarding the results of the instructions. All instances of an SIMD process set must execute the same instruction sequence as there is only one program counter. So even unused passes through the loop come at full cost. Appreciate if some expert on shaders could confirm/deny.

    • @BenCloward
      @BenCloward  3 years ago

      My understanding is that there are multiple types of branching and looping. The branching that you're describing is dynamic branches where all calculations for both branches are done and then the rejected one is thrown out in favor of the selected branch. However, I do know that it's possible to change the number of loops dynamically in the shader itself. For example, with parallax occlusion mapping, you can adjust the number of loops based on view angle and distance, and doing so provides a large performance gain. However, since there's just a single instruction count number, you'll always get the maximum possible number of instructions even though you won't usually be running all of them. And as you said, if others know more than me about this, I'd be happy to learn more.

    • @parallelno
      @parallelno 3 months ago

      ​@BenCloward, thank you for your educational video!
      I think this is a nuanced point, and describing it as "always unrolling the loops" can be misleading. As far as I know, the compiler can unroll the loop if it can determine the iteration count at compile time, but this isn't guaranteed. It depends on factors like the loop iteration count, resource usage, etc. A key concept that might clarify this is the term "wavefront." A wavefront executes a shader across several pixels simultaneously, and threads within the wavefront share constants and variables. Crucially, the execution flow is synchronized between threads. If one thread exits the loop early, it stalls to wait for the others in the wavefront to finish. So, while the loop executes for the same duration across all threads in a wavefront, this doesn't mean the compiled shader contains a fully unrolled set of instructions for the maximum loop iterations.
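
The wavefront behavior described in this thread can be illustrated with a simplified Python model (not GPU code). The function names and the four-thread example are hypothetical. The model assumes only what the comment above states: threads in a wavefront run in lockstep, so a thread that exits the loop early waits for the slowest one.

```python
def wavefront_iterations(iterations_per_thread):
    """Loop passes the whole wavefront runs: the slowest thread decides."""
    return max(iterations_per_thread)

def idle_thread_passes(iterations_per_thread):
    """Passes where a thread has already exited but waits for the others."""
    longest = wavefront_iterations(iterations_per_thread)
    return sum(longest - n for n in iterations_per_thread)

# Hypothetical wavefront of four threads needing different loop counts.
wave = [2, 3, 8, 3]
print(wavefront_iterations(wave))  # 8
print(idle_thread_passes(wave))    # 16
```

In this model, lowering the loop count only pays off when every thread in the wavefront needs fewer iterations, which is consistent with both the lockstep argument and the observed gains from adjusting the loop count by view angle and distance.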

  • @AldoV
    @AldoV 3 years ago +2

    7:16 My heart skipped a beat for a few seconds lol

  • @faris_diz
    @faris_diz 3 years ago

    💯

  • @PrivacyEnt
    @PrivacyEnt 3 years ago +1

    Why is there a cliffhanger on a shader tutorial haha

    • @BenCloward
      @BenCloward  3 years ago +2

      Hahah! You want the truth? #1 My explanation about instruction count took longer than I thought, and #2 I ran out of time in my Saturday to work on it. :)

    • @PrivacyEnt
      @PrivacyEnt 3 years ago

      @@BenCloward looking forward to it!! i am a big fan:))

  • @antoniosuarez7881
    @antoniosuarez7881 1 year ago +1

    thanks, great info