That subtract 0.5, multiply and add 0.5 again is the best trick I've learned in a year.
Thanks so much for this comment; I would've missed that if I hadn't seen it first!
The last technique with combining 3 textures into one just BLEW MY MIND! I thought that a tutorial on optimization was gonna be an embodiment of boredom, but you turned it into a fun, informative experience! Thank you, truly
When I was preparing to get a job as a character modeler, I first watched your video and studied. Now, I'm watching your video again and studying to be a technical artist in the character team. Thank you very much always.
Hi Ben, this is a good video for getting people thinking about and paying attention to optimizing materials, as it's something everyone should consider, but I had a few thoughts while watching this.
You don't have to remove nodes from a material if they're not connected; unconnected nodes won't be compiled, so they have no cost. The same goes for switches, but with switches, the number of unique true/false combinations actually used determines how many versions of the compiled shader are created, stored, and potentially running at runtime (n independent switches can yield up to 2^n permutations). This is useful if you want a master material that's easier to maintain but still allows flexibility.
The pipeline idea, while in that case not really saving you anything in instructions under the hood, does neaten up the material, which is a nice thing to have as materials get complicated.
Regarding the packed texture, it's important to note that the green channel has the best compression, and you should always put roughness, the most important map in a metal/rough workflow, into green. Blue, I believe, has the highest tendency for artifacts, so AO often goes there. Do not leave a black (empty) map in a channel and then use an alpha channel; you just went from DXT1 (4 bits per pixel) to DXT5 (8 bits per pixel), doubling the memory of the texture for no reason. Only use alpha when you need another high-precision/low-artifact channel in the packing, or you genuinely need all 4 channels. Some would still argue for making a separate linear greyscale texture for that map, but then you also add a texture lookup, so I'm not sure it's all that great of a saving in the end; that's up for debate. Also, when packing channels, always turn off sRGB (I usually switch to the Masks compression type), since these are being used as masks and you don't want any correction of the values.
Lastly, I feel it's important to mention that you rarely need to supply a Specular map in UE4; most of the time, setting Specular to a value based on the type of material, or just to suit your taste, is adequate and accurate enough. One of the benefits of PBR is not having faked specular highlights added with additional maps, as this is properly calculated at runtime based on a number of factors (check out "Real Shading in Unreal Engine 4" by Brian Karis if you want the details of their implementation), and absolutely never use one if the material is metallic. I've only seen it used for micro-facet and IOR tweaks in special cases. In short, using a specular map is rarely worth the cost.
This is great stuff, Dean. Thanks a lot for the details.
@Trevor O'Brecht Remember that texture memory isn't the only consideration. If you have 6 maps vs 4, that means you're sampling 6 textures in the shader. A texture sample is one of the slowest, most costly things you can do in a shader, so having 6 of them means the shader is slower than having 4 of them. If it were me, I would spend slightly more texture memory and pack them into fewer textures so that the shader runs faster.
I didn't understand "do not use a Black map in a channel and then use an Alpha channel". Could you explain what you meant by that?
@@mhnoni In terms of bit usage, it's more efficient to use all the RGB channels of a texture sample before using the alpha. The reference to black means one of the RGB channels is empty, so what I was trying to say is: use all the free channels, and write channel-selection logic into a material before defaulting to alpha channels when possible. The one case for using alpha over the other channels to pass info is when you need the fidelity, or less chance of compression artifacts.
@@deanbouvier8529 Ah, I see. Thanks for the explanation.
Does alpha have fewer artifacts than the green channel?
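To put rough numbers on the DXT1 vs DXT5 point above (a back-of-the-envelope sketch, assuming the standard block-compression sizes):

```
DXT1: 4x4 texel block in 64 bits  -> 4 bits per pixel
DXT5: 4x4 texel block in 128 bits -> 8 bits per pixel (64 for RGB + 64 for alpha)

2048x2048 top mip:
  DXT1: 4,194,304 texels x 4 bits = 2 MB
  DXT5: 4,194,304 texels x 8 bits = 4 MB
```

That doubling is the whole cost of touching the alpha channel, which is why the advice is to fill all three RGB channels first.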
I've just come across this series to familiarise myself with UE4 and its use of shaders. It's brilliant and I'm really enjoying your thorough explanations on each subject. Thank you for making this series!
You are giving back to the community in a big way! Thank you
24:31 I have always wondered what this "ASMR" is all about. Now I finally got it😂😂
18:59 Sadly, this doesn't work quite like that. A float4 is just 4 floats packed into one package; when you multiply a float4 by a float, it's the same as explicitly multiplying 4 different floats. The number of instructions stays exactly the same, because under the hood you have to multiply each component. Packing makes sense for texture samplers because the sampling itself has a constant cost: regardless of how many channels you actually need, you always pay for 4, so you might as well use them for something.
But packing in the code at best doesn't make any difference performance-wise, and at worst you might end up with slower code (if you don't actually need a channel).
One of the best parts about making these videos is that I get to learn things from the comments. This is the perfect example. Since I made this video, I have learned that the current generation of graphics hardware (GCN) is not faster at doing vector math (instead of scalar math) the way previous hardware was. Page 22 of this PDF explains it pretty well - www.humus.name/Articles/Persson_LowlevelShaderOptimization.pdf. So yes, you're right. It used to be faster to do what I explained here, but not any more.
@@BenCloward The best part for me is reading the comments for the extra information beyond the video. Thank you all for the amazing, free, and complete information. (And thanks for the PDF 😏.)
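For anyone who wants to see this in shader terms, here's a minimal HLSL sketch (the packed channel layout is hypothetical, not from the video):

```hlsl
Texture2D Tex;
SamplerState TexSampler;

float4 PackedExample(float2 uv : TEXCOORD0) : SV_Target
{
    // One sample returns all four packed channels; the fetch cost is paid
    // once no matter how many of them you actually use.
    float4 packed = Tex.Sample(TexSampler, uv);

    // On scalar (GCN-style) GPUs this vector multiply is NOT one instruction:
    // it compiles to four scalar multiplies, exactly as if you had multiplied
    // packed.r, packed.g, packed.b, and packed.a one by one.
    return packed * 2.0;
}
```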
That little edit you did at 13:34 lol.
It showed fewer instructions for the power because you had the power set at 1. Setting it to 20 is what gave the 115. Just wanted to point that out in case anyone noticed the jump in all the numbers and wondered why.
Nice catch! I can tell you're watching really carefully.
Thank you very much for this tutorial series. It's great to hear someone with such experience sharing this knowledge!
Refactoring seems like a great skill set; it's gonna take some practice and trial and error to get good at it. I assume experience is the best ally here. Thanks again for the tutorial. 💪
Exactly what I'm looking for.
Thank you and have a great day.
Random thing: if you have a multiply node with an add right after, it basically compiles down to a single operation and gets optimized very well, so it's only a single instruction or so; at least that's what I was told.
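In HLSL terms, the pattern looks like this; the folding is done by the compiler, so treat it as the usual case rather than a guarantee:

```hlsl
float MulAdd(float x, float scale, float offset)
{
    // A multiply feeding directly into an add maps to a single fused
    // multiply-add: mad(x, scale, offset), one instruction on most GPUs.
    return x * scale + offset;
}
```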
Clamp is expensive too; saturate is free. Set texture samples to Wrap, and if you use the 32-bit TGA format you can use the alpha channel as well to pack an extra linear texture, but I don't know if that costs more than regular RGB. Normally we use the RHA format: Roughness, Height, Ambient Occlusion.
Hi, someone from 2 years ago who has probably long forgotten this comment. I just want to say, for future generations: if a clamp node is set to 0-1, the engine optimizes it into a saturate node. Regards.
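A small HLSL sketch of the difference being described:

```hlsl
float ClampVsSaturate(float x)
{
    // clamp() with arbitrary bounds costs real ALU instructions:
    float a = clamp(x, 0.2, 0.8);

    // ...while a 0-1 clamp maps to saturate(), which most GPUs apply as a
    // free modifier on the previous instruction. Per the comment above,
    // UE's compiler turns clamp(x, 0, 1) into saturate(x) anyway.
    float b = saturate(x);

    return a + b;
}
```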
Huge thanks. Amazing tutorial. Very informative.
Thanks Ben ... For Your Time ... With My Respect
Hello, great video! I know I'm a little late to the party, but I'd like to add a 5th method I use to optimize: use Customized UVs when I can to pass complex calculations to the vertex shader instead of doing them in the pixel shader.
Yes - that's a great one! Doing math in the vertex shader is almost always cheaper than doing it per-pixel. Thank you for adding this!
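A minimal hand-written HLSL sketch of the idea behind Customized UVs (the material node setup generates something equivalent under the hood; constants like PanSpeed here are made up for illustration):

```hlsl
float4x4 WorldViewProj;   // assumed constants for this sketch
float    Time;
float2   PanSpeed;
Texture2D Tex;
SamplerState TexSampler;

struct VSOut
{
    float4 pos : SV_Position;
    float2 uv  : TEXCOORD0;
};

// Vertex shader: the panning math runs once per vertex...
VSOut MainVS(float3 posOS : POSITION, float2 uv : TEXCOORD0)
{
    VSOut o;
    o.pos = mul(WorldViewProj, float4(posOS, 1.0));
    o.uv  = uv + Time * PanSpeed;   // computed per-vertex, interpolated for free
    return o;
}

// Pixel shader: the interpolated UV arrives pre-panned, costing nothing per pixel.
float4 MainPS(VSOut i) : SV_Target
{
    return Tex.Sample(TexSampler, i.uv);
}
```

The caveat is that only math that interpolates linearly across a triangle survives this move unchanged; a simple pan does.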
otro tip es que es mas rentable en el pasto utilizar el two sided desactivado, y generar una copia de la malla del pasto con las normales invertidas en blender para que funcione igual, lo malo es que si se ilumina de un lado, del otro es como que traspasa, ya que no tiene 2 caras, por lo que hay que saber utilizar esto
Hey Ben, great breakdown here.
I think someone else already mentioned it, but float4s are just 4 individual floats packed into a single value, so the result would be the same as if you had 4 individual floats and multiplied them one by one.
In terms of textures, it's true that your texture sampler count is lowered, and your instruction count is lowered, but a texture with an alpha channel has 2x the resource size of a texture that's just RGB.
So while it's true that the shader instruction and texture sampler counts become lower, there is the increased memory usage from the texture now being 2x what it was before adding the alpha channel.
This is something I've noticed myself, but I'd love to hear about it from someone like you; it might help me understand it better, as I may be confusing resource size with in-game memory size, I'm not entirely sure.
Hi Pete! Thanks for your thoughts. Yep, you're right. A DXT5 texture with an alpha channel is twice the texture memory as a DXT1 texture without alpha - so adding in that 4th channel doubles the texture memory size, just like you say. That means you have to be careful when you decide to use it. If you can get rid of a texture sample by doing it, I think it's worth it. Texture memory is a little bit more flexible than GPU performance since the engine will stream mip levels in and out depending on the load. So if I can save some GPU performance by using a bit more texture memory, I think that's a good trade. Another consideration is that the alpha channel (when using DXT5) is always the highest quality, least compressed channel. So if you have important data - like the roughness, it's probably best to put that in the alpha to keep the quality high. Whereas if you were to put it into R,G, or especially B, it's going to get chewed up a bit more by the compression.
@@BenCloward Thanks Ben. That was a solid response. I'm always thinking of the texture memory, but you're right that at times it may be better to get better gpu performance.
I see it as being on a case by case basis somewhat, but I certainly agree.
@@petemache6751 Damn, I learnt a lot from these comments....
Instead of plugging in nodes to your Base Color to preview them, you can right-click a node and choose "Begin Preview" to see it temporarily piped to the Base Color without untangling a bunch of stuff!
I'm wondering why, if I create a material that has nothing connected, just a plain ol' vanilla node, my instruction count is 117, while yours, which is more complex (say at about 15:00), is only 111. Has the update to 4.25.3 made the base shader cost more? Like the one you have using the noise to animate the UVs, built exactly how you have it: 129 vs 96. Is it also dependent on how the scene was set up? Confused I am.
There could be a lot of things that change this including engine version, project settings, scene settings, and root node settings. I wouldn't worry about trying to match my numbers exactly.
Instruction count is iffy. You are correct that some instructions take longer; there are instructions that literally take over a hundred times as long as a simple addition. There is also HOW the instructions are executed.

Say you're on an x86 (PC) processor. You can add 1+1 by loading 1 into each of 2 registers and adding them into one, the other, a third register, or a location in memory; or by loading 1 into a register and 1 into a location in memory and then adding them the same way. If you do it using the AH and AL registers, it executes as fast as any instruction the processor can perform, and if you put the result into one of them, it is as near to instant as anything a machine does. But that same addition done with DH and a specific location in memory is going to take tens of times as long.

So efficient methodology is even more vital than the number of instructions performed. The slightest shortcut skipped in the conversion to native code can produce results slower than an interpreted translation of a high-level language (C++ or Blueprints) to machine code.

My method of testing is to simply use the shader complexity view, on a duplicate of the material where I've copied the whole process before it's pinned to the material output 100 times, and lerped them all together. So it has the same end result, but uses 100 times the processing to get there. If it's still green after that, you have nothing to worry about, on any platform.
Do you know why sometimes the base pass shader instruction count doesn't show (UE5)?
Maybe make a topic on virtual texturing? That would be awesome
Wouldn't it be more efficient to plug the blue channel with the packed metallic map into the Metallic output? I thought black meant no metallic, which corresponds to the zero value in that float?
Thanks for the video! I wonder, what is the impact of 100 instructions vs. 97? That's a 3% difference. I suppose it has a meaningful impact if it's used many times throughout a scene?
What video did you make this material in?
This question is a couple years late, but what are your thoughts Ben on using switch parameters to create a fairly comprehensive master material? In the master, most material features are slotted behind disabled switches, and only enabled in a material Instance when needed. Is this a decent way of going about things?
Instead of making one giant material that does everything, my strategy is to make several different materials that each do a category of things. So for example, I'll create a rock shader that has a bunch of features related to the types of things rocks need. So it can be used for small rocks, boulders, and cliff faces depending on what features you enable or disable. Same thing for foliage. That one can be used for bushes, ground cover, tree canopies, etc, depending on what settings are turned on. So instead of just one, I divide the total set of required shaders into logical categories and then make one for each category. IMO, this is a good compromise between having too many shaders to maintain and having one big thing that's too complicated to add to or work on.
@@BenCloward Thank you for the response Ben.
@ben: The power node optimization is awesome. However, what if for some reason I wanted to keep the "dissolve" of the power node? Think of curvature maps that you want to pinch towards an edge with a power node followed by a multiply to adjust intensity.
How could I make your optimization work the same way as the power node? Is this possible?
If you need the specific functionality of the power node, go ahead and use the power node. The optimization is only for cases where you don’t need it to behave exactly like power.
@@BenCloward Oh yes, thank you Ben. I just thought we could have the power node functionality + the awesome instruction savings :D Thought there might be some more math goodness that can do this.
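For reference, a sketch of the two contrast approaches being compared here (illustrative, not the exact video graph):

```hlsl
// The power version: a different curve shape, and pow() is comparatively
// expensive (see the operation-cost thread further down).
float ContrastPow(float x, float exponent)
{
    return pow(x, exponent);
}

// The cheap version from the video: push values away from the 0.5 midpoint.
// (x - 0.5) * k + 0.5 folds algebraically into x * k + (0.5 - 0.5 * k),
// a single mad instruction; the saturate is a free modifier.
float ContrastMad(float x, float k)
{
    return saturate((x - 0.5) * k + 0.5);
}
```

The two are not interchangeable: pow bends the curve, while the mad version is linear, which is exactly the difference being discussed above.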
Hello, thanks for your tips. Though, I have a question: in method 3, you say the best way to measure whether an "object" is correctly optimized is to test it in the level. My question is, what if I have hundreds of different materials to test out? Your best method implies testing every single object in the scene to find out which one is causing FPS drops (for example).
You may have hundreds of objects, but hopefully you don't have unique materials on all of them. It's best (IMO) to create a small set of master materials (one for rocks, one for foliage, one for small props - for example) and then use those for every object that fits in that category - using material instances. That way, if you need to make changes to them, you don't have to go through and change the material on every single object. And that way, you have a much smaller set of materials to optimize. You can assign one of your master materials to a single object and performance test just that. Optimize it and make it run as fast as you can. Then it will also be optimized for all of the other objects you've applied it to.
@@BenCloward Thanks for your answer. I understand; the problem is, when it comes to something like an open-world game, having the same material on plenty of objects can become noticeable quite quickly. What would you recommend?
If you expose the texture slots as parameters, you can use material instances but swap out the textures for each object.
@@BenCloward Thanks !
When using a packed texture, is there any way to apply panners, rotations, etc. to the individual channels without doing a texture lookup for each? Can I do it downstream from the texture sample node?
No, that’s not possible. If you need to apply different UVs, it’s better to break the channels up into separate textures.
@@BenCloward Thanks!
Does this still work? I have a landscape bought from the asset store and I'm not too sure if it's optimized. In the editor, when looking down, the frame rate is 45+, but when looking forward it's 5. Help please!
Maybe try my tutorial video on making landscapes? There's a section where I talk about the settings for controlling how many triangles the landscape is using. th-cam.com/video/fpUOxwDNNcQ/w-d-xo.html
@@BenCloward Thanks. I already have a very large landscape that was a bought asset. I wanted to find out if you had some tips or tricks for optimizing the premade landscape, trees, grass, etc. I'd like to learn how to do it; I think it would mostly be in the texture nodes, but it's all Greek to me. Hope you can help.
Hi, I have a question: where can we learn about the cost of different operations? How did you know that the power operation was more costly than a subtract-multiply-add operation? Is there a list somewhere with the cost of every operation?
This is a tricky question to answer. The first problem is that the cost of each operation can be different on every platform or graphics hardware. The second problem is that the compiler that converts the shader to assembly language can often find shortcuts that make things run faster than predicted depending on what other operations are present and what order they’re in. So the cost of an operation can vary depending on the context. As I described in this video, you can use the instruction count as a rough approximation of performance, but the best way to know if something is fast is to test it on your target hardware.
@@BenCloward Alright, I see, so it can depend on the compiler then. I guess the square root operation is also very costly? It is on the processor, so it might be the case on the GPU as well.
Things like sine, cosine, acos, tan, etc. can also be very costly, I guess?
I mean in general; then of course it all depends on the target hardware and the number of objects that use the shader.
Yes, that is correct. Multiply and add are the two cheapest. The hardware can often do a multiply and an add instruction together in a single cycle. I believe this is the reason that the contrast method I introduced here is cheaper than using the power node.
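To make that concrete, here's roughly what a compiler does with pow(); the exact expansion varies by compiler and hardware:

```hlsl
// pow() is not a single instruction: for positive x it is typically
// expanded into two transcendental ops plus a multiply.
float PowExpanded(float x, float y)
{
    return exp2(y * log2(x));   // equivalent to pow(x, y) for x > 0
}
```

sqrt, sin, and cos sit on that same slower transcendental path on many GPUs (often a quarter of the rate of mul/add/mad), which lines up with the guesses above.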
Hello Ben, when you used the subtract/multiply/add nodes, was that because the power node is heavy, or because we use large numbers on the Exp pin?
And can we replace any power node with these three nodes, or is it specific to some cases? Sorry if this sounds like a noob question.
I've tried it. The power node seems to take up to 5 instructions once the exponent reaches 6 or more (if the exponent is less than 6 it takes fewer instructions).
Adding any number of add, subtract, and multiply nodes following each other seems to have no effect on the instruction count; it stays the same as without those nodes.
This replacement works for image contrast, but it's not a panacea; sometimes you might actually need the "power" effect.
Hello Ben, I want to ask: is it true that we can get roughness from a gloss map by adding a one-minus node?
And another question: I see a lot of developers using the albedo to get the roughness and specular. Is that an efficient way to do it?
Yes, glossiness and roughness are basically an inversion of each other. If you have a glossiness map (intended, let's say, for Unity), you can add a one-minus node in your shader to convert it to roughness inside Unreal (you can also use a lerp with inverted 0 and 1 values). However, this is just a last resort if you don't have a way to generate the right roughness, as packages such as Substance perform some operations when they output to Unreal and to Unity, so ideally we author our textures outputting to the right program.
The Base Color texture and roughness (as well as Normal and Height), when created in something like Designer or scanned, are closely related, so if you want to save a texture fetch you can use a channel from your Base Color as roughness and it might work (it depends on the look you want). You are sacrificing accuracy for optimization.
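As a tiny sketch of the conversion described above:

```hlsl
// Glossiness and roughness are inversions of each other; UE's OneMinus
// node does exactly this.
float GlossToRoughness(float gloss)
{
    return 1.0 - gloss;
}
```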
I realize this video is old, but packing textures can be dangerous. You have to be aware of what compression you are using and which channels have higher precision. All three channels of RGB pack a 4x4 block (16 pixels) into a 64-bit value. The alpha channel packs the same number of alpha values into its own 64 bits. So there are WAY more bits per pixel in the alpha channel if you use DXT5. The color endpoints are reduced to 5:6:5 bits before compressing, and some code will apply a filter before compressing. Not only that, but combining textures this way will alter the colors produced by DXT5; you can easily get color bleeding. In this case, they're all grayscale, but the effect is similar. For most games, maybe losing precision on these textures isn't a big deal, but the alpha and green channels should be reserved for the maps you want more precision for.
I wouldn't call that dangerous. I'd call it understanding how the system works at a low level and squeezing it to get the most efficiency possible. :) I've created a couple of other videos on this topic if you're interested: th-cam.com/video/WJkEacYRhPU/w-d-xo.html th-cam.com/video/mEDoy-N1ODQ/w-d-xo.html th-cam.com/video/m5bP-xc6Sgs/w-d-xo.html
Thanks for this video. I noticed my materials all have instruction counts over 900, regardless of how simple they are. For example:
Base pass shader: 920 instructions
Base pass shader with Surface Lightmap: 943 instructions
Base pass shader with Volumetric Lightmap: 991 instructions
Base pass vertex shader: 101 instructions
Base pass vertex shader: 144 instructions
Texture samplers: 11/16
Texture Lookups (Est.): VS(0), PS(4)
User interpolators: 2/4 Scalars (1/4 Vectors) (TexCoords: 2, Custom: 0)
this is what pretty much all my materials look like.
Do you have any idea what is happening here?
I don't know off the top of my head. It sounds like maybe your engine is set up to do forward rendering instead of deferred - but I have no idea how it may have been changed to do that.
@@BenCloward Indeed, it looks like my engine has forward shading enabled in the project settings. I'm curious why it's so much higher in forward rendering, but I guess it's related to the fact that all the rendering is done in a single pass?
I used everything from this tutorial and managed to get my circular progress bar material down to 35 instructions (30 of those instructions are just for existing, xd).
Well done!
Thank you!
I have very basic materials with over 900 base instructions. What do I do? lol
I believe that your shader is set to use the forward renderer instead of the deferred renderer, or it's set to use transparency? Or maybe you have Unreal set up for mobile or low-end graphics? Shaders with very high instruction counts usually indicate that all of the lighting calculations are also being included in the instruction counts. With the instruction counts that I'm getting (in the low 100s), Unreal is using deferred rendering - which means that none of the lighting is included with these counts because the shader is only rendering to the Gbuffer and then all of the lighting is done afterward. So don't worry too much about the super high number. It's just using a different rendering method.
My UE4 is broken; my instruction counts never change.
thanks
Tip: watch it at 1.25x speed.
ASMR, lol
I noticed that too lol