THANK YOU, so much Godot material out there is outdated, and it's been a while since I've coded in GDScript, but applying knowledge from other coding languages I got really excited for this. I'm happy people like you make this worthwhile.
You're welcome! I'm glad you found it helpful!
Just got suggested your videos. Good work man. Keep it up.
Well, I'm glad TH-cam finally got something right with their suggestions! Thanks for watching!
Brilliant video. Really well made. Thoroughly enjoyed it!
Glad you enjoyed it! Thanks for watching!
Hope you'll get more views. This is high-quality content.
Hope you find it helpful!
You could consider making the loop signatures `for (int i = -1, i
That's valid, but as you say it won't really affect performance and I do find it more readable with the 0 checks. That's the signature I expect to see when iterating over a Moore neighbourhood. Entirely personal preference. Thanks for watching!
This loop will likely be unrolled as a compilation step, so this iteration won't be in the compiled code at all. I don't know if the nested for loop will prevent unrolling. for(int i = 0; i
This was excellent!
Thanks. I really appreciate that! Happy coding
Good work man.
Appreciate it!
There seems to be a line missing in the video in the GLSL file at 22:18 (return count;). Also, in the "main" function the video has "continue" while the GitHub file has "return" instead. Maybe it's possible to add annotations to the video?
There were a few mistakes in the first run-through of the GLSL; they were addressed at the end of the video.
Hey, when I try to build the project after all the coding is done, I can't get the "Build" button to show up, so I'm stuck with Dispatcher.cs not being able to do anything. When I try running it anyway I get a bunch of errors saying "The type or namespace name [a whole bunch of different variables] could not be found (are you missing a using directive or an assembly reference?)"
What does this mean?
Impossible to say without more information unfortunately. Which type or namespace is missing? This error is typically caused by a missing using statement at the top of a file.
Subbed so fast.
Thanks for watching. I hope my content helps you out!
@@hamsterbyte I think it will. Looking forward to more content. Gonna search through what you have later today before I start making requests. :p
Keep up your hard work. Hydrate.
That's some really juicy efficiency. Nice job!
My project is bottlenecked by tilemaps' single-threaded set_cell calls, so I guess this would be the way onwards. Doesn't seem too easy though. Is there a reason why a GLSL shader specifically is needed instead of a normal Godot shader?
Great question! If you are looking for a way to increase the performance of your TileMap based environment, I can steer you in the right direction there, because a compute shader is probably not the way you want to go. First let's ask a few questions.
1. Do you need to update the entire TileMap at once?
2. How and when are you calling the SetCell method?
3. Are you using a single TileMap or a series of them broken up into chunks?
Check my channel for the video on making a Terraria-inspired tilemap generator. The first part of it is actually generating the tilemap using noise, and then I give more detail on how to update the cells using the Moore neighbourhood and cellular automata. I also have a video on Wave Function Collapse that uses a TileMap-based system and is quite performant as well.
Now, if you want a little further info on the questions you asked I'll address that in two parts.
1. Using SetCell on your tilemap is a major bottleneck for a large dataset if not done properly. Unfortunately, if you are using a tilemap to render the visuals and trying to access that tilemap directly through the API, there's no way to fix that. It will always be a bottleneck, and that bottleneck isn't entirely related to the marshalling inefficiency of C# in Godot. The same bottleneck will be present in GDScript as well. It's down to the fact that SetCell must be called from the main thread. Since there is only one main thread, that means the logic will always be single-threaded. You can run multithreaded CPU code in Godot with things like CallDeferred or CallThreadSafe, but all that does is wait for the main thread to become available and call the method there when it is (see the sketch just after this reply for the usual pattern).
2. The reason you cannot use a normal shader to run the simulation isn't because of Godot. It's because normal shaders and compute shaders are entirely different things. Graphics cards are purpose-built for rendering pixels to the screen and can consume normal shaders without the need for any extra instructions, because they are able to infer their usage directly from the shader code and they are typically dispatched automatically as part of the rendering pipeline. Some examples of this would be fragment shaders or vertex shaders; this behaviour is not unique to Godot, but is true of all game engines and graphics libraries I have worked with over the past decade. Compute shaders are not rendering shaders; they are a set of custom instructions that are executed by your GPU. Essentially, you are trying to force your GPU to work like a CPU, and though possible, it doesn't really WANT to do that and it needs quite a lot of hand-holding. Your GPU is unable to make the same inferences about your custom compute shader that it can for normal shaders.
The first shader in the video exists solely because we can't use a TileMap for rendering due to the single-threaded constraint. The fastest option is to run the simulation logic on the GPU and simply return the updated dataset to our rendering shader, which is also executed by the GPU. The CPU is acting as a bridge between the two shaders and is still the bottleneck in this application, but since the single-threaded logic being executed by the CPU has been minimized, that's where the performance gains are realized.
If performance is paramount in a particular part of your game, it is best to execute as much of that logic as possible without accessing the Godot API, unless you are writing C++/Rust in GDExtension. The worst performance hits will come from accessing objects in the scene tree. Most of the data classes are written in pure C# in the .NET implementation of Godot and they can be used freely where applicable. I hope that helps to clear things up and thanks for watching!
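For anyone hitting the same set_cell bottleneck, here is a minimal C# sketch of the pattern described above, assuming Godot 4.x and the TileMap node; the class and method names (ChunkUpdater, QueueChunkUpdate, ComputeTileFor) are made up for illustration and are not taken from the video:

```csharp
using Godot;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch, not the video's code: heavy cell logic runs on a worker
// thread, but SetCell is only ever called from the main thread in _Process.
public partial class ChunkUpdater : Node
{
    [Export] public TileMap Map;

    // Worker threads push results here; the main thread drains the queue.
    private readonly ConcurrentQueue<(Vector2I Coords, Vector2I Atlas)> _pending = new();

    public void QueueChunkUpdate(Vector2I[] cells)
    {
        // Simulation / noise work happens off the main thread, pure C# only.
        Task.Run(() =>
        {
            foreach (Vector2I c in cells)
                _pending.Enqueue((c, ComputeTileFor(c)));
        });
    }

    public override void _Process(double delta)
    {
        // Apply a bounded number of cells per frame, always on the main thread.
        int budget = 2048;
        while (budget-- > 0 && _pending.TryDequeue(out var cell))
            Map.SetCell(0, cell.Coords, sourceId: 0, atlasCoords: cell.Atlas);
    }

    private static Vector2I ComputeTileFor(Vector2I coords)
    {
        // Placeholder for whatever cellular automata / noise rule you use;
        // crucially, it must not touch the Godot scene tree.
        return new Vector2I((coords.X + coords.Y) & 1, 0);
    }
}
```

The point is simply that everything expensive happens inside Task.Run, and the only Godot API call, SetCell, stays on the main thread with a per-frame budget so no single frame stalls for too long.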
It would be nice if there was native support for compute shaders; manually handling the Vulkan interface seems quite painful.
One suggestion: it would be helpful to clarify how the work-group / dispatch count works, since you happened to choose numbers that make it a bit confusing (32x32x32x32).
There is native support for compute shaders. I think what you mean is a way to dispatch them without all the setup code. Unfortunately, a system that does that isn't really feasible because every compute shader is going to be different and the GPU can't make assumptions like it does with a typical rendering shader. This is not unique to Vulkan; it's a consequence of how graphics hardware processes information.
As for the reason for 32*32 threads per workgroup and 32*32 workgroups, that's something else entirely and involves a discussion about warps (NVIDIA) and wavefronts (AMD) and all kinds of other industry jargon that is largely irrelevant here. The short version: a workgroup of 32*32 = 1024 threads sits right at the common 1024-invocation workgroup limit on desktop GPUs and divides evenly into warps/wavefronts, so each workgroup keeps the hardware well saturated. Given that 32*32 local size, you can calculate that you need 32*32 of these workgroups to process the entire grid of 1024*1024 cells. Hope that clears things up a bit. Thanks for watching!
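To make the arithmetic concrete, here is a small hedged sketch (illustrative names, not the video's Dispatcher code) showing how the local size baked into the GLSL relates to the number of workgroups you dispatch; the ComputeListDispatch call in the usage comment assumes the Godot 4 RenderingDevice bindings:

```csharp
// Minimal sketch of the group-count arithmetic (names are illustrative).
static class DispatchMath
{
    // Matches the GLSL declaration:
    //   layout(local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
    public const uint LocalSize = 32;   // invocations per workgroup, per axis
    public const uint GridSize  = 1024; // cells per axis of the simulation grid

    // Workgroups needed per axis so every cell gets exactly one invocation.
    public static uint Groups(uint cells, uint local) => (cells + local - 1) / local;
}

// Usage inside the dispatcher (rd and computeList come from the setup code):
//   uint groups = DispatchMath.Groups(DispatchMath.GridSize, DispatchMath.LocalSize); // 32
//   rd.ComputeListDispatch(computeList, groups, groups, 1);
// 32x32 workgroups, each running 32x32 = 1024 invocations, gives 1,048,576 threads:
// one per cell of the 1024x1024 grid.
```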
@@hamsterbyte When I said native support, I meant directly in the editor, similar to fragment shaders. If fragment shaders can be supported, then so too can compute shaders, if you accept certain assumptions / limitations (fragment shaders do that too). I think the bigger problem is that the development effort required likely outweighs the current feature benefit, given that the number of users who would use a compute shader is a small percentage. That claim is obvious if you look at how CUDA works - it's very lightweight in terms of setup, significantly less work than Vulkan (OpenGL compute shaders are also less work than Vulkan). The trick is providing an interface to hide the Vulkan semantics, which is what the fragment shaders have been able to do.
In my comment about clarifying the work-group / dispatch count, I meant that with 32x32 and 32x32 it is not clear whether the numbers are chosen to be 1024 chunks of 1024 threads, or if they are supposed to be the same thing. So a comment in the GitHub repo or choosing different numbers would be helpful (like 32x32 and 16x16 if you made the target 512x512 instead). Then it would be clear that the two sets of numbers aren't directly related. Also, if I recall correctly, AMD mostly uses wavefronts of 32 now for performance reasons, making the distinction moot except in specific edge cases. And for a tutorial like you were doing, balancing wavefront/warp occupancy isn't really necessary. That said, does your example only work on NVIDIA GPUs? I know you don't need to set the local_size_x to 32 and it can be larger, but it seems that maxComputeWorkGroupInvocations is usually limited to 256 on AMD cards (your example is invoking 1024).
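On the portability point, one option is to query the device limit at runtime before assuming a 1024-invocation workgroup is safe. This is a hedged sketch assuming the Godot 4 RenderingDevice C# bindings (LimitGet and the Limit enum); the class name LimitCheck is made up, and the exact enum member name is worth double-checking against the class reference:

```csharp
using Godot;

public partial class LimitCheck : Node
{
    public override void _Ready()
    {
        // A local rendering device, like the one used to dispatch the compute shader.
        var rd = RenderingServer.CreateLocalRenderingDevice();

        // Maximum invocations allowed in one workgroup on this GPU/driver.
        var maxInvocations = rd.LimitGet(RenderingDevice.Limit.MaxComputeWorkgroupInvocations);

        if (maxInvocations < 32 * 32)
        {
            // On such hardware the GLSL would need a smaller local_size
            // (e.g. 16x16) and correspondingly more workgroups per axis.
            GD.PushWarning($"Only {maxInvocations} invocations per workgroup are supported here.");
        }
    }
}
```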
Conway's Game of Life is less of a game
*and more of a life*
Bahaha
A relatively simple compute shader, but tons of housekeeping are required to run it. Why don't other types of shaders need this lengthy process?
Well, that's a pretty large question to unpack in a comment, but the simplest explanation is that graphical shaders, i.e. vertex and fragment shaders, are dispatched automatically as part of the rendering pipeline; this is true of most if not all modern engines. The housekeeping code you are referring to is actually the dispatching and setup/cleanup code. It exists because compute shaders are not part of the rendering pipeline and must be dispatched manually. As you saw in the video, this often involves quite a lot of extra work to tell the GPU: what information to process, where that information is, where to store that information, whether that information is read-only or write-only, what type of information you are processing, if you want that information back at some point, when we are done with the information, how much information to process in each thread, how many groups of threads are needed to process the information, and so on. You can see how this all adds up rather quickly. The GPU needs to know all of these things and there's no way to infer this information from a custom compute shader as there is with shaders that just process pixels on your screen.
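To show how those questions map onto concrete calls, here is a condensed, hedged C# sketch that follows the general pattern of the official "Using compute shaders" docs example rather than the video's Dispatcher.cs; the shader path and the float buffer are placeholders, and the exact binding/method names should be verified against your Godot version:

```csharp
using Godot;
using System;

// Condensed sketch of the compute-shader housekeeping steps (docs-style pattern,
// not the video's Dispatcher.cs). Shader path and data are placeholders.
public partial class ComputeSketch : Node
{
    public override void _Ready()
    {
        // 1. A local rendering device, separate from the one drawing the screen.
        var rd = RenderingServer.CreateLocalRenderingDevice();

        // 2. What instructions to run: compile the GLSL to SPIR-V and create a shader.
        var shaderFile = GD.Load<RDShaderFile>("res://compute_example.glsl");
        var shader = rd.ShaderCreateFromSpirV(shaderFile.GetSpirV());

        // 3. Where the data lives and how big it is: a storage buffer.
        float[] input = new float[1024 * 1024];
        byte[] inputBytes = new byte[input.Length * sizeof(float)];
        Buffer.BlockCopy(input, 0, inputBytes, 0, inputBytes.Length);
        var buffer = rd.StorageBufferCreate((uint)inputBytes.Length, inputBytes);

        // 4. How the shader sees that buffer: binding 0 of uniform set 0.
        var uniform = new RDUniform
        {
            UniformType = RenderingDevice.UniformType.StorageBuffer,
            Binding = 0
        };
        uniform.AddId(buffer);
        var uniformSet = rd.UniformSetCreate(
            new Godot.Collections.Array<RDUniform> { uniform }, shader, 0);

        // 5. Pipeline + compute list: bind everything, then say how many workgroups to run.
        var pipeline = rd.ComputePipelineCreate(shader);
        var computeList = rd.ComputeListBegin();
        rd.ComputeListBindComputePipeline(computeList, pipeline);
        rd.ComputeListBindUniformSet(computeList, uniformSet, 0);
        rd.ComputeListDispatch(computeList, 32, 32, 1);
        rd.ComputeListEnd();

        // 6. Actually run it, wait for the GPU, and read the results back.
        rd.Submit();
        rd.Sync();
        byte[] outputBytes = rd.BufferGetData(buffer);
        GD.Print($"Got {outputBytes.Length} bytes back from the GPU.");
    }
}
```

Roughly: steps 1-4 answer the "what data, where it is, how it's bound" questions, step 5 answers "how many threads and groups", and step 6 answers "when are we done and do we want the data back".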
Fantastic video, I'll give implementing it a try. I think the dispatcher code should translate 1:1 to GDScript, more or less?
Yeah, you will need to do all of the same things regardless of language. There's actually a page in the Godot docs that gives some insight on how to do something similar in GDScript, as well as a GitHub repo linked on that same page in the docs that uses an image for generating a heightmap with a compute shader. I used those resources as reference in my implementation and they are written in GDScript. Good luck!
I wish Godot would use something stricter as its main language, like this one, instead of the Python-ish GDScript.
C# is fully supported in Godot. As of version 4.2 it even supports .NET 8, but you have to be careful when accessing objects in the scene tree as that will usually incur some marshalling costs. Just think a little more about how you structure your logic to minimize that and everything is fine. Thanks for watching!
In Godot 4.2 you can tell Godot to be fully strict about typing, and give off errors when you don't specify a type. It's in Project Settings -> Debug -> GDScript -> Untyped Declaration.
@@skaruts I have this setting enabled and that's cool and all but under the hood it is still a dynamically typed language with all the performance penalties that come with it
@@hamsterbyte I tolerate C# even less than GDScript... I think something like a Golang superset would suit this engine very well
@@sunofabeach9424 static typing should speed up the language, though, because when you statically type a variable, that removes the need for the interpreter to do runtime type checks every time the variable is used.
I'm not sure how much of a boost there is right now, since this is part of recent improvements to GDScript that are still a work in progress. But they're working on improving it further, anyway.
I feel like there was far too little explained here. For large chunks of the video it's just copying whatever you type with no real explanation on what each part is doing. I tried for 2 hours to follow along with this thing but gave up in the end. Props for trying to make some content about shaders. The solution looks interesting, and I hope in future videos you maybe spend less time live typing and more time explaining why you built it the way you did.
Seemed pretty clear to me, try watching it with the volume on.
@@bagboybrown I was clear that I was thankful for the video and just giving feedback. Don't need to take it to heart bagboy.
@@Lexie_T Just stating the facts. He literally tells you why you do what you do at every step, so if you couldn't figure it out I can only surmise that you had your audio off.
@bagboybrown What did he mean at the timestamp 14:45 when he said open it in our IDE? The only IDE I have is an Arduino IDE, so how is it obvious to everyone which IDE he is using? I'm asking you because it seems like you understood it.
@@TriviaQuizocity That's exactly what he means. I'm not familiar with Arduino, but JetBrains Rider (paid) or Visual Studio (free) are both compatible with Godot.