Hi everyone, hope you enjoy this new episode. If there are any topics that you think would be a good fit for future coding adventure episodes, I'd love to hear them!
Also, if you'd like to support me in creating more videos, I have a patreon page here: www.patreon.com/SebastianLague?ty=h
Either way, thanks for watching :)
Level of detail system for the procedural planets series. That would be great 👍
Oh! I saw this video on procedural mesh generation in 3D, so rather than doing a single plane of terrain you could do a 3D connected shape, which could give you procedural trees with custom dynamic trunks and leaves, custom rocks, and other plants. You could make the plant type vary depending on biome, and change and randomly generate the textures too, so basically fully randomly generated plants in biomes. That'd be super sick!
HAH "bumpy ride" gadem
more compute shader
Hey man, I would really love to see a couple of videos connecting some of your recent projects! You could combine this erosion technique with the procedural planets, then maybe go more in-depth on erosion, or add procedural rivers/lakes? Just throwing out some ideas. Love all your videos, keep up the great work!
That montage is the most accurate depiction of coding I've seen in my life
Never works first try.
@@fders938 Unless you are *really* lucky or *really* **really** skilled...
Yes, I totally agree! I was thinking that another good choice of music for this 100% accurate scene would be "Gonna Fly Now" from the Rocky movies. The tune really gets you excited and fired up about exertion, only in programming it's mental exertion rather than physical.
@@zyansheep Not even skill will save you.
My professor, who has 40+ years of experience working in C, spent 30 minutes the other day trying to fix a segmentation fault in a simple program.
It's just our destiny as programmers
@@zyansheep "Skill" in coding means you've already tackled a problem in some shape or form. Otherwise it's "let's go read some docs about how to do this".
Love this channel, you don't feel pressure to follow along. You can just enjoy watching someone learn and write new code, which is always exciting
Glad you found the ray tracing tutorial useful, thanks for featuring it here :) David / Three Eyed Games
Thanks for making it!
That GPU erosion definitely looks like something we should put into Mercator by the way... th-cam.com/video/0iqMQjYrbSk/w-d-xo.html
Your tutorial is really meaningful. When will part 4 be released?
@@-vanitas5229 Currently working on a bunch of other things, so not soon. Sorry.
@@threeeyedgames2858 Alright, good luck! BTW your tutorial is poetic XD.
Graphics programming student here; numthreads is the number of threads that share data and are executed in one go. Every group is made up of numthreads threads, so dispatching fewer groups is generally faster (fewer stalls), unless those groups access shared data (or do group syncs). Having numthreads as 1,1,1 destroys the whole idea of the GPU, parallelization (it just runs the minimum number of threads and masks out everything but one), since GPU cores are far less powerful individually than CPU cores. The number of threads dispatched should be a multiple of the GPU architecture's minimum thread count for optimal performance (32 for NVIDIA, 64 for AMD), so fewer threads are wasted. These "minimum threads" are also called a warp, and all threads in a warp should follow the same branch (otherwise branch divergence causes both branches to be executed, with the branch that shouldn't run masked out). This is because the GPU draws its power from SIMD, a technique that lets you apply the same operation to many pieces of data at once. The advertised thread counts are pretty much inflated, since those threads have to do the same thing within every warp, so they don't function like CPU threads. Nice to see something about compute shaders :) You can do infinite things with them: generate things procedurally, do lighting & SSAO, culling and setting up draw calls, etc.
EDIT: Do note that more threads in a group isn't always better for algorithms that have branch divergence, for/while loops, syncs, and/or inconsistent memory writes (or that use groupshared memory). A lot of things can affect this, mostly the GPU architecture, so be sure to test performance with different group sizes.
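To make the group-size point concrete on the Unity side, here's a rough sketch (my own names and numbers, not from the video) of pairing an 8x8 group, which is 64 threads and therefore a multiple of both a 32-wide warp and a 64-wide wavefront, with the matching Dispatch call:

using UnityEngine;

public class DispatchSketch : MonoBehaviour
{
    public ComputeShader shader;   // assumed to declare [numthreads(8,8,1)] on a kernel named "CSMain"
    public int mapWidth = 1024;
    public int mapHeight = 1024;

    void Start()
    {
        int kernel = shader.FindKernel("CSMain");
        const int groupSize = 8;   // must match the numthreads declaration in the .compute file

        // Round up so the whole map is covered even when its size isn't a multiple of 8;
        // the kernel should then bounds-check its id against the real map dimensions.
        int groupsX = (mapWidth + groupSize - 1) / groupSize;
        int groupsY = (mapHeight + groupSize - 1) / groupSize;
        shader.Dispatch(kernel, groupsX, groupsY, 1);
    }
}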
Adding on to this discussion of maximizing performance with compute shaders: one thing really worth mentioning is the nature of ComputeBuffers and the GetData() method. GetData, along with other kinds of GPU buffer reads such as ReadPixels() for textures, is not asynchronous. It forces a hard stall on the CPU thread, which waits until the data you want from the GPU is ready. And even if that data is ready, it can be stalled further by anything else that puts the GPU in a busy state, so the time it takes is really hard to judge.
I experimented with this before and purposefully set up an automatically adapting delay between the dispatch call and the data fetch. That did a sort of decent job, but in the end, being just a little bit too early or too late on the GetData fetch could still cause a lag spike.
So after looking into it for quite a while, I found this: github.com/digitalsanity/AsyncTextureReader
I'll start with a note that I haven't properly tested it in the latest versions of Unity 2018, but it was working in 2017 for sure. Assuming it does work, it's a very easy to use and modify plugin that lets you signal that you want data from a GPU buffer immediately after Dispatch, and you can then poll its status in a loop without any laggy synchronization issues. (Edit: the name of the plugin is a little confusing; it says texture, but it supports ComputeBuffers as well.)
There is, however, another potential solution directly supported in Unity 2018, brought up here: forum.unity.com/threads/asynchronously-getting-data-from-the-gpu-directx-11-with-rendertexture-or-computebuffer.281346/page-3#post-3358878 (though I have not tried it out myself).
Edit: It's in the documentation: docs.unity3d.com/ScriptReference/Rendering.AsyncGPUReadback.html
Jeez, am I behind on Unity... I kinda stopped focusing on development stuff for like a year now.
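For anyone landing here later, a minimal sketch of what using that built-in AsyncGPUReadback API can look like (buffer size, kernel and variable names are placeholders, not from the video):

using Unity.Collections;
using UnityEngine;
using UnityEngine.Rendering;

public class ReadbackSketch : MonoBehaviour
{
    public ComputeShader shader;   // assumed kernel "CSMain" writing into a float buffer named "Result"
    ComputeBuffer resultBuffer;

    void Start()
    {
        resultBuffer = new ComputeBuffer(1024, sizeof(float));
        int kernel = shader.FindKernel("CSMain");
        shader.SetBuffer(kernel, "Result", resultBuffer);
        shader.Dispatch(kernel, 1024 / 64, 1, 1);   // assuming [numthreads(64,1,1)]

        // Does not stall the CPU; the callback fires a few frames later once the data is ready.
        AsyncGPUReadback.Request(resultBuffer, OnReadback);
    }

    void OnReadback(AsyncGPUReadbackRequest request)
    {
        if (request.hasError) { Debug.LogError("GPU readback failed"); return; }
        NativeArray<float> data = request.GetData<float>();
        Debug.Log("First value: " + data[0]);
    }

    void OnDestroy()
    {
        if (resultBuffer != null) resultBuffer.Release();
    }
}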
I believe the warp size of NVIDIA is 32 and of AMD it's 64.
@@krytharn my bad, changed it
Hey, just wondering, you say you're a graphics programming student; do you only study this, or other types of programming too? What schools would you recommend?
@@littlenarwhal3914 It's a programme in Breda (BUas), mainly focused on game programming (C++). Only a small percentage end up doing graphics, but you are at least forced to do some basic C++ & OpenGL. Most things you'll end up teaching yourself, as the field is constantly changing and there's a lot of variety even between graphics programmers, so they end up steering you based on your projects and documentation.
3:20 those flying prices of code even have a shadow, good job Sebastian
OMG, I didn't even notice.
That's not a shadow projected in 3d, just a blurred black version of the og image.
Price of code is really flying
This is the greatest series on YouTube, hands down
I subscribed just because of coding adventures
"Oh ok, so with light bouncing, the spheres will take on the color of adjacent spheres a little more, thats neat"
SPHERES BECOME BEAUTIFUL PERFECT MIRRORS
It's only beautiful and perfect because you see yourself in it ;)
@@otinaj no u dont
@@josephdavison4189 it’s a joke
@@user-dh8oi2mk4f it's a compliment actually
I KNOW RIGHT,,, I was expecting just a simple colour gradient sort of thing but NOPE SHINY MIRROR BALLS!!!!
2:44 RTX: OFF
2:48 RTX: ON
2:55 RTX: ONER
2:59 RTX: EVEN MORE ON
3:03 RTX: THE ONEST
RTX
RTX 360
RTX ONE
RTX ONE X
RTX
RTXS
RTXR
RTX
RTX 2
RTX 3
RTX 4 PRO
2.55 rtx oner 2.59 rtx boner
3:10 RTX R42069 TI
I'm a former programmer who has changed careers to an unrelated field. I just wanted to say how much joy I get from your videos; they truly capture the spirit of why I loved programming.
Loved this kind of playful video, educational but also fun and entertaining. Please do more of these in the future!
I have literally no idea what's going on but I'ma keep watching.
Lol, same
3:16 That was unexpected and absolutely hilarious :p
agreed, made my day
I got an ad
I agree
It looked like infinite black holes scary ⚫️ OMG THERES ONE COMING RUN!
3:17 I love his extremely realistic coding sequences! They get a realism-approval by me - a fellow coder :D
I feel lucky to have stumbled across your videos. Every one I've watched so far has shown me a different way of looking at things that I didn't think of. Calculating shadows backwards... wow. I love the style of these videos, and the things they are opening my mind to feel extremely important. Thank you!
Very impressed with your newer videos, Sebastian! They feel much higher quality: more engaging and fun, open and honest about your skill level and how you learned it, well explained, entertaining, and still so knowledgeable. Well, well done, amazing job, keep up the good work man!
I just love how those spheres with hard shadows look halfway the video, gently bobbing up and down, esthetically pleasing.
Thanks for this intro to compute shaders! They are definitely something I want to learn more about.
I can't believe that I've never seen your videos before
And you have been doing this for... I don't know, 6 years?
You surely deserve more attention, keep doing this great work
this video is really good. So good that I started learning about compute shaders because of it.
Thank you for your amazing work.
You're a god-dam legend, man xD Respect from a fellow South African!
Oh? Didn't know he's Safrican... Now I feel even more inferior! lol
? his channel description says he's german..
@@Stonium bc hes not?
@@dxrpz1669 I don't know what to tell you.
His (slight) accent sounds German to me. It doesn't matter all that much ofc, but it would've really surprised me if he hadn't at least lived in Germany (or another German speaking country) at some point.
I love this mans voice so much, it’s very calming. I put on his videos to help me sleep sometimes, works a treat
Yup, you still got it. I've come back to watch your videos on procedural gen, along with that marching squares thing, more than once. Impressive stuff you did there; lovely to finally have an understanding of how they compute the magic albedo number.
Loved the montage of fake typing. Hilarious, would love to see more of that. I enjoy this channel because it fits the Code Bullet niche but tackles more complicated topics. Can't wait to see more simulations and other projects. Would it make sense for all the droplets to pool into lakes? Maybe store the final location of all droplets, then calculate lake volume that way?
Due to evaporation & ground absorption, that might be too simplistic (and/or generate weird terrain, namely an excess of water). It's probably a good start, however!
Maybe if you experiment with a kind of evaporation and/or absorption logic you'd get a nicer waterscape
Sebastian, thank you for your fantastic videos. Great balance of depth + breadth. I've been getting back into Unity lately now that they support asynchronous GPU readback, which lets you fetch data from GPU buffers without causing a pipeline stall. Not needed in your current use case, but if you ever need to fetch results from a compute shader after every frame, you'll want to do so asynchronously. Anyway, great timing on this series, and I really look forward to following your progress with Unity's compute shaders!
This really is amazing. It's the kind of stuff I really like.
My suggestion, just off the top of my head: add some sort of absorption rate, giving rise to rivers, lakes, etc. Following that, perhaps even consider the still bodies of water that'll form, how they interact with the environment and are affected by one thing or another.
Thanks so much for making these Coding Adventure videos - they're great explanations of interesting problems and very inspirational to try them out yourself. Keep it up!
I don't code but this is so interesting! I found you through the Ray Marching video and it really blew my mind. I've worked so often with Booleans, Mandelbrot sets and metaballs, but getting insight into the fundamentals of how it would be solved in code is amazing. Keep up the good work!
At 4:53, here is why 1024 works better: a GPU is a collection of threads, i.e. small CPUs on one chip, so by definition there is a limit to how many threads each GPU has. Currently I think the GTX series maxes out at around 3000-and-something threads. When you run something in parallel, the amount of work usually exceeds the number of threads: for instance, ray tracing a 600 x 400 image means 240,000 pixels, but in reality you can only run about 3000 of them in parallel. The solution is workloads: each GPU is divided into multiple warps, and each warp contains 32 threads with shared memory and a small cache, so it can run 32 kernels as fast as possible, alternating between workloads. If you make the number of threads (1024) a multiple of 32, you help the GPU organize the workload and keep all ~3000 threads busy all the time, hence better performance.
Whoa. I'm just getting into Unity, I had no idea it was capable of this. Very cool. Instant sub
i watch your videos when i am drawing as I find them very relaxing and interesting (even tho idk whats going on half the time). please make more!
I just wanted to say that your terrain looks EXTREMELY realistic!!!!
I'm glad I found this channel.
I just watched some other videos in this series. I’m in love with it. Good Job!! Can’t wait to see more!!!
You are actually my fav YouTuber! :)
This was a really interesting video. Can't wait for the next coding adventure. Keep up the good work, Sebastian ;).
"Scooch closer"
That's one hell of an action button if I've ever seen one
I just went back and looked at that 😂
I went and followed that paper on compute shaders you linked in the description. Jesus, respect goes out to you man. I was able to follow it and kind of understand what was going on, but I have no idea how I could take that information and use it to make a compute shader for something else like erosion. Time to do more reading and learning I guess!
fantastic content. Thanks for breaking down each step and showing code snippets. I feel inspired to start something myself now
I keep coming back to this video because I find it so interesting
Watching this made me think how cool it would be to save the heightmap of the original landscape before erosion and have each droplet, at its creation, check how much the pixel it's on has already been eroded. The deeper you go, the denser the ground gets, so it should erode more slowly. Combining this with a random noise map of "dirt thickness", I think a much more realistic erosion could be simulated with basically no added calculations. Also, the value (original height - current height) tells you how much the ground has eroded and, more importantly, together with the "dirt thickness" noise map it tells you what color the pixel should be. Without adding significant calculations to the program, you could color the ground easily, even with multiple ground layers and rock types corresponding to different ground density levels.
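As a very rough sketch of that idea (all names made up, and the hardness factor is just a guess):

// Erosion slows down once a droplet has dug through the local "dirt" layer into denser rock.
public static class LayeredErosionSketch
{
    public static float ErodeAmount(float baseAmount, float originalHeight,
                                    float currentHeight, float dirtThickness,
                                    float rockHardness = 0.2f)
    {
        float erodedDepth = originalHeight - currentHeight;    // how deep this cell has already been carved
        bool stillInDirt = erodedDepth < dirtThickness;        // still inside the soft layer?
        return baseAmount * (stillInDirt ? 1f : rockHardness); // rock erodes more slowly than dirt
    }
}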
This guy should be awarded the title of most educated teacher on YouTube
Keep this series going! It's so good
This guy could solve world hunger with compute shaders
This looks like fun, I think I'll try to follow along for this one. I remember doing ray tracing in Nvidia CUDA a long time ago for college; we've come a long way since then.
Given a problem that can be solved both ways, I'm fairly sure it's more efficient to use compute shaders and solve it in hardware rather than on CPU time - if only because the inputs to the shader are managed on CPU, but also because you could run other problems that do NOT have a GPU-friendly solution on the CPU in the meantime. Excellent video, thanks :-)
thank you for the learning resources; your project looks nice :)
Oh, nice. Always glad to see shaders videos.
Great short introduction to compute shaders!
I wish I had this back when I was trying to learn how they work..
sebastian lague is the god of optimizing code
wow this is excellent. i just started messing with shaders too so this is a great help. thanks!
Very first video of yours that I saw! And this is worth subscribing!
Just discovered your channel. Great vid, very informative and loads of cool stuff. Appreciate the hard work and sharing your knowledge
A bit about how GPUs work (oversimplification)
The GPU has a lot of cores and a main core.
When you pass a kernel to the main core, the main core sends the kernel over to free cores.
Each core is called a group, in this example.
The main core also passes the group's ID as (x,y,z) coords.
Each core can then run the kernel on a few threads.
Since different threads from the same group run on the same core, they can somewhat synchronize and communicate.
The same is not true for threads from different groups, as different cores might not even be able to run at the same time.
The "scooch closer" button gets me everytime
When I saw the first video of your terrain gen last month, running it on a GPU was exactly my thought. Since I wanted to try and do something with CUDA, I did pretty much the same thing you're doing with compute shaders in Unity now. Except that I wrote a DLL in C++ for it and used CUDA there, which is just a bit closer to hardware level. Needless to say, going for shaders in C# is probably the better option though as long as you don't try to squeeze out every single bit of performance, as it isn't tied to the OS or an nVidia graphics card. Anyways, here are a couple things I'd like to mention:
- For your heightmap generator you're currently starting map.Length thread groups with just 1 thread each. It should be more like map.Length/1024 thread groups with 1024 threads each, otherwise all those threads will be run sequentially and there is no benefit from using a GPU. Keep in mind though that if the map length is not divisible by 1024, you need to add an additional thread group. Also, you could launch it as a grid so you don't need to calculate x and y all the time (modulo on GPUs is expensive). You'll need to check that your position is still inside the map though, since your GPU will always launch a full thread group, even if not all of those threads are needed, and they would then write to unallocated memory.
- Instead of using atomic min/max in every thread, you could write yet another kernel for each of those that implements it as an optimized reduce algorithm, which you could launch after the initial computation, before clearing the mapBuffer from your GPU. Maybe there is even a library for that; in my case I just used cuBLAS. Reduce algorithms can be a real pain to optimize though, esp. if you're not familiar with how your GPU handles its threads in depth, and the noise gen isn't really the part that needs all the speed, so feel free to ignore that.
- I really like the change you made to the erosion brush; having a list of weights for every map point was really set up to scale badly. I just put essentially the same loop you have for calculating weights into my kernel, instead of creating a list once and reusing it, which isn't really a good idea. I'm just recalculating the same stuff over and over again at the moment, with sqrts being rather expensive to use. Bad me :(
- One thing I haven't tried myself yet, and I don't know if it's even possible using shaders in Unity: if you write the height map to the GPU's texture memory it lets you just read 'in between pixels', doing all the interpolation with hardware acceleration. At least nVidia cards actually have hardwired circuits for texture interpolation. Pretty sick.
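On that last point, the Unity-side setup for such a filtered height map might look roughly like this (just a sketch with made-up names; the kernel would then sample with something like HeightMap.SampleLevel(...) through a linear sampler instead of indexing a StructuredBuffer):

using UnityEngine;

public class FilteredHeightMapSketch : MonoBehaviour
{
    public ComputeShader erosionShader;   // assumed kernel "Erode" declaring a Texture2D<float> HeightMap plus a linear SamplerState
    RenderTexture heightMap;

    void CreateHeightMap(int size)
    {
        heightMap = new RenderTexture(size, size, 0, RenderTextureFormat.RFloat);
        heightMap.enableRandomWrite = true;          // so another kernel can write heights into it
        heightMap.filterMode = FilterMode.Bilinear;  // hardware interpolation when sampling between texels
        heightMap.Create();

        int kernel = erosionShader.FindKernel("Erode");
        erosionShader.SetTexture(kernel, "HeightMap", heightMap);
        // Inside the kernel, HeightMap.SampleLevel(sampler, uv, 0) would then return a
        // bilinearly interpolated height "in between pixels" essentially for free.
    }
}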
Really, thank you for all the inspiration. I really learned a lot of new things from trying to make a CUDA version of this. Your videos are awesome!
PS: I also learned that Unity likes to crash a lot if you're not careful with hooked DLLs, better avoid them if possible :p
These videos are very interesting, would love to see more of these.
Really cool and awesome suggestions by other users.
It would be so cool to see something like Age of Empires use similar techniques so terrain gradually changes over 1000s of years.
super cool, super inspiring. Please keep these up!
Excellent video, looking forward to the next.
4:06 "The title of the video is Compute Shaders so I should probably...": No! Don't EVER let the title hold you back, the sidepaths and off-topics are some of the best parts of your videos.
this video from nearly 3 years ago was the second coding adventure video he ever made, when he was still figuring out what it should be (not that he necessarily isn't still doing that)
I did something similar to learn Ruby. Not having easy access to a GPU from code, I instead moved dirt in stages, a sort of progressive refinement. I started with a very low resolution map of the world, wore it down, then doubled the resolution and added some random detail. Each stage of the world got the same number of raindrops except the last, but since the resolution was smaller, the first raindrops were very wide (like glaciers or something). The last stage got more raindrops.
The elaboration I made was to track a global sea level. If a raindrop was below sea level its carrying capacity would be zero. This gave nice river deltas. I could tell that rivers and maybe ponds were happening, but I couldn't get a good way to detect where the rivers were.
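For anyone trying the same thing, the sea-level rule boils down to something like this (a tiny sketch; the names and the "usual" capacity formula are just stand-ins for whatever your simulation actually uses):

// A droplet below sea level can't carry sediment, so it drops what it has and builds deltas.
public static class SeaLevelRuleSketch
{
    public static float CarryCapacity(float terrainHeight, float seaLevel,
                                      float speed, float water, float capacityFactor)
    {
        if (terrainHeight < seaLevel)
            return 0f;                            // below sea level: deposit only, never erode
        return speed * water * capacityFactor;    // stand-in for your usual capacity formula
    }
}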
I like how you made a button called scooch closer, literally just for moving the camera forward lol
Compute shaders are pretty neat and gave me a good lesson in how GPUs schedule cores for the tasks being run and how to optimise for that (i.e. scheduling a workgroup with the number of threads available in a "wavefront" for each dispatch). I made a pretty neat raymarcher with some basic shadows and reflections based on some work I'd done in Shadertoy, and then, because I'm lazy, I do a very basic blit of the image I generate right to the swapchain. It probably runs faster though, since I'm not calling the vertex/pixel shader unnecessarily to get the image up, and the GPU can bypass all the rasterization hardware.
I believe it's pretty necessary to optimize compute shader code such that your workgroups can all be executed at once (and without wasting any threads!!). For NVIDIA you're looking at executing 32 threads at once for a single workgroup, though I'd imagine doing 64 would not have too much of a drawback, even if two SIMD units have to be activated for that workgroup. For AMD it's a bit more generous (for some) with 64 threads, so if you're doing quite a large dispatch with many independent threads then an 8x8x1 workgroup, for instance, runs quite nicely. Intel is a lot more restrictive though: without looking into Xe (which hopefully has some improvements), with your average integrated graphics you can only really dispatch 8 threads at once. But then again, developing for Intel is a waste of time.
nice one! entertaining and helpful. barebone implementation summarized in the video, more detailed links. awesome! this now makes my terrain generator go nnniiiooommmm
That erosion effect is so great. If I was a little more handy with code I'd try to port it to C, compile it into a .dll and make it accessible to Python (from there I'd make a Blender addon).
I have no Idea what you are talking about. But I love it!
3:16 was AMAZING
really enjoying this series
Awesome video! A lot of good stuff here 😁
These videos are awesome! Please keep it up
3:18 When you have to finish your coding project but the deadline is in 3 hours.
Sir I don't believe you are from planet Earth. Exquisite Stuff!
Wow, this is very informative! Are you ever going to delve into voxel terrain, using the erosion script to make the terrain more realistic?
Thanks, yes I do plan to do that at some point!
Very cool. Nice job dude.
I really like the series, keep it up :)
Is there anything you are not able to accomplish? xD Great series btw!
Wow, the speed increase is incredible. I need to use this 😊
Hi Sebastian, great video, as always! What's your editor's (I guess VS Code) color theme?
What IDE do you use? It looks great 😁
it's guys like this that really make me wish i could afford to support them
“Scooch closer” 🤣
Wow! I had no idea GPGPU raytracing had progressed so far. :)
Uh yeah it's not new at all
I love your work!
Could you actually do some videos on things like ray-tracing or shaders? I know you said you know hardly a thing about either but your very brief explanation of rays and ray tracing was great.
I love to watch your videos, they are great; I learn a lot about Unity. Thank you so much. 😻
Very well put together, but I wish you had given more info about GPUs @ 5:00
I know I'm very late to this, but it turns out A LOT of people don't understand how numthreads works. In your GPU, you are telling your computer to set up many sub compute modules, each one having an id. If you think of the numthreads id as an index in a double for loop (at least for (8,8,1)), it will make a bit more sense.
A basic example, in rough pseudocode:
texture = new texture(100, 100); // black and white, i.e. one channel
for(int i = 0 ; i < texture.width; i++) {
for(int j = 0 ; j < texture.height; j++) {
texture[i][j] = 1;
}
}
converts to this.
[numthreads(100, 100, 1)]
void TextureWhite(uint3 id : SV_DispatchThreadID) {
texture[id.xy] = 1;
}
CPU takes 100*100 cycles to go through the texture to change each pixel to white one at a time
GPU takes (with numthreads(100+,100+,1)) 1 cycle to go through the texture and change all the pixels to white at the same time
The id variable represents the i,j,k that would be in your for loop.
Where the CPU iterates over each pixel one at a time, the graphics card can iterate over each index the for loop would have visited, but in a grid, all at the same time.
If you are going to be computing for, let's say, a 100x100 texture with RGB, then you should use numthreads(100,100,3), with id.x as the x coordinate, id.y as the y coordinate and id.z as the color channel. If you are working on an array of 100 floats, then your numthreads should be (100,1,1), with id.x as the index into the array. With 100 floats it's also okay to have, say, (10,1,1), but then it will take 10 passes for the GPU to complete the processing. It's also okay to have more numthreads than necessary, but if you know how much you are going to be working with at a time, it's better to set it up for the maximum you expect, so your graphics card isn't setting up threads for nothing to happen. So in summary: your graphics card can compute a group of for loops for you at the same time, your numthreads sets up a grid to compute what is basically a for loop, and it's a good idea to keep the x and y of numthreads around what you think you'll be working with.
Also, if you always know how many threads your kernel is going to get, I guess it's still good practice to divide it in the dispatch so it doesn't overflow. But having more compute units dispatched and processing in one pass, rather than taking a safe (8,8,1) over multiple passes, is better.
might still be confusing but I hope this helps someone!
Love the “scooch closer” button
Great work! If you get a chance, try putting a hot air balloon in your scene that travels around with a player camera while the erosion iterates. I'm curious what the performance impact and/or considerations will turn out to be. However, I make no promise of following up on my own initiative, so if I am too frustrating to ignore, feel free to DM me.
Those spheres went from ☠️ to 😎 really quick
Hey there, some performance advice: use a bigger thread group where the product of the dimensions is a multiple of 32 for NVIDIA or 64 for AMD GPUs. You will get way better performance!
Nice Video
Are you going to make an E08 of Procedural Planets??
You're like ahead of me on my quest; I feel like your dream game/project is similar to mine
Thank you for these insights mate.
Love it
Very cool! Loving it :)
I'd really love to see a close up of all the details of the finished result, and maybe even increase the size/detail of the mountain
Imagine how cool the spheres would look with ray casting
Hello! I was inspired by this video to attempt creating my own compute shader for raytracing, as you did. I ran into several issues along the way with just setting up a render texture and rendering it to the screen. I was wondering if you would consider making a tutorial for how this stuff works, and explaining a lot more in depth what you had to do to get the compute shader drawn to the screen in this video.
Thanks for considering my idea!