You might be interested to know this video was the top recommendation for me after going only three videos deep into the algorithm (this was number four) so you're definitely reaching people. Very cool work!
Good to know. Thanks for the feedback!
It was #1 in my feed! :)
@@Itschotsch same here
@@JSquare-dev for me it was on the recommended page as the first result. Great video btw. Keep it up
Pretty cool!
Omg it's the man himself. Love your vids
CPU cores are way more complex than GPU cores, so a better strategy for multithreading is probably to divide the screen into tiles and render a whole tile with one core at a time.
I think you have to do it in smaller chunks that fit in each core's cache, and then it might have a chance of working, but it still might overload the memory bus with the excess traffic each core generates.
@@undeadpresident Yeah, it all depends on how much memory traffic it generates. I don't mean that it could beat a GPU at all, but it would certainly be faster than each core doing one pixel. Not only because neighboring pixels may share data, but also because it parallelizes rasterizing the triangles. Now that I think of it, there should be a previous step that distributes the triangles across the cores to transform the vertices to screen space (with each core putting them in buckets, one per tile, which are then used in the next step of rasterizing).
not really, the only big difference is out-of-order execution (OOOE); GPUs nowadays are superscalar and have caches all the same
@@khlorghaal There's a big difference in the video, though: only each individual pixel is parallelized while rasterizing is not. I'd argue that the division of work is bad regardless of whether it's a CPU or GPU. No GPU renders one pixel per shader core.
@@DiThi i thought this is why rtx cores can't really scale up to 4k yet. the 1 core per pixel idea is not that far out.
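A minimal sketch of the tile-binning idea discussed in this thread, assuming a hypothetical TiledRenderer with screen-space triangles packed as float arrays (none of this is the video's actual code): pass 1 bins each triangle into every tile its bounding box touches, pass 2 hands whole tiles to a fixed thread pool, so no two workers ever write the same pixels and no framebuffer locking is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TiledRenderer {
    static final int TILE = 64;                              // tile size in pixels
    final int width, height, tilesX, tilesY;
    final List<List<float[]>> buckets = new ArrayList<>();   // one triangle list per tile
    final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    TiledRenderer(int width, int height) {
        this.width = width;
        this.height = height;
        this.tilesX = (width + TILE - 1) / TILE;
        this.tilesY = (height + TILE - 1) / TILE;
        for (int i = 0; i < tilesX * tilesY; i++) buckets.add(new ArrayList<>());
    }

    // Pass 1: bin a screen-space triangle {x0,y0, x1,y1, x2,y2, ...} into every tile its bounding box touches.
    void bin(float[] tri) {
        int minX = (int) Math.max(0, Math.min(tri[0], Math.min(tri[2], tri[4]))) / TILE;
        int maxX = (int) Math.min(width - 1, Math.max(tri[0], Math.max(tri[2], tri[4]))) / TILE;
        int minY = (int) Math.max(0, Math.min(tri[1], Math.min(tri[3], tri[5]))) / TILE;
        int maxY = (int) Math.min(height - 1, Math.max(tri[1], Math.max(tri[3], tri[5]))) / TILE;
        for (int ty = minY; ty <= maxY; ty++)
            for (int tx = minX; tx <= maxX; tx++)
                buckets.get(ty * tilesX + tx).add(tri);
    }

    // Pass 2: each worker owns whole tiles, so no locking is needed on the framebuffer.
    void rasterizeAll(int[] framebuffer) throws Exception {
        List<Future<?>> jobs = new ArrayList<>();
        for (int t = 0; t < buckets.size(); t++) {
            final int tile = t;
            jobs.add(pool.submit(() -> rasterizeTile(framebuffer, tile, buckets.get(tile))));
        }
        for (Future<?> job : jobs) job.get();                // wait for the frame to finish
    }

    void rasterizeTile(int[] framebuffer, int tileIndex, List<float[]> tris) {
        // ... clip each triangle to this tile's bounds and fill its pixels ...
    }
}
```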
Like 1 year ago I was a Python developer (now also changed to Java) and tried to code a 3D graphics engine myself. I didn't get further than rendering a wireframe because I forgot to implement face normals and lost motivation. Nice job with your engine tho! Like your editing skills btw, hope you get famous soon!
Same actually. At first it was only a wireframe engine but then I came back to it. That's why it took 1 year I wasn't constantly working on it.
Nice work! I spent a long time making a software 3D engine in C++. I spent a whole summer (12 hours a day, 7 days a week) writing a 3DS format loader for it (it was horribly undocumented for much of it). The results were well worth it though.
The Andre Lamothe books on 3D game programming (Black Art of 3D Game Programming & Tricks of the 3D Game Programming Gurus) would be very useful to you.
That's very cool. Would you mind leaving a link or something? I wanna take a look at your engine.
Oh this reminds me of my university days back when I was writing 3D engines as projects. I cheated though as I used openGL and C++, not as hardcore as doing the whole pixel pipeline myself 😅
@@JSquare-dev It’s not released.
thanks for the book suggestions.
@@NinjaRunningWildmaybe you should release it then.
java with 3d? this is madness, respect!
I know haha.
The scariest part about this video, for me, is that I recognize the first two reference sources you show.
Also I’m definitely going to check out that third one.
i hope you get famous soon, love your editing!
Thanks bro appreciate it
having shaders be an abstract class and performing a virtual method call for each pixel and vertex is definitely slowing things down a *lot*. it might be a better idea for shaders to be fed data, and loop over it/modify it internally within one method.
Each shader has its own loop to write to screen you mean? I thought having one loop and then calling appropriate methods would be faster.
Pretty sure there would be overhead for all that dynamic dispatch. Making a shader write to a "frame buffer" internally would definitely help I think @@JSquare-dev
Or use C instead of java
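A hedged sketch of the batching idea from this thread (hypothetical names, not the engine's real API): the renderer hands the shader an entire span of pixels and the shader loops internally, so the cost of the virtual call is paid once per span instead of once per pixel.

```java
interface SpanShader {
    // Shade pixels [xStart, xEnd) of one scanline directly into the framebuffer.
    void shadeSpan(int[] framebuffer, int width, int y, int xStart, int xEnd);
}

class FlatColorShader implements SpanShader {
    private final int color;
    FlatColorShader(int color) { this.color = color; }

    @Override
    public void shadeSpan(int[] framebuffer, int width, int y, int xStart, int xEnd) {
        int base = y * width;
        for (int x = xStart; x < xEnd; x++) {
            framebuffer[base + x] = color;   // tight inner loop, no per-pixel virtual call
        }
    }
}
```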
1:54 adding some noise to the image will help reduce the color banding
Thanks for the tip I will try that next time
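For reference, a minimal sketch of the dithering tip above, assuming color channels in [0, 1] (the helper name is made up): add roughly half a quantization step of random noise before rounding to 8 bits, which trades visible banding for fine grain.

```java
import java.util.concurrent.ThreadLocalRandom;

public class Dither {
    // 'value' is a color channel in [0, 1]; returns an 8-bit channel with noise applied.
    static int quantizeWithNoise(float value) {
        float noise = (ThreadLocalRandom.current().nextFloat() - 0.5f) / 255.0f; // +/- half a step
        float clamped = Math.min(1f, Math.max(0f, value + noise));
        return Math.round(clamped * 255f);
    }
}
```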
Nice video! Software rendering will never not be cool
Especially now with the successful remasters of 90s shooters and new same-style indie games. Performance is a consideration though. Quake 2 runs at ~60 FPS (with colored light) on my Ryzen 3700x at 1080p. But that is single threaded for the entire engine, not using the 12x potential performance gain.
@@techpriest4787 Since you mentioned that, I think it would be fairly interesting to see a 3D software-rendered indie title - if something of the sort ever happens.
In the early days of hardware-accelerated 3D, when people just about began using OpenGL, Direct3D etc. for games, there were huge amounts of cool things you could do in a software renderer, but not so easily with a "proper" rendering API, at least not without some good compromises. Unreal's unique texture filtering, Half-Life's animated water ripples, Trespasser's bumpmaps and specular reflections. If you wanted to make that work in OpenGL 1.2 for example, you'd have to modify a texture and reupload it to the GPU, which is kind of a big deal: IO speeds can be a huge bottleneck. Nowadays, that's all very easy to do within shaders.
So I do wonder if, in a modern-day context, a software renderer has any advantage (no matter how niche) other than not requiring a GPU, of course. I can imagine a game where the CPU does the rendering but the GPU does a whole lot of the logic (in compute shaders). Very silly if you ask me, a lot of time would get wasted on reading the data from the GPU.
@@Admer456 for the gamers, a software renderer would also make a retro game more authentic. It would not only look like a steam locomotive but also run like one. It is good for marketing though.
Well, kind of. If you use multithreading, then performance considerations with original Quake 2 grade graphics are forgotten, because we already live in the post-quad-core era thanks to AMD. The players would perhaps never have to reduce the resolution/culling distance. It would just be a marketing gimmick, although a good one.
The cheapest AMD I could find at my store, for like ~70 Euros, has 6 cores and 12 threads already... that is a potential ~8x speed-up over single threaded.
I think the programmer has more to gain. A software renderer would naturally be cross platform. I personally am very touchy about my programming languages. I went from C, to C#, to F# and finally to Rust. Now I do not want to use anything but Rust for everything I do. And that would of course exclude languages like GLSL and HLSL from what I do. So a SW API would spare me this silly computer-within-a-computer thing a.k.a. the GPU, and make the engine simpler I think.
Well, not really simpler, because we would still have to build everything ourselves, down to our own pixel sampler. But a SW engine would also run more consistently across hardware since there is no driver that could mess things up. Especially with AMD and OpenGL.
And there is no fixed-function pipeline that has to be understood for Nvidia, AMD and now even Intel GPUs...
Regarding specific SW features I have no idea. I am still quite inexperienced when it comes to graphics. Yes, I noticed the ripples in Half-Life 1. But I only understand the Quake 3 material system.
I learned that the Quake 2 remaster seems to have ditched the BSP tree and replaced it with a grid-based system. Though the remaster is using Vulkan and not SW. I still have no idea what the real benefit is, or rather how it really works. The only reason I thought a grid system was a good idea myself is because I had no real clue how BSP worked. So a grid system seemed natural to me... :D
You might want to try writing an interpreter and a compiler? For compiling code to run on the GPU maybe? This really got my attention, nice job writing the rasterizer.
I am definitely planning to do that sometime
this is the first video i see on my recommended. then i realized you are criminally underrated. good work!
Thank you :)
If you're really gonna go down this rabbit hole, you might want to check out how Quake handled software rendering. Basically, precalculate a superset of all possibly visible faces for each possible camera position in your scene, then render one scanline at a time, maintaining global and active edge tables.
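A rough sketch of the scanline/edge-table part of that approach (not Quake's actual code, and the PVS precalculation step is omitted): edges are bucketed by the scanline they start on, an active edge list is kept sorted by x, and spans are filled between pairs of intersections.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanlineFill {
    static class Edge {
        int yMax;       // last scanline this edge covers
        double x;       // current x intersection with the scanline
        double dxPerY;  // x increment per scanline (inverse slope)
        Edge(int yMax, double x, double dxPerY) { this.yMax = yMax; this.x = x; this.dxPerY = dxPerY; }
    }

    // Fills a simple polygon (screen-space vertices as {x, y}) into a 1D framebuffer.
    static void fillPolygon(int[] frame, int width, int height, double[][] verts, int color) {
        // Global edge table: one bucket per scanline, holding the edges that start there.
        List<List<Edge>> edgeTable = new ArrayList<>();
        for (int y = 0; y < height; y++) edgeTable.add(new ArrayList<>());

        for (int i = 0; i < verts.length; i++) {
            double[] a = verts[i], b = verts[(i + 1) % verts.length];
            if (a[1] == b[1]) continue;                           // skip horizontal edges
            double[] lo = a[1] < b[1] ? a : b, hi = a[1] < b[1] ? b : a;
            int yStart = Math.max(0, (int) Math.ceil(lo[1]));
            int yEnd = Math.min(height - 1, (int) Math.ceil(hi[1]) - 1);
            if (yStart > yEnd) continue;
            double dxPerY = (hi[0] - lo[0]) / (hi[1] - lo[1]);
            edgeTable.get(yStart).add(new Edge(yEnd, lo[0] + (yStart - lo[1]) * dxPerY, dxPerY));
        }

        List<Edge> active = new ArrayList<>();                    // active edge table
        for (int y = 0; y < height; y++) {
            active.addAll(edgeTable.get(y));                      // edges starting on this scanline
            final int line = y;
            active.removeIf(e -> line > e.yMax);                  // drop edges that have ended
            active.sort((e1, e2) -> Double.compare(e1.x, e2.x));
            for (int i = 0; i + 1 < active.size(); i += 2) {      // fill between pairs of crossings
                int x0 = Math.max(0, (int) Math.ceil(active.get(i).x));
                int x1 = Math.min(width - 1, (int) Math.floor(active.get(i + 1).x));
                for (int x = x0; x <= x1; x++) frame[y * width + x] = color;
            }
            for (Edge e : active) e.x += e.dxPerY;                // step intersections to the next line
        }
    }
}
```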
Next step. We're gonna program the gpu in Assembly
Nice video bro!
Glad you liked it
Damn. Great work. Probably took a toll on your mental space, but really good work. Good luck for whatever is next.
That's why it took a year lol.
@@JSquare-dev no doubt. I can totally see why. But you did it in the end 💪🏻
amazing channel. love your work
i am confusion didnt we draw polygons and such with no gpu back in the snes/sega genesis days and before? Why would this be so slow?
this is looking really good!
some help from the gpu would help a lot tho:)
Now try to do this on C and PSX :>
Dang, without OpenGL, I need to step it up, I thought I was low level
Start by mining silicon next time lol
@@JSquare-dev on it, I have to top this.
Do fix your multithreading: handle individual triangles on each core instead of using multiple threads per triangle. Starting and stopping threads is especially expensive. You probably do that, and could probably run your program 3x faster by using a thread pool alone.
Or, as others have suggested, let each core handle a separate part of the screen.
To fix the depth issues, the first approach might cause, you could have locks on a reduced size framebuffer, maybe 1/8th x 1/8th.
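A minimal sketch of the thread-pool suggestion above, with hypothetical names standing in for the engine's real rasterizer: one fixed pool is created once and whole triangles are submitted to it each frame, so no threads are started or stopped per triangle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TrianglePool {
    // One pool for the lifetime of the renderer; worker threads are reused every frame.
    private static final ExecutorService POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // Submit one task per triangle and wait for the whole frame before presenting it.
    static void drawFrame(List<int[]> triangles) throws InterruptedException, ExecutionException {
        List<Future<?>> pending = new ArrayList<>();
        for (int[] tri : triangles) {
            pending.add(POOL.submit(() -> rasterizeTriangle(tri)));
        }
        for (Future<?> f : pending) f.get();
    }

    static void rasterizeTriangle(int[] tri) {
        // ... fill pixels, write depth, run the pixel shader ...
    }
}
```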
I implemented my suggestion, and am now getting 21 fps on 1080p instead of 7 fps... so a bit of a gain.
When inspecting the code, I found that allocation rates are ginormous, 7GB/thread/s.
That could be fixed, and then it probably runs at 30 fps.
PS: General hint: debug performance with a program like VisualVM 😁.
I saw your work on github and it's amazing, works like a charm.
I wish eclipse had a profiler but unfortunately it doesn't. I found a third party profiler but it was paid and I'm broke lol.
@@JSquare-dev Thank you :), and yes, Intellij Idea has one in their paid edition 😅 (nice for those at work, or if they have lots of money). I'm using VisualVM, because when I started Java, I was just a student and broke, too 😁, and it's free 😊.
@@AntonioNoack I'll look into that then thanks
love this type of video. keep up the great work.
That's a perfect effort. A CPU Renderer is my general coding practice.
What's probably causing the slowness is the shaders. From my experience polygons and textures render fine, but shaders, yikes, that's where it gets tough. I am trying an FPGA approach. I heard OpenGL works with graphics cards, so I might go with that.
Nice video! Keep it up!
Thanks, will do!
awesome video!!
Glad you liked it
You got me as sub. Very cool to attempt this.
I wonder how much speed difference you'd get if you were to port this to C++ or C.
Probably a factor of 3, but you need heavy tricks after that (perspective corrected rather than correct, etc). I've done it & it definitely looks faster than his, but you can only go so far on the CPU. Even Quake took everything the optimization god Abrash had, utilizing the Pentium's U & V pipes for simultaneous execution of instructions, to make it run reasonably on top-tier hardware of the time (a Pentium 90 around then).
Probably a little bit faster but nothing considerable. My implementation itself is also bad: for example, the rasterizer interpolates every possible attribute regardless of whether I use it in a shader, there's a lot of overdraw, etc. I could do many things to improve it but I got lazy😅
@@NinjaRunningWild yeah, but as you point out: it was a single-core 90 MHz Pentium for Quake. I would think that proper optimisation on a multicore, multithreaded, multi-GHz CPU could go much further.
Just thinking about Unreal (in '98): it was able to pull pretty good tricks on a 200 MHz CPU (still single core) and was written in C.
So on today's CPUs you should be able to go much further, of course not beating a GPU, but I would guess a decent resolution and performance could be achieved.
(I am away from my Windows PC, but I think I remember that even '98's Unreal in software render mode can be pulled up to properly render lighting effects (coronas, shadows etc) simply because you were able to run it on a 1 GHz CPU.)
What I am trying to say is: you CAN go pretty far with software rendering without being a “god” of graphics :)
@@mityaboy4639 Unreal was in C++ with some assembly here'n'there, but yeah. It also had its own lil scripting language for gameplay logic (just like Quake), procedural animated textures and animated lightmaps, something I did not see in other games of that era other than Trespasser. I'm impressed it even ran on hardware of that time, it was essentially the Crysis of 1998.
saw this on my recommended, amazing content
Mate I've done the same thing!
I started in November 2022; I had made a graphics engine in C++ that couldn't yet handle per-pixel shading, only rasterize a few triangles with different colors.
Performance was a problem for me too, considering my little experience with coding (I was 16 when I started).
In June I started a new engine, this time with multithreading via CUDA. Performance is still an issue but I can handle 1000+ triangles at 40fps.
My latest achievements are normal maps and bloom, but bloom drags the fps down from 40 to almost 20 due to interpolation when upscaling.
(obviously I haven't used OpenGL nor Vulkan)
Unfortunately, I have some bugs that still have to be fixed, one with bloom and one with clipping (it's a rare case though). I'm working on it 5/6 hours per day.
Due to my little experience with coding, it's still a mess and I'm sure someone could optimize it better.
That's very cool and relatable with the optimization part. You add one cool feature (like bloom in your case) and it drags everything down. Also I am surprised CUDA didn't help you too much; I thought with parallelization I could easily get 60+ fps. I wanna give this another try sometime.
Really awesome as a learning project. Multi-threading for each triangle probably is a bad idea since there aren’t that many pixels per triangle. Would be better to multithread by breaking up all triangles in the whole frame.
Of course, your 8 core CPU will never beat a 2000 core GPU at doing flops.
Awesome, good job.
This was my #1 recommendation - how do you only have 86 subs
Respect!! Well done!!
Vertex in = new Vertex(); statement inside void flatTriangle(..) can be replaced by declaring a class variable. If you allocate like this your renderer will get slowed down.
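A small sketch of that suggestion with hypothetical names (the real Vertex and flatTriangle may look different): allocate one scratch Vertex per rasterizer and reuse it, so flatTriangle no longer creates garbage on every call. This is only safe if each rasterizer instance is used by a single thread.

```java
class Vertex {
    float x, y, z, u, v;
    void set(Vertex other) { x = other.x; y = other.y; z = other.z; u = other.u; v = other.v; }
}

class Rasterizer {
    // Allocated once per rasterizer instead of once per triangle.
    private final Vertex in = new Vertex();

    void flatTriangle(Vertex a, Vertex b, Vertex c) {
        in.set(a);           // reuse the scratch object instead of new Vertex()
        // ... interpolate into 'in' and shade the scanline ...
    }
}
```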
BRUh when i looked at the subs my jaw dropped i was like no way bro has less than 1mill
Nice Video. But please can you add a SPACE between "engine" and "(Without" in your Title?
Damn you’re where i want to be xd
Does this engine use right handed coordinate system?
HOW DO YOU ONLY HAVE 36 SUBS, you deserve at least 100k.
There's already Bisquit.
Thanks bro
could you upload a compiled version of the program please?
There's a precompiled jar for you to use. It's in the releases of the github repo, here's a link: github.com/Hyrdaboo/DwarfGameEngine/releases
I made a small tutorial there on the wiki section if you want to use this. It is only for eclipse though.
I don't know, but I just really like your game's art style, that PSX retro aesthetic. I saw that you talked about facing performance issues, but after some time you optimized it (I hope so). BTW, love your work, and appreciate your hard work.
great vid!
Oi, you should make a ray tracer instead, chances are it's going to be faster than rasterization, provided you use good enough acceleration algorithms.
Also use C++, I can imagine that Java might not be the best tool for heavy computations 😅
Wouldn't it be better to write this in C++ or something?
2:11 Shaders are also software :) Cool video btw!
Yeah not actual shaders but the same principle. Glad you liked the video
Cool video! And remember, 90s graphics rendered even slower, not even 1 fps, and all of that on the CPU.
Well, the 90s didn't have CPUs as powerful as we have today, so it's not an excuse for me)
I've checked your source code. Aren't you being limited by Java AWT's performance? It seems that you prepare a pixel array that you push into AWT as an image.
You could enable GPU acceleration for java2d. Java windows would start to be rendered in OpenGL. While your intention was to create a software renderer, which you did, I think that enabling acceleration for Java AWT will speed things up while you think of a better solution for how to present the screen in software mode.
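For reference, the Java2D OpenGL pipeline mentioned above can be requested with the documented sun.java2d.opengl system property, either on the command line or programmatically before any AWT window is created; whether it actually helps depends on the platform and driver. A minimal sketch:

```java
public class Launcher {
    public static void main(String[] args) {
        // Must be set before AWT/Swing initializes; equivalent to -Dsun.java2d.opengl=true.
        System.setProperty("sun.java2d.opengl", "true");
        // ... create the engine window and start the render loop ...
    }
}
```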
I think AWT is already using some kind of GPU acceleration in the background, isn't it? Also, I have checked before whether AWT is the bottleneck. By rendering only solid triangles without any interpolation calculations I get around 200fps at 720p, so it's really my fault more than Java's.
idk but shouldn't there be some precalculated matrix for interpolation? I guess it may be expensive to calc double floats or something like this. I read that old games use a lot of precalculated stuff as an optimization @@JSquare-dev
@@flippery-flop I am not sure about that, but because triangles constantly change in a game you have to calculate it every time you draw something. That's how actual shaders work: the shader program runs for every triangle and outputs a color.
I think if you convert it to WASM it may get more performance. Not to rewrite the code in wasm, just a compiler that compiles Java to it.
I don't know what WASM is but I think it wouldn't help a great deal. My implementation itself is flawed; java isn't that slow.
I doubt compiling to WASM would offer better performance than the native JVM.
Me and a friend were working on the exact same idea, we've also seen the same videos haha, and we're currently stuck on PerspectiveCamera movement, hopefully we can apply something from your project there.
One thing that helped us improve performance was to make arrays of floats to save some predefined values of sin and cos, so the program doesn't have to calculate them every time. It costs a little bit more memory but helps performance.
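A minimal sketch of that lookup-table idea (not their code): precompute one period of sine at a fixed resolution and index into it instead of calling Math.sin, deriving cosine from the same table.

```java
public final class TrigTable {
    private static final int SIZE = 4096;                       // power of two; trades memory for precision
    private static final float[] SIN = new float[SIZE];
    private static final float RAD_TO_INDEX = (float) (SIZE / (2.0 * Math.PI));

    static {
        for (int i = 0; i < SIZE; i++) {
            SIN[i] = (float) Math.sin(i * 2.0 * Math.PI / SIZE);
        }
    }

    static float sin(float radians) {
        // The bitmask wraps the index into [0, SIZE) because SIZE is a power of two.
        return SIN[(int) (radians * RAD_TO_INDEX) & (SIZE - 1)];
    }

    static float cos(float radians) {
        return sin(radians + (float) (Math.PI / 2.0));
    }
}
```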
Everything matrix math related including projection is from javidx9. Triple check if your projection matrix is correct I had that issue as well. Good luck on your engine
@@latanqueta457 My bottleneck is not really the projection side of things. Drawing solid triangles is in fact very fast (I get around 200fps on my intel i3). I messed up with the rasterizer and calculating lighting is even more expensive
Yes! I spent quite a bit of today checking your code, it's really impressive but I will need to modify it since I'm making something more oriented towards a voxel engine.
Have you had the problem of going straight into objects and passing through them, and then getting the same object but with everything flipped? 🥺
Can it run doom?😅
That’s really cool! I want to build one myself someday. What resources did you use?
I mentioned it in the video. It was a series from javidx9 (just search javidx9 3D engine) and tomatochilinoodle's 3D fundamentals playlist.
Nice video. I tried doing this once, but eventually I jumped over to LWJGL. Your project looks nice though.
Very impressive
Thanks mate
Nice!!
Good to see you here
In one year??? My render engine with opengl took me like half a year
Yeah I know it took a lot longer than it should have.
Seen this color box first generated in window from other video long way go
try it on a cpu from the 90s too, then the fun begins :D
Brilliant !!! +1sub
Maybe you'd get a good speed-up in the drawing and fillrate department by using OpenGL or DirectX etc. just to render out your software-transformed and lit geometry. Otherwise, I know that as soon as you lose "real time" from your rendering performance you'll probably tend to lose interest too.
"just use opengl bro"
I believe this can turn out to be a great project for teaching how graphics pipeline works without a hassle of actually programming for GPU.
Or read Andre Lamothe books.
Yes definitely. I find it a lot easier to do things on cpu than dealing with all the gpu things. Though I will give that one a go too sometime
why cant you just use the GPU instead? i thought you would program that somehow
I could do that but the point of this project was to code everything from scratch including everything GPU handles for us. I will try making an actual GPU engine too sometime
Without OpenGL!?? That's insane!
No, with Java it's batshit insane; it's the worst choice of all for this.
@@faultboy ofc it's a bad choice, but do you think that it would be better with C++? lol
@@faultboy No it's not.. I mean, it's less than ideal for 3D, but Java provides a lot of capabilities in its graphics libraries (AWT/Swing and Graphics2D). Then with some tricks like double buffering (one thread renders, one thread shows) and forcing JPanel updates, you can get some pretty decent performance out of it all.
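A hedged sketch of the AWT/Swing pattern described above (class and method names are made up): render into an int[] that directly backs a BufferedImage, then blit the image in paintComponent; a separate render thread fills the pixels and calls repaint().

```java
import java.awt.Dimension;
import java.awt.Graphics;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import javax.swing.JPanel;

public class SoftwareCanvas extends JPanel {
    private final BufferedImage backBuffer;
    private final int[] pixels;   // direct view of the image's pixel data

    public SoftwareCanvas(int width, int height) {
        backBuffer = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        pixels = ((DataBufferInt) backBuffer.getRaster().getDataBuffer()).getData();
        setPreferredSize(new Dimension(width, height));
    }

    // The render thread writes 0xRRGGBB values straight into this array,
    // then calls repaint() so the Swing thread shows the finished frame.
    public int[] getPixels() { return pixels; }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        g.drawImage(backBuffer, 0, 0, null);
    }
}
```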
This is awesome. You should look into SIMD and parallelize your most intensive calculations this way, and perhaps rewrite the project in C or C++ if you can't use intrinsics in Java, since most C/C++ compilers have intrinsic headers that you can use. A good excuse to learn C/C++ if you haven't already!
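Worth noting that Java itself now has SIMD support through the incubating Vector API (JDK 16+, run with --add-modules jdk.incubator.vector). A minimal sketch of a fused multiply-add over float arrays, the kind of loop a vertex transform boils down to (the method name is made up):

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class SimdExample {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // out[i] = a[i] * scale + b[i], processed SPECIES.length() floats at a time.
    static void mulAdd(float[] a, float[] b, float[] out, float scale) {
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.mul(scale).add(vb).intoArray(out, i);
        }
        for (; i < a.length; i++) out[i] = a[i] * scale + b[i]; // scalar tail
    }
}
```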
I have no idea what any of that means😅 I do have a small amount of experience with C++ but not enough to do any of the stuff you mentioned. Writing an engine in C++ is something I have already been considering, I'll definitely do that at some point.
SIMD in java?
@@JSquare-dev writing it in Rust is also a good opportunity to learn why C/C++ is already being abandoned in the web industry. Microsoft is not recommending it anymore for new projects. Seems Win32 is also getting rusty. Linux, which strictly used C, is getting rusty as well.
really nice, but have you considered using another language?
Hmmm.. So? You used DirectX instead?
respect
Wow, this was a useful video for me as a beginner programmer
Glad my project is useful in some way
Here before it blows up
Making a graphics engine on the CPU and also in Java, I'm surprised you even get a single FPS.
That's the neat part, I don't. If I run it at even 720p I hardly get 1fps 💀 the resolution has to be a lot less.
Nice 👍
why java?
Have you thought about building a custom graphics driver that takes advantage of the GPU. Without drivers, OpenGL, Directx, and Vulkan are basically just software renderers. You could just build your own API to take advantage of the GPU.
Prior to modern APIs, some older graphics hardware did not even have a z-buffer and really only took care of transformations and maybe texture mapping (like the PS1), so you don't even necessarily need your driver to handle the entire pipeline. You would drastically increase performance just by offloading texturing and handling transformations in software.
Your own graphics driver? That sounds like complete lunacy to me. You'd have to actually handle the IO for a GPU at that point, which is an INSANE scope to take on (like actual full R&D team at Nvidia levels of insanity).
I could totally see something like this working on the GPU fairly easily with something like Cuda, but without an api into the GPU there's no way. Unless you perhaps have some novel makeshift GPU lying around, but even then it's a big "Why?" as it wouldn't scale.
Name checks out...
@@ThefamousMrcroissant I think anyone who is building a software renderer would have some interest in it. It's probably one of the hardest things you could do, next to maybe writing a kernel from scratch.
I suggested it mainly because it's hard (but not impossible).
@@NinjaRunningWild Why did you feel the need to insult me without even addressing my post?
I go by n00b but I have written my own compiler and have even written a simple OS. I know how incredibly difficult driver/API development is. I was obviously not suggesting this as a business venture but rather a learning experience.
@@n00bc0de7 I've written a fair share of drivers. It's not that drivers are complicated per se, it's that GPUs are.
Like I said, it completely depends on the context, but if you wanted to make something for a modern architecture with thousands of SUs and pipelines, just the mere prospect of IO would be an insane task. Even doing something fairly basic in CUDA requires a good amount of knowledge of your GPU's specifications (such as warps, memory size and cache sizes); imagine an actual driver for these monsters.
bro using eclipse
bro chose Java 😶🌫, haha nice work!!
If you port this to C++ it can run much faster. The JRE is not suitable for this kind of work.
Here b4 500k
Fucking Legend!!!
Without OpenGL? For what?!
psychopathic behaviour
just watched you and saw you get two more subs hope i can help you
also we are TIED
Appreciate your support thanks
You wasted a year, you have no reason not to use an engine that 20k+ programmers have developed.
lol, I'm making this too
your renderer sounds like a gpu emulator tbh.
greet meeeee
Lesson learned: use opengl
Why are you making a software renderer in 2023 tho lol. It will be vastly slower than a GPU renderer.
Got nothing better to do, or what?
I stopped watching after the Java logo came up... it's an industry standard completely based on perfect marketing and a stupid userbase...
Understandable, I am not a fan of Java either. That's why I am switching to C++ the next time I make something like this.
Lol why are you making this crap renderer?