I thought I'd make this pinned comment to answer some common questions. 1. Why is the end so short? For some reason, while making the video, I got stuck in the mindset of compressing as much value as possible in as little time as possible. But I didn't consider the fact that some people actually find value in seeing the result of what was made. Who would've guessed? Not me apparently. I'll definitely make the showcases longer in future videos. 2. How do I make it 3D? Here's a quick guide: Start with the code shown in the video. Begin with replacing all mentions of Vec2 with Vec3. Make Quad into Cube a) 'into_quadrant' to 'into_octant' by replacing (i >> 1)' on line y with ((i >> 1) & 1) and adding another line for z; 'self.center.z += (0.5 - (i >> 2) as f32) * self.size;' b) 'into_quadrants' to 'into_octants' by returning 8 Cubes corresponding to the 0..=7 possible inputs to 'into_octant' c) 'quadrant' to 'octant' by adding 'let z = (pos.z < self.center.z) as usize;' and changing the last line to 'return z
The 1. is definitely the curse of performance engineering. You try to optimize your code, and then find yourself optimizing everything in your life. The 2. can be another video that you don't have to code and label it as just a guide, not tutorial or showcase. I didn't request it so don't worry about making it. I'm curious to see your best algorithm when you finally upload it to Github.
@@evespirit I'm too stupid to define what that is or if I quality and yet I built a 3D engine out of autism, can we begin deconstructing the universe soon then?
I like how you're not afraid to mention terms without explaining them, for example, cache locality. kinda know what it means, just enough to know that explaining it would probably double the length of the video, but i like the fact that you mentioned it anyway, for the people interested enough to learn about those terms
I know the description says you'd want to clean up the code first before uploading but I feel like many would great appreciate even the "messy" version to play around with / learn from it. Appreciated the video and intrigued in seeing more of it
Wow, this is really cool! I kinda wish you had showed the gravity simulation for longer at the end, I wanted to see everything form together nicely. Also, please don’t stop making these videos! I had already watched your videos before, which explains why I was shown this one so fast, and even 10 months later they were very memorable to me, especially the drawing circles with one triangle video. Your videos are in a wonderful middle ground where they are easy to follow and possible to implement on my own, but still inspiring me to take the steps to deeply understand my problems to find better solutions. The pacing of your videos is also very good, if they were much longer or shorter they would be considerably harder to follow. So, yeah. You are a positive influence on me, and I doubt I am alone. Keep being awesome! 💕
Thank you so much! This is exactly why I make these videos-to inspire others to create. I've always had a passion for coding and a dream of sharing that passion with the world. Your comment makes me feel like all those sleepless nights were worth it. Thank you again, and I'll definitely try making the showcases longer in future videos. 💕
I did a half hour version with an upset at 35mins or thereabouts and a 6 and a half hour with a brief interruption at the hour mark while i change the power profile of my monitor. 4k
dude this video was so cool! Would love to see more of just the simulations doing cool things, maybe with colours added randomly, or based on velocity, etc
This is amazing! I've just discovered your channel with this vid and I'm so glad I have. I wanna make cybersec simulations (kinda like Kaspersky's map) and I'm also learning/using Rust for all of my personal projects, and this video has inspired me so much and given me the confidence to tackle doing the simulations in Rust, even if it takes me much longer than it would in any other lang/program lol Seeing the performance this has is unreal
Thank you! I switched from C++ to Rust about 3 years ago and I’ve (almost) never looked back. I wouldn’t say Rust is perfect, no language is, but so far I haven’t found something to replace it. I say go for it, I believe in you!
Great Video! I really enjoyed the idea of thinking of quadtrees as pretending that you could combine every other point as one. Really great hook and also really well made in generell :)
For accuracy, do you also plan to add energy conservation? Depending on the time step the accelerations and velocities build up errors which can be tempered by conserving energy (potentials and kinetics). Also, in the past I have attempted to use a dynamic time step, where objects that are very close get a small time step and are updated more frequently to capture more fine motion, such as moons and sub orbits. Balancing the time step for the earth and moon vs Pluto for example especially since your distant Quad nodes don’t really change there center of mass or energy, but close objects really change significantly.
Interestingly, energy conservation (via a symplectic integrator) and variable time step don't play nice together, and you need to do some extra work to make it behave correctly.
this is amazing, the simulation resembles the milky way and surrounding clusters when the particles spread out, also worth noting Space Engine uses oct-trees to plot their objects
5:25 quadtrees (and their 3d cousin octrees) are oddly satisfying as a space partitioning solution, and I get a little excited whenever the opportunity to use them comes up. I'll definitely have to play with this sometime soon. perhaps see how it fares in 3d.
The performance here appears quite amazing! I have not been able to get past ~20k bodies in my own solution made in Unity’s DOTS.. With Barnes Hut.. Is it okay if I try to apply some of the teachings here to my own project? I presume you do not even make use of SIMD?
@@DeadlockCode Thank you! If it is okay to ask, which parts of the optimization process gave you the greatest speedups? I have spent time on things like SIMD and optimizing the gravity calculation before implementing an array/list-based tree. My approach made use of a hashmap, which is akin to a Dictionary. I do realize now it may not have been all that smart of me to calculate the COM of the nodes every time a body is inserted. I am surprised, this kind of scale is something i thought you had to go to GPU for and similar videos to yours do exactly that.
As you can see in the improvements section, I got a 3x speedup simply by sorting the array of bodies along a z order curve. Other than that I would definitely look into replacing the hashmap with a vector if possible since hashmap lookup requires a lot of, well, hashing. Other than that I would profile and see what lines of code or functions are taking the most time and see if I could either reduce their usage or perhaps find another way to accomplish the same result. (I accidentally deleted the previous comment so if you got two notifications, that's why)
@@DeadlockCode Makes sense! I have done extensive profiling. I know exactly which areas are the most problematic (the computation of forces) but your solutions are not something I would have thought would work so great. I sort of assumed sorting would only add computational time. That was an incorrect assumption heh. I will slowly try to improve bits and pieces following this video and see how it goes!
I really enjoyed your video and totally get what you mean at the end about the video taking a lot when you try to put effort into it. Keep it up because you are doing good work!
Barns and Hut initially used to report the final dynamics calculation times, without including their so-called theorem construction phase, which amounts to the actual multiple expansion part of the algorithm. This caused a lot of buzz, but at the time, it was possible to write faster code using almost any other method, as long as the code was reasonably optimized. It’s pleasant to hear that multiple expansions are now better, since it was obviously an interesting idea from the start.
Nice to see it's performs in real time. I few years ago I wrote a pretty straight forward (didn't put much effort into optimization) 3D n-body simulator using Barnes-Hut in Go. I got about 1 sec per frame with 1 million bodies, which includes software 3d rendering and basic lighting (8th gen i5 laptop with 32gb ram). For simplicity, I wrote each frame to a png then made a video afterward with ffmpeg. The slowest part was building the octtree every frame. I had extremely simple accretion too, but mostly ran with it off since it didn't add much to what I messing with at the time.
Would it be possible for you to describe your improved algorithm? I had a quick look at the source, but it would be fab to hear you explain it and the motivation behind it.
Nice stuff :) If you want, there is a O(N) method for approximating the many-body problem called Fast Multipole Method (FMM) by Leslie Greengard. It's also based on a quad-tree decomposition but with added mathematical tricks to keep the O(N) complexity. I never seen it used, I'm wondering if it's just that the log(N) acceleration is not that interesting if the FMM has a bigger constant.
Holy shit dude. This is stunning. If your roadblock is editing the video, it is really worthwhile to outsource it. Forcing yourself to do things that aren't really clicking will demotivate you, and it just isn't worth it. Your brain is so beautiful, I want to see what you are capable of when you are unblocked by the tasks that don't give you life, and you're free to spend your time doing ever you love; exploring and building. Please consider unburdening yourself from the activities that don't inspire you, and that bring you to the edge of the pit of despair. Life's too short to willingly endure a grind if you could otherwise sidestep it. Your work is too important. It would be like Henry Ford spending his time forcing himself to write a company newsletter when it doesn't inspire him and it drags him down. Optimize your workflow bro. I hope you hear from you sooner than 10 months! 🥰🫶💯
This is one of the highest quality videos I’ve seen on YT. Well done! As a rust developer I love your explanations on cache locality and optimisations. Have you thought about submitting this video to the Rust Weekly newsletter? I’m sure it’d be interesting to other rust devs as well as myself.
Really great work! I'm a huge fan of n-body simulation and there sadly isn't much hobbyist stuff out there. Have implemented Barnes-Hut a few times, it's pretty cool. BTW s/d is technically tan(theta) not theta :) Also, instead of using another library for the collisions, you should be able to use your quad-/oct-tree for that.
Yes, you can definitely motivate the criterion by using an opening angle and then baking parts into the constant. I tried to find proof that this was the original motivation from the Barnes-Hut paper (or it’s predecessors) but I couldn’t find anything and I didn’t want to claim that it was how they derived it if it wasn’t. Do you have any source for opening angle being the origin of the criterion (other than theta being a common variable for angles)? And yeah, I could’ve definitely used the quadtree for collision detection and that could be really fun implementing, but I just wanted to keep it simple because it wasn’t the main focus of the video. 👍
I knew where this was going the second I saw the impl. "Well those calculations don't need to happen sequentially" and yep, apparently they don't need to happen on the CPU either
Wow this takes me back to my grad school days when I used this to simulate discrete vortices - similar 1/x^2 interaction between particles. I used tecplot for animation. This is about 2 decades ago. Not sure if tecplot still exists but I used c and openmp back then.
Great video, especially the comprehensive visuals! I've written a library focused around acceleration calculations for N-body simulations called particular, maybe you stumbled upon it whilst doing your research. I'm very interested in seeing your code to explore how I could optimise the way my Barnes-Hut implementation is done, because even though it looks quite similar on a surface level (stack allocated nodes), your performance numbers seem quite incredible. I'm also curious what kind of hardware you are running the benchmarks on, and what value for the theta parameter was used for the benchmarks and the final simulation.
Great video, but I'm especially grateful for the repo. It will definitely come in handy for me as a learning tool. I want to do an efficient plasma simulation for example.
been having so much fun messing around with parameters on this :-) I was wondering though - coudl you make the space like in the arcade game 'Asteroids' - so whatever goes out of the screen comes in on the other side? i guess easy to do with the position - but for the gravity force we would need to imagine each body also repeated 'off the edges' of the screen
So question ( outside of scope) : is there a way you could continue to conglomerate masses with known distances to form a field equation approximation. I'm thinking one could avoid going to the leafs for a layer of propagation since it's possible it's close enough to a super position approximate? Similar idea to finite element analysis where you use points based on a resolution but because a region is homogeneous or a known pattern one could set the discrete estimate of the center of mass and an error term to provide a good enough approximation. The real point being that calculation-wise some of these terms are going to have a minute effect on a body that's far away and to the point of the approximation shown in the video it can be seen as a single point if you're far enough away or the additional points effects are negligible enough. Mathematically it'd be saying 1 + epsilon is approximately 1.
This simulation looks so good it’s uncanny… because there’s no dark matter to (somehow???) make the outer particles move at a similar speed to the inner particles. Well done! 👏🏼
Could this be used to model atmospheric particles? I am building a machine learning model to predict the impacts of solar weather on upper atmosphere electron temperature and density, I wonder if I could model the electrons as "particle clouds" that interact with other "particle clouds" represented by a single particle.
Very impressive! My first thought upon seeing the video title was to use a flow field. I'm curious how that would compare. It would be O(N) but with a really bad constant depending on the resolution of the flow field, since every body has to update the flow field in a radius around itself. There would be some problems though. You could have a small "influence radius" for smaller objects, but in that case clumps of small objects can't create larger gravitational fields to affect objects far away. The Barnes-Hut method seems superior for that reason.
I would guess combining the approaches might be optimal. Rather than operating on particles, you could operate on cells as much as possible, and form a quadratic flow field within cells based on distant cells. Basically, instead of all the masses in cell A just approximating cells B, C, D, all the way up to X as point masses, cell A builds an approximation of the fields of all these cells for itself, and all the masses in A refer to that instead. The distance such approximations are valid would depend only on the size of the cell, so very predictable. Child nodes would know whether a distant node already contributes to it's parent's field or not, so evaluating which should contribute to it's modification to the parent field is possible.
There's a related algorithm (which I'm not an expert on) called the FMM (Fast Multipole Method) that 'allegedly' achieves O(N) time complexity but in my short research of the algorithm I'd like to argue that it would still be O(N log N) unless you accept that the accuracy would decrease as N increases. If someone is more knowledgeable on this topic, please prove me wrong, I would love to love the FMM. I'm not sure if the same is applicable to a flow field but the first thing I'd be hesitant about is the limited scale of the simulation since we don't have infinite memory to store all of the possibly billions of cells necessary to accommodate the one body that decided to fly away from the main cluster or you would have to make each cell larger which would decrease accuracy. Also you would have to increase the resolution of the grid as you increase the number of bodies unless you accept that the accuracy would decrease, so I'm not even sure if it would really be O(N) time complexity or if it's just a cleaver trick that makes it look like it's O(N) time complexity. You could probably use a quadtree and instead of accelerating each body towards the quadtree you could try self interacting between the quadtree's nodes and then propagate accelerations down towards the individual bodies (I think this is what the FMM does?). You'd still somehow need to construct the quadtree but I've heard some people say that this is possible in O(N) if you use a limited accuracy for body positions but this would again run into the same problem with far away bodies. I think O(N log N) is the best you can do if you don't make further assumptions about your data and or accept worse accuracy. Remember, I'm no expert and I don't claim that everything in this comment is true - I'm just brainstorming and I am most likely mistaken about something.
Great video. When the code is ready, you should post a link to the video to This Week In Rust. Also, someone should try to port it to WASM. If I was adding SIMD, I use the nightly standard SIMD.
Hello, your videos are the best. Thank you for your effort. Will you please share the file of high resolution image of the purple star like animation? Seems so cool.
Would be interesting to see for what value of n the brute force n^2 algorithm becomes worse than the approximation. Maybe ~1000? Presumably the brute force algorithm is easier to optimize and run on the GPU.
If you don't need synchronization between threads, and you don't mind rewriting the compute kernels in c, OpenCL is really nice for GPU, the ocl and simple_ocl rust crates make using ocl in a rust project relatively easy.
Amazing video! Although a physicist, I have never done these type of simulations. However, I also try to implement tensor-based trees in my strategic game search tree algorithm (alphazero). Can you point me to some medium-sophisticated literature on vectorized implementation of trees on tensors? I could not find any. Thank you.
Very nice video, though the "Rusty" way to store the children would be to use an Option. Granted NonZeroUsize is pretty fiddly to work with sometimes so just using a usize is simpler though you have to "manually" remember that 0 means None instead of the compiler reminding you.
I agree, and for completeness, I'd also like to add that Option has the same size as a usize, which might sound counter-intuitive without the knowledge of Rust's null pointer optimization.
my main problem with this method is that you store the size and position of each node individually when you can already know how small it is, and you can make the coordinate system INCREDIBLY efficient by using the fact it's a base 2 coordinate system so you can have 32-bit numbers represent a quadtree that's 32 levels deep, where each bit for x/y tells you the exact path to reach the bottom node, going from msb to lsb and scaling down the tree not just index by index with a simple comparison, but in some cases, in a single operation worst case scenario, where every available cell has a single particle, you increase memory usage by 25% than if you just had them all in a vector also means that the position of the node can be stored in a single int64 for both x and y and have all the coord math be done in a single operation for both
Yes, there are definitely improvements on the table when it comes to memory footprint. Although, compressing the sizes and positions requires more compute in each iteration of the acceleration function to 'decompress' the data. I actually tried going the other way by caching more data, storing the size squared to avoid the multiplication in the acceleration function, which actually improved performance. So it's not just "less memory = faster" but would instead require further testing. Although I haven't tested going the entire way of optimizing the usage of every bit so I can't say it wouldn't be fast.
@@DeadlockCode i didn't mean that less memory usage would be faster, but mainly that you can make the coordinate system like that so you don't have to walk down the nodes to find points, you basically get a step order (suite of indexes leading down to a point) double as its coordinate so you can skip quite a few cases that satisfy it
an example, if i'm not mistaken, is that, if you don't know the point's position, (not stored), the way here to get its position is to go through each nodes and take in account their size at each step and do quite a few arithmetic operations to get it, but here, the point's position is the path you've travelled (1s and 0s on x and y) is the same value as the point's x and y coordinates it DOES limit the grid/node size to be a power of two, but it's rarely a problem
@@DeadlockCode by the way, do you think it's possible to use direct-ish memory address to get a neighbor node? as in, shaping the memory space so you can just move the read position to access neighbor nodes It's very easy for sibling nodes sharing the same parent, but getting their immediate, lowest level neighbor could make it so much faster for collisions and other nearby interactions.
@@DeadlockCode at last, the biggest problem i've yet to figure out is elements that aren't points, which have a collision/bounding box larger than a node's contained area, which might force nodes to have a secondary vector to store a list of points whose sizes are contained in, but can't in their children
This seems like a great candidate for Voronoi particle tracking and/or GPU acceleration. Essentially each particle stores its N nearest neighbors, and its velocity etc is just derived from them. Place the particles in ping-pong buffers and do the computation in a shader where one buffer is the last frame's state and the other the current write target, and you can probably get 60+ FPS
I'm a high-level programmer with no degree and not really the brain for maths but I delved into that topic once and it's cool to this potential application.
Thank you! And yes, in 3D you would use an octree but the implementation for it would be almost the exact same, except for 8 children instead of 4, Vec3 instead of Vec2, and cube instead of quad. Essentially just swap everything that is 2D with its 3D counterpart.
the moment I saw rust code, i instantly subbed, because i believe that rust is the perfect language to display educational snippets for topics such as this.
Amazing! Please post the code when it's cleaned up. I'd love to try to push this to it's limits with a good computer (no need for real time haha). 10M bodies? 100M?
Very nice video. Another solution could be using dbscan clustering algorithm? It occurred to me because from my understanding you only want to group bodies based on a metric which is what dbscan does
I thought I'd make this pinned comment to answer some common questions.
1. Why is the end so short?
For some reason, while making the video, I got stuck in the mindset of compressing as much value as possible in as little time as possible. But I didn't consider the fact that some people actually find value in seeing the result of what was made. Who would've guessed? Not me apparently. I'll definitely make the showcases longer in future videos.
2. How do I make it 3D?
Here's a quick guide:
Start with the code shown in the video.
Begin with replacing all mentions of Vec2 with Vec3.
Make Quad into Cube
a) 'into_quadrant' to 'into_octant' by replacing (i >> 1)' on line y with ((i >> 1) & 1) and adding another line for z; 'self.center.z += (0.5 - (i >> 2) as f32) * self.size;'
b) 'into_quadrants' to 'into_octants' by returning 8 Cubes corresponding to the 0..=7 possible inputs to 'into_octant'
c) 'quadrant' to 'octant' by adding 'let z = (pos.z < self.center.z) as usize;' and changing the last line to 'return z
The 1. is definitely the curse of performance engineering. You try to optimize your code, and then find yourself optimizing everything in your life.
The 2. can be another video that you don't have to code and label it as just a guide, not tutorial or showcase. I didn't request it so don't worry about making it.
I'm curious to see your best algorithm when you finally upload it to Github.
Beautiful work man
I delete my original reply, LOVE YOUR WORK MAN! :-{D Very happy to see. If you do see this, what song plays at the 9:20 minute mark?
"That's outside the scope of this video", "Leave it as an exercise to the viewer"
You just know this guy huffs a lot of mathematics daily 😂😂
Guilty as charged.
Established standards
Mathematics: origin is in bottom left
Computer graphics: origin is top left
3:55 DeadlockCode: let's just use top right
Never let them know your next move.
@@evespirit I'm too stupid to define what that is or if I quality and yet I built a 3D engine out of autism, can we begin deconstructing the universe soon then?
The manga style
@@adamrushfordA 3D engine built out of literal autism. You love to see it. 🥹
So when do we attempt to find this universe's seed and predict the future?
I like how you're not afraid to mention terms without explaining them, for example, cache locality. kinda know what it means, just enough to know that explaining it would probably double the length of the video, but i like the fact that you mentioned it anyway, for the people interested enough to learn about those terms
I know the description says you'd want to clean up the code first before uploading but I feel like many would great appreciate even the "messy" version to play around with / learn from it. Appreciated the video and intrigued in seeing more of it
I've never had youtube recommend me a video 1 minute after it's uploaded before..
And I've never had someone comment on my video 1 minute after it's uploaded before...
I guess there's a first for everything.
Wow, this is really cool! I kinda wish you had showed the gravity simulation for longer at the end, I wanted to see everything form together nicely.
Also, please don’t stop making these videos! I had already watched your videos before, which explains why I was shown this one so fast, and even 10 months later they were very memorable to me, especially the drawing circles with one triangle video. Your videos are in a wonderful middle ground where they are easy to follow and possible to implement on my own, but still inspiring me to take the steps to deeply understand my problems to find better solutions. The pacing of your videos is also very good, if they were much longer or shorter they would be considerably harder to follow.
So, yeah. You are a positive influence on me, and I doubt I am alone. Keep being awesome! 💕
Thank you so much! This is exactly why I make these videos-to inspire others to create. I've always had a passion for coding and a dream of sharing that passion with the world. Your comment makes me feel like all those sleepless nights were worth it. Thank you again, and I'll definitely try making the showcases longer in future videos. 💕
Maybe just post a video showing a recording of a simulation? A bit like Nils Berglund's channel does.
I did a half hour version with an upset at 35mins or thereabouts and a 6 and a half hour with a brief interruption at the hour mark while i change the power profile of my monitor. 4k
I wasn't recommended this. I searched for it, and boy am I glad I did
Adding collision is straight up mindblowing. Kudos!
This is awesome, now I want to code my own gravity simulation
Good luck! 🍀
big props for talking about implementation details. extremely helpful
Why isn't there 5 minutes at the end for showing the simulation?
Agreee. We need mooore
I did an hour or so on my cockroach powered pc for an hour upsetting everything at the 35 minute mark.
6 hours 35 minutes with a brief overview of my pc's power settings re the 1 hour mark. No upset, i wanted to see how large the groups would grow :-{D
Surprised that TH-cam blessed me with not only a good video but also a great channel.
I'm glad you gave this video another shot, it was super informative :)
Very impressive and good display of the power of math, conputer science, algorithms, and physics!
Can't thank you enough for presenting something so beautiful and technically interesting. Inspires me to dig further into these things. Thanks!
dude this video was so cool! Would love to see more of just the simulations doing cool things, maybe with colours added randomly, or based on velocity, etc
This is amazing! I've just discovered your channel with this vid and I'm so glad I have. I wanna make cybersec simulations (kinda like Kaspersky's map) and I'm also learning/using Rust for all of my personal projects, and this video has inspired me so much and given me the confidence to tackle doing the simulations in Rust, even if it takes me much longer than it would in any other lang/program lol Seeing the performance this has is unreal
Thank you! I switched from C++ to Rust about 3 years ago and I’ve (almost) never looked back. I wouldn’t say Rust is perfect, no language is, but so far I haven’t found something to replace it. I say go for it, I believe in you!
C'mon, this guy has only 4k subscribers? It's not fair...
Greate video! Good luck!
Great Video! I really enjoyed the idea of thinking of quadtrees as pretending that you could combine every other point as one. Really great hook and also really well made in generell :)
Thank you!
I really liked the video! Don't worry if you can't pump them out faster, they're very good!
the production value is insane!
For accuracy, do you also plan to add energy conservation? Depending on the time step the accelerations and velocities build up errors which can be tempered by conserving energy (potentials and kinetics). Also, in the past I have attempted to use a dynamic time step, where objects that are very close get a small time step and are updated more frequently to capture more fine motion, such as moons and sub orbits. Balancing the time step for the earth and moon vs Pluto for example especially since your distant Quad nodes don’t really change there center of mass or energy, but close objects really change significantly.
Interestingly, energy conservation (via a symplectic integrator) and variable time step don't play nice together, and you need to do some extra work to make it behave correctly.
Really looking forward to the code. Nice channel, subbed!
Very well presented, underrated channel
Holy cow that visual is intensely cool. It feels so big
this is amazing, the simulation resembles the milky way and surrounding clusters when the particles spread out, also worth noting Space Engine uses oct-trees to plot their objects
5:25 quadtrees (and their 3d cousin octrees) are oddly satisfying as a space partitioning solution, and I get a little excited whenever the opportunity to use them comes up. I'll definitely have to play with this sometime soon. perhaps see how it fares in 3d.
“It’s just quadtrees?”
“Always has been.”
seems like whoever first decided quadtrees were useful for running worlds solved the problem for all cases in the future.
I like your words magic man.
The performance here appears quite amazing! I have not been able to get past ~20k bodies in my own solution made in Unity’s DOTS.. With Barnes Hut.. Is it okay if I try to apply some of the teachings here to my own project? I presume you do not even make use of SIMD?
It'd be a bit mean to teach everyone a method that they then couldn't go use!
Yes, anyone can use it. And that’s correct, I haven’t used any explicit SIMD in my implementation.
@@DeadlockCode Thank you! If it is okay to ask, which parts of the optimization process gave you the greatest speedups? I have spent time on things like SIMD and optimizing the gravity calculation before implementing an array/list-based tree. My approach made use of a hashmap, which is akin to a Dictionary. I do realize now it may not have been all that smart of me to calculate the COM of the nodes every time a body is inserted. I am surprised, this kind of scale is something i thought you had to go to GPU for and similar videos to yours do exactly that.
As you can see in the improvements section, I got a 3x speedup simply by sorting the array of bodies along a z order curve. Other than that I would definitely look into replacing the hashmap with a vector if possible since hashmap lookup requires a lot of, well, hashing. Other than that I would profile and see what lines of code or functions are taking the most time and see if I could either reduce their usage or perhaps find another way to accomplish the same result.
(I accidentally deleted the previous comment so if you got two notifications, that's why)
@@DeadlockCode Makes sense! I have done extensive profiling. I know exactly which areas are the most problematic (the computation of forces) but your solutions are not something I would have thought would work so great. I sort of assumed sorting would only add computational time. That was an incorrect assumption heh. I will slowly try to improve bits and pieces following this video and see how it goes!
Really great work on the optimization!
I really enjoyed your video and totally get what you mean at the end about the video taking a lot when you try to put effort into it. Keep it up because you are doing good work!
This is something incredibly beautiful, great job!
Great video, it was a pleasure to watch!
Definitely awesome ❤ Amene explanation, deep concepts, usefulband practical coding. +1 subscriber
Amazing work! Thank you for sharing ♥
Nice edit, i really enjoyed this video, thanks Deadlock
This is a great video! You’re an amazing educator. Please keep up the good work!
Great video! I wanted to see more simulations with different setup in the end
This is so beautiful! Thank you for sharing :)
And thank you for commenting :)
Great work mate
Thanks 👍
that is... much simpler than I expected, wow
Barns and Hut initially used to report the final dynamics calculation times, without including their so-called theorem construction phase, which amounts to the actual multiple expansion part of the algorithm. This caused a lot of buzz, but at the time, it was possible to write faster code using almost any other method, as long as the code was reasonably optimized. It’s pleasant to hear that multiple expansions are now better, since it was obviously an interesting idea from the start.
Nice to see it's performs in real time. I few years ago I wrote a pretty straight forward (didn't put much effort into optimization) 3D n-body simulator using Barnes-Hut in Go. I got about 1 sec per frame with 1 million bodies, which includes software 3d rendering and basic lighting (8th gen i5 laptop with 32gb ram). For simplicity, I wrote each frame to a png then made a video afterward with ffmpeg. The slowest part was building the octtree every frame. I had extremely simple accretion too, but mostly ran with it off since it didn't add much to what I messing with at the time.
🇨🇱Great video ‼️
So well explained.
Perfect companion for my liking of physics and math 👍😀
Thanks and congrats
Saludos de 🇨🇱
Would it be possible for you to describe your improved algorithm? I had a quick look at the source, but it would be fab to hear you explain it and the motivation behind it.
That was awesome! Well done!
Nice stuff :)
If you want, there is a O(N) method for approximating the many-body problem called Fast Multipole Method (FMM) by Leslie Greengard.
It's also based on a quad-tree decomposition but with added mathematical tricks to keep the O(N) complexity.
I never seen it used, I'm wondering if it's just that the log(N) acceleration is not that interesting if the FMM has a bigger constant.
so cool - and so easy to get running! thanks! I'm gonna have to learn some Rust finally!
Holy shit dude. This is stunning. If your roadblock is editing the video, it is really worthwhile to outsource it. Forcing yourself to do things that aren't really clicking will demotivate you, and it just isn't worth it. Your brain is so beautiful, I want to see what you are capable of when you are unblocked by the tasks that don't give you life, and you're free to spend your time doing ever you love; exploring and building. Please consider unburdening yourself from the activities that don't inspire you, and that bring you to the edge of the pit of despair. Life's too short to willingly endure a grind if you could otherwise sidestep it. Your work is too important. It would be like Henry Ford spending his time forcing himself to write a company newsletter when it doesn't inspire him and it drags him down.
Optimize your workflow bro. I hope you hear from you sooner than 10 months! 🥰🫶💯
WOW, great job!
Thank you!
Making a space related game rn, will be investigating this usage
Wow it's Rust! That's so cool, I love it!
so cool. i am learning bevy and trying to get basic gravitation going. going to look in to this after i have the basics working
This is one of the highest quality videos I’ve seen on YT. Well done! As a rust developer I love your explanations on cache locality and optimisations.
Have you thought about submitting this video to the Rust Weekly newsletter? I’m sure it’d be interesting to other rust devs as well as myself.
Really good quality video. Good work
Liked and subbed, keep up the good work!
Really great work! I'm a huge fan of n-body simulation and there sadly isn't much hobbyist stuff out there. Have implemented Barnes-Hut a few times, it's pretty cool. BTW s/d is technically tan(theta) not theta :)
Also, instead of using another library for the collisions, you should be able to use your quad-/oct-tree for that.
Yes, you can definitely motivate the criterion by using an opening angle and then baking parts into the constant. I tried to find proof that this was the original motivation from the Barnes-Hut paper (or it’s predecessors) but I couldn’t find anything and I didn’t want to claim that it was how they derived it if it wasn’t. Do you have any source for opening angle being the origin of the criterion (other than theta being a common variable for angles)?
And yeah, I could’ve definitely used the quadtree for collision detection and that could be really fun implementing, but I just wanted to keep it simple because it wasn’t the main focus of the video. 👍
I knew where this was going the second I saw the impl. "Well those calculations don't need to happen sequentially" and yep, apparently they don't need to happen on the CPU either
Wow this takes me back to my grad school days when I used this to simulate discrete vortices - similar 1/x^2 interaction between particles. I used tecplot for animation. This is about 2 decades ago. Not sure if tecplot still exists but I used c and openmp back then.
Great video, especially the comprehensive visuals!
I've written a library focused around acceleration calculations for N-body simulations called particular, maybe you stumbled upon it whilst doing your research. I'm very interested in seeing your code to explore how I could optimise the way my Barnes-Hut implementation is done, because even though it looks quite similar on a surface level (stack allocated nodes), your performance numbers seem quite incredible. I'm also curious what kind of hardware you are running the benchmarks on, and what value for the theta parameter was used for the benchmarks and the final simulation.
Incredible.
Great video, but I'm especially grateful for the repo. It will definitely come in handy for me as a learning tool. I want to do an efficient plasma simulation for example.
been having so much fun messing around with parameters on this :-) I was wondering though - coudl you make the space like in the arcade game 'Asteroids' - so whatever goes out of the screen comes in on the other side? i guess easy to do with the position - but for the gravity force we would need to imagine each body also repeated 'off the edges' of the screen
So question ( outside of scope) : is there a way you could continue to conglomerate masses with known distances to form a field equation approximation. I'm thinking one could avoid going to the leafs for a layer of propagation since it's possible it's close enough to a super position approximate? Similar idea to finite element analysis where you use points based on a resolution but because a region is homogeneous or a known pattern one could set the discrete estimate of the center of mass and an error term to provide a good enough approximation. The real point being that calculation-wise some of these terms are going to have a minute effect on a body that's far away and to the point of the approximation shown in the video it can be seen as a single point if you're far enough away or the additional points effects are negligible enough. Mathematically it'd be saying 1 + epsilon is approximately 1.
This simulation looks so good it’s uncanny… because there’s no dark matter to (somehow???) make the outer particles move at a similar speed to the inner particles.
Well done! 👏🏼
Could this be used to model atmospheric particles?
I am building a machine learning model to predict the impacts of solar weather on upper atmosphere electron temperature and density, I wonder if I could model the electrons as "particle clouds" that interact with other "particle clouds" represented by a single particle.
Incredible!
My immediate thought on hearing the problem was "can i use quad trees?" And then the answer was quad trees. My addiction is sated for now.
code link missing in description 💕💕
You twisted, monstrous person! 💜 This means you broke mathematics somehow, right? Mesmerizing!
Very impressive! My first thought upon seeing the video title was to use a flow field. I'm curious how that would compare. It would be O(N) but with a really bad constant depending on the resolution of the flow field, since every body has to update the flow field in a radius around itself. There would be some problems though. You could have a small "influence radius" for smaller objects, but in that case clumps of small objects can't create larger gravitational fields to affect objects far away. The Barnes-Hut method seems superior for that reason.
I would guess combining the approaches might be optimal. Rather than operating on particles, you could operate on cells as much as possible, and form a quadratic flow field within cells based on distant cells. Basically, instead of all the masses in cell A just approximating cells B, C, D, all the way up to X as point masses, cell A builds an approximation of the fields of all these cells for itself, and all the masses in A refer to that instead. The distance such approximations are valid would depend only on the size of the cell, so very predictable. Child nodes would know whether a distant node already contributes to it's parent's field or not, so evaluating which should contribute to it's modification to the parent field is possible.
There's a related algorithm (which I'm not an expert on) called the FMM (Fast Multipole Method) that 'allegedly' achieves O(N) time complexity but in my short research of the algorithm I'd like to argue that it would still be O(N log N) unless you accept that the accuracy would decrease as N increases. If someone is more knowledgeable on this topic, please prove me wrong, I would love to love the FMM.
I'm not sure if the same is applicable to a flow field but the first thing I'd be hesitant about is the limited scale of the simulation since we don't have infinite memory to store all of the possibly billions of cells necessary to accommodate the one body that decided to fly away from the main cluster or you would have to make each cell larger which would decrease accuracy. Also you would have to increase the resolution of the grid as you increase the number of bodies unless you accept that the accuracy would decrease, so I'm not even sure if it would really be O(N) time complexity or if it's just a cleaver trick that makes it look like it's O(N) time complexity.
You could probably use a quadtree and instead of accelerating each body towards the quadtree you could try self interacting between the quadtree's nodes and then propagate accelerations down towards the individual bodies (I think this is what the FMM does?). You'd still somehow need to construct the quadtree but I've heard some people say that this is possible in O(N) if you use a limited accuracy for body positions but this would again run into the same problem with far away bodies. I think O(N log N) is the best you can do if you don't make further assumptions about your data and or accept worse accuracy.
Remember, I'm no expert and I don't claim that everything in this comment is true - I'm just brainstorming and I am most likely mistaken about something.
great stuff!
Great video. When the code is ready, you should post a link to the video to This Week In Rust. Also, someone should try to port it to WASM. If I was adding SIMD, I use the nightly standard SIMD.
Hello, your videos are the best. Thank you for your effort. Will you please share the file of high resolution image of the purple star like animation? Seems so cool.
I love n-body videos! Make videos of the other n-body algorithms please!!! 🙏
Great job
Would be interesting to see for what value of n the brute force n^2 algorithm becomes worse than the approximation. Maybe ~1000? Presumably the brute force algorithm is easier to optimize and run on the GPU.
If you don't need synchronization between threads, and you don't mind rewriting the compute kernels in c, OpenCL is really nice for GPU, the ocl and simple_ocl rust crates make using ocl in a rust project relatively easy.
Please upload a video or two of the simulation.
Like any great thing this ended too soon.
babe wake up deadlock just posted!!
you dropped this 👑
9:13 what instrument is this :0 ? The sad one that goes "eeeeee ææ oouu oooooohhh"
And what song, I want to hear more!
Amazing video! Although a physicist, I have never done these type of simulations. However, I also try to implement tensor-based trees in my strategic game search tree algorithm (alphazero). Can you point me to some medium-sophisticated literature on vectorized implementation of trees on tensors? I could not find any. Thank you.
slightly above my head, but does something like BLAS matrices do this internally?
Area 51 wants their computer back
very good video
Very nice video, though the "Rusty" way to store the children would be to use an Option. Granted NonZeroUsize is pretty fiddly to work with sometimes so just using a usize is simpler though you have to "manually" remember that 0 means None instead of the compiler reminding you.
I agree, and for completeness, I'd also like to add that Option has the same size as a usize, which might sound counter-intuitive without the knowledge of Rust's null pointer optimization.
my main problem with this method is that you store the size and position of each node individually when you can already know how small it is, and you can make the coordinate system INCREDIBLY efficient by using the fact it's a base 2 coordinate system
so you can have 32-bit numbers represent a quadtree that's 32 levels deep, where each bit for x/y tells you the exact path to reach the bottom node, going from msb to lsb and scaling down the tree not just index by index with a simple comparison, but in some cases, in a single operation
worst case scenario, where every available cell has a single particle, you increase memory usage by 25% than if you just had them all in a vector
also means that the position of the node can be stored in a single int64 for both x and y and have all the coord math be done in a single operation for both
Yes, there are definitely improvements on the table when it comes to memory footprint. Although, compressing the sizes and positions requires more compute in each iteration of the acceleration function to 'decompress' the data. I actually tried going the other way by caching more data, storing the size squared to avoid the multiplication in the acceleration function, which actually improved performance. So it's not just "less memory = faster" but would instead require further testing. Although I haven't tested going the entire way of optimizing the usage of every bit so I can't say it wouldn't be fast.
@@DeadlockCode i didn't mean that less memory usage would be faster, but mainly that you can make the coordinate system like that so you don't have to walk down the nodes to find points, you basically get a step order (suite of indexes leading down to a point) double as its coordinate so you can skip quite a few cases that satisfy it
an example, if i'm not mistaken, is that, if you don't know the point's position, (not stored), the way here to get its position is to go through each nodes and take in account their size at each step and do quite a few arithmetic operations to get it, but here, the point's position is the path you've travelled (1s and 0s on x and y) is the same value as the point's x and y coordinates
it DOES limit the grid/node size to be a power of two, but it's rarely a problem
@@DeadlockCode by the way, do you think it's possible to use direct-ish memory address to get a neighbor node?
as in, shaping the memory space so you can just move the read position to access neighbor nodes
It's very easy for sibling nodes sharing the same parent, but getting their immediate, lowest level neighbor could make it so much faster for collisions and other nearby interactions.
@@DeadlockCode at last, the biggest problem i've yet to figure out is elements that aren't points, which have a collision/bounding box larger than a node's contained area, which might force nodes to have a secondary vector to store a list of points whose sizes are contained in, but can't in their children
Please make another video showing results, simulating with different colors and trying different initial positions and velocities
Of course we'll build a humongous version of Asteroids game with it !
call this the Barnes-Nut algorithm because ough that sim 😩👌
This seems like a great candidate for Voronoi particle tracking and/or GPU acceleration. Essentially each particle stores its N nearest neighbors, and its velocity etc is just derived from them. Place the particles in ping-pong buffers and do the computation in a shader where one buffer is the last frame's state and the other the current write target, and you can probably get 60+ FPS
I'm a high-level programmer with no degree and not really the brain for maths but I delved into that topic once and it's cool to this potential application.
Where is the GitHub repo? Also can you share the code for the animations in the video?
There's another algorithm that I used to work with for plasma simulation. It's called particle in a cell (PIC) simulation.
Great video ❤
I love physics simulations
I suppose to switch to 3D, you need to use Octree ?
Thank you! And yes, in 3D you would use an octree but the implementation for it would be almost the exact same, except for 8 children instead of 4, Vec3 instead of Vec2, and cube instead of quad. Essentially just swap everything that is 2D with its 3D counterpart.
the moment I saw rust code, i instantly subbed, because i believe that rust is the perfect language to display educational snippets for topics such as this.
Banger video
Amazing! Please post the code when it's cleaned up. I'd love to try to push this to it's limits with a good computer (no need for real time haha). 10M bodies? 100M?
Very nice video. Another solution could be using dbscan clustering algorithm? It occurred to me because from my understanding you only want to group bodies based on a metric which is what dbscan does
Rust implementation, lets goooo!