ChimiChanga
Joined 10 May 2009
Andrew Kelley Practical Data Oriented Design (DoD)
Copyright: Belongs to Handmade Seattle (https://vimeo.com/649009599). I'm not the owner of the video and hold no copyright, and the video is not monetized.
In this video Andrew Kelley (creator of the Zig programming language) explains various strategies one can use to reduce the memory footprint of programs while also making them cache-friendly, which increases throughput.
At the end he presents a case study where DoD principles are used in the Zig compiler for faster compilation.
References:
- CppCon 2014: Mike Acton "Data-Oriented Design and C++": th-cam.com/video/rX0ItVEVjHc/w-d-xo.html
- Handmade Seattle: handmade-seattle.com/
- Richard Fabian, 'Data-Oriented Design': dataorienteddesign.com/dodbook/
- IT Hare, 'Infographics: Operation Costs in CPU Clock Cycles': ithare.com/infographics-operation-costs-in-cpu-clock-cycles/
- The Brain Dump, 'Handles are the better pointers': floooh.github.io/2018/06/17/handles-vs-pointers.html
so after all these stand-ups, Andrew Kelley is sitting on stream talking about how Zig has so many bugs because its compiler is slow. DoD didn't help?
I‘m a web developer and will probably never use any of this stuff but it’s super interesting!
Brilliant!
One thing I noticed about booleans is that in some cases people duplicate an implicit information with them. Which means such booleans can be replaced with a function. For example, with the monster struct, you can actually detect if a monster is alive by simply checking if HP > 0 (unless your game logic allows a monster be alive otherwise, of course). Sometimes booleans duplicate implicit state of a nullable field. For instance, I've seen a user account record with email and a flag called activated. But user account was only considered activated if it has an email, so you can just delete the boolean and use email != null check.
How do I get a system to treat a hard drive or flash drive AS main memory, fool it into thinking the storage device is RAM, then treat the RAM as L2 cache? Can it be done with a driver or does it have to be done in the firmware?
isn't this just swapping/paging? why would you want to do this because it will be insanely slow
@@rangeramg no, it absolutely would not be swapping/paging. I am trying to avoid paging altogether and just use a block device directly AS RAM! I'm trying to both bypass motherboard limits on supported RAM, and make programs believe I have more physical RAM than I would even have room to hold the page file for when swapping! Asking if it was the same thing makes me wonder if you watched the video? Sorry for the sarcasm, but I am tired of people repeating that same response when I'm explicitly saying that's not what I want to do; it's very frustrating.
@@petevenuti7355 if that's the case i think the only way to accomplish that is through modifying the operating system itself, not sure if l2 is even able to be changed (i doubt it since it's inbuilt into the cpu itself). the system that enables paging essentially treats it like extra memory in the address space, so not really sure what else you want to do other than that? you can just allocate a whole device as swap space and use it as memory in linux, so there's that, and you can probably limit useable memory to almost nothing to emulate what you want to achieve and yes i did watch the video in full
@@rangeramg yeah, I'm not sure how the MMU on the motherboard and the memory management of the operating system work together or are capable of overriding each other. I was hoping somebody would know. I remember really old systems where you could put RAM on a card on the bus. As for the actual L2 cache, you are most likely right, as it's in the CPU chip; however, if the operating system has anything to do with it, such that it's not just microcode on the CPU that handles it, it would be cool if any real RAM that was available could be used in that role.
Man, this guy is pretty good at Zig, he should join the team developing the language
34:55
Can someone correct me? I'm new to this: is he saying that it's 1+4+3+8+3+4+1, which adds up to 24? Or is it not 1 and 3 for new lines?
The most important takeaway from this is that he doesn't know what OOP is.
Exactly, real communism has never been tried
As neat as some of these tricks and techniques are, I am absolutely rejecting your PR if you do pretty much anything talked about here. Code quality is wayyyy more important than technically saving a few bytes
@@maskettaman1488 you mean dividing perfectly readable functions into 5 classes and 25 methods LOL
@@rusi6219 No. I mean not jumping through hoops and writing some arcane code so that you can have a bool and an int share a byte. There are very very very few cases where that kind of optimization is necessary, considering the quality of the code suffers greatly as a result. But also, yes, you should be breaking up functions into smaller units. If you have 1 function that does 20 things... you should probably look at breaking it up and organizing things a bit.
@@maskettaman1488 hahaha "quality of code" is about the last thing a customer is thinking about as he's staring at a frozen screen due to fifty destructors destructing other destructors
@@rusi6219 I can only assume you're new to development in general if you think "fifty" destructors would even be perceptible to an end user. Have you never seen a stack trace from a modern web server or something? Also, destructors have literally nothing to do with cryptic bit tricks to save a few bytes on your data structures like the video is discussing. You really seem to not know what you're talking about.
@@maskettaman1488 I can only assume you're new to development in general if you think "cryptic bit tricks to save a few bytes" have no effect on how well software runs at scale. You really seem to not know what you're talking about.
Great talk. No fluff. Just gold!
I'm not a zig programmer but I was very familiar with llvm for a while.... This is gold!!
"I programmed my compiler like an ECS based video game and it dramatically improved performance" was not the talk I expected to listen to today. That is a cool take. It makes me wonder if there is value in a language where this is how it stores things and processes under the hood. Like, can you write a language where ECS is not *a* paradigm, it's *the* paradigm.
I have doubts whether the monster example is relevant in the context of DOD. Let's say I want to update the position of all bees. I would have to iterate through the whole set and use an if condition to pull out only the instances I am interested in. That is still clogging the cache with data I am not interested in (all non-bee monsters). In the DOD approach this would be split into two (or more) separate systems, which would automatically eliminate the need for encoding/tagging.
great talk
Nice talk. I am just wondering if someone could suggest a book that goes into more detail on hardware-related stuff, maybe to better understand compilers and low-level features. For example, like the slide at 5:20
Patterson and Hennessy - comp arch; the Dragon Book - compilers
@@ChimiChanga1337thank you
6:16 very important gotcha! "Math is faster than every memory operation. Maybe you should repeat the math instead of memoizing the result."
Great talk! I look forward to using some of these strategies. Open question: 18:21 In languages with compilers that support #pragma pack, is there any advantage to storing booleans out-of-band? Assuming non-sparsity of the boolean field would you still not incur cache line misses regardless of whether the bool is in the structure or an array outside, since the cpu is forced to load the same # of bytes?
The whole point of storing struct fields at their natural alignment is so the CPU doesn't have to do extra work. Back in the 90s, top-of-the-line RISC CPUs might even crash your program or trigger some OS interrupt to handle unaligned access. I believe x86 tends to be more forgiving about this, but I'm not sure. It is likely that your compiler will go to great lengths to avoid unaligned memory access, using multiple smaller reads or writes instead. As always, looking at assembly output and benchmarking your particular program will be necessary.
Right, I guess my intuition about performance can go out the window now, because all of those steps sound incredibly slow
3:09 nice :)
Damn, I should learn Zig
30:10 the plug i was looking for
20:27 the speaker image covered the content, and I don't understand this example either.
20:52 ohhhw, it's about a language feature where the lang allows creating an array of each member of the struct.
About the lexer and retokenizing: that's unnecessary work for strings and other tokens where the length isn't the same for all tokens of that type. Instead, just have one token for "begin string literal" and one for "end string literal". That's what I've done in my own projects. You get a tiny token struct and no retokenizing.
16:03 zig has them now
Bit fields? You don't have to use multiple arrays and enums, which make the code less readable:

struct Monster {
    unsigned long long life_points : 63;
    unsigned long long is_alive : 1;
};

That's it. You can pack a lot of data if you know the maximum/minimum range of values in each case.
Or in this case just check if life_points is lower than 0, but this is a very simple example. For the zig compiler I think that Andrew's strategy is good.
Hello, bitfields?
Why the fuck does a talk about an abstract CS concept have 67k views in 3 weeks? This field is GIGA oversaturated.
In my opinion, this approach has an impact on code simplicity and maintainability. On one side you have a structure that contains what you are interested in. On the other side, well, it is more complicated, and I would even say more bug-prone and harder to debug. Also, he focuses only on the size of the data, while the way you access the data is also very important. For example, it is supposed to be way faster to visit all the elements of an array than to visit all the elements of a linked list, because with the array the memory accesses are very close to each other, so you stay in the cache and the processor can even predict which memory address will be accessed next. With a linked list, you are accessing all over the place, so most of the time the next item won't be in the cache. Don't get me wrong, it is very nice that Andrew is trying to make the compiler fast, and there are a lot of slow compilers out there. But I am not sure that many of us mortals need that level of optimization.
I would say Andrew misunderstood the meaning of DoD, and what he showcased here is straight-up optimization, a step most mortals do not have to take, as their code is 1000x slower for other reasons entirely, i.e. using pointers from a linked list to refer to individually heap-allocated items. There is also this funny thing where he optimized the most irrelevant part of the compiler; LLVM will gobble 98% of the compile time regardless (unless Zig can use its own fast backends for debug builds). But I do want to say that the way you refer to "readability" sounds like you don't know what you're talking about, and the "readable" code you would spew out would be anything but actually readable.
@@Nors2Ka For me, readability is about how easy it is to understand the code. There is an infinite number of ways to implement a feature in a program, and some of them are better than others according to some criteria, for example readability or performance. Regarding readability, I would judge it by the amount of time and effort an average developer needs to understand a piece of code. At 22:29, on the left, I instantly know and understand what a Monster is and what it is capable of from its struct definition. But on the right, if I look at the Monster struct, I don't know that it can hold items. And if I find the held_items hashmap, I am not sure about the key of the hashmap. Is it the Monster index in the Monster list? Is the hashmap dedicated to the Monsters, or is it shared with the NPCs or other possible item holders? Should I clean something up in the hashmap when a Monster is deleted? For me, the left version seems a lot easier to maintain and add features to (for example, looting).
I feel the same way. I know in my typical projects (large web and console apps), the code being easily read and maintained is much more valuable than saving KB of memory or a fraction of the time of a single web request. I guess these kinds of optimizations are for specific use cases, but nonetheless it's interesting to hear about and have in the tool belt.
The example at 20:00 doubles the number of memory accesses for each struct (one with the animation id, another one for the monster type), unless the reduction in footprint makes it all fit into a single cache line.
I was going to comment this as well. Splitting the data like this will always require two cache reads on the initial load*. If the out-of-band technique is applied to more data, then each one will incur an additional cache read. Looked at as an individual access, each application of the technique comes with a guaranteed cache miss.

Data for arrays is stored contiguously in memory. In this example, if you expect to access several elements in a row, then much of the data will already be in cache, because each struct is smaller than one cache line. If you expect to go through all the data in any order (and it is small enough to fit in cache), then this results in fewer cache misses total, because there are simply fewer cache lines to access. These benefits can artificially inflate the performance savings of the technique when people benchmark it by iterating over an entire array of data, or test it in a limited context that doesn't factor in the accumulation of external data in cache from other processes, if the data / procedure is not used often.

* For completeness, there are some special cases where you don't have two cache reads:

One cache read: I assume this is what you were referring to in your comment. If you happen to be storing less than 64 bytes of total monster data, 7 or fewer monsters in this example (assuming the 'animation' data is cache-aligned to start with and the 'kind' data is not), then you can fit 3 extra monsters into the cache line, where it could only store 4 monsters per cache line previously.

Three or four cache reads: In this example his data size is a divisor of the cache-line length; if it weren't, there might be a chance of having to perform another cache read for each access.
This technique is very useful for instances where you do multiple operations on the Monster and you only need one (or a few) fields at a time. For example, in a game you will have an update function and a render function. If you iterate with the padded struct, you waste time going over the anim field in the update function and then waste time going over the kind field in the render function. You just have to be careful of your use case, as with any optimization methods.
6:50 using light speed as a measure of how fast an operation executes is funny. It also makes you understand how fast modern computers are: having memory as fast as L1 cache 1 meter away from the CPU is literally impossible
Avail yourself of bitfields. They're part of the standard and they can really save your bacon space-wise. In particular, I think they're better than encoding multiple boolean values into an enum because of maintainability. Imagine you have these encoded enums, and you have to write a function like is_human, and it's working just fine, but then someone adds a new variable and re-encodes all the enums so that you have to include some more, but you forgot to update the function. Another advantage of bitfields is that you can take advantage of your knowledge of the actual sizes you need. Like if you know you only need 24 bits, you can just allocate 24 bits and use the next eight for something else.
Nice, but Lisp coders have known this for decades. McCarthy
When he made alive_monsters and dead_monsters I was smiling so big my gf started investigating me
So (Dod) not only saves memory but also makes programs faster !? I'm in !
So awkward seeing a nerd swear when doing a speech.
Can someone link the video he mentioned by the guy in the red shirt?
th-cam.com/video/rX0ItVEVjHc/w-d-xo.html - Mike Acton DoD talk
Before even watching, I'm gonna predict that the solution is basically using an ECS or similar techniques to make objects of arrays of data instead of arrays of objects.
In Allen-Bradley/Rockwell PLC world, every atomic data type has an alignment of 4 bytes. You can initialize a BOOL and it takes up 32 bits of memory space. Way more efficient to index into individual bits of a DINT. Unfortunately, it’s way more of a pain in the ass to see magic numbers everywhere, only to find out they’re binary masks. Edit: slight correction: AB states the DATETIME data type to be a 64 bit REAL that’s a “special use atomic type,” whatever that means.
ahah storing tokens without the end position... that's a new one for me! interesting
what an advantage in life being put onto coding at such a young age
good for your career, very bad for your s-x life
An advantage in coding
Yeah, I started 65 years old, too late 😢
His backstory is 1:1 my backstory… I feel very unique
great talk, but at 20:00 Andrew is talking about memory layout and how 7 bytes are wasted - assuming the enum is a byte! In many languages an enum defaults to an int, which would be 4 bytes. imo, the enum backing type should be shown, just for clarity.
But even if the tag is a u64 or such, those 7 bytes are still "wasted" in the sense that they don't provide any information. They will always be 0: since the tag only takes values from 0 to 4, it never touches those more significant bits. In practice the exact type of the tag doesn't matter, because it will never take values high enough to make use of those bytes.
also what languages are you using that automatically and truly use 32 bits for an enum tag lol
@@jotch_7627 well theres this isa called x86
@@jotch_7627 the isa would determine word size, not the language. languages can target many isas. thanks for playing
@@jotch_7627 I mean rust uses an isize by default which is ssize_t in C aka a pointer sized integer so it’s not that far out of the realm of possibility, although 1. You can easily change this and 2. That’s only for debug builds, when optimisations are turned on it immediately collapses to the smallest size possible
Ngl I assumed for a moment this was a presentation to the Department of Defense
Thank goodness you posted this on YT because now I don't have to go to Vimeo to watch this talk for the Nth time....
In twenty years there will be videos called "the problems with Data-Oriented Design" and "reasons to avoid Data-Oriented Design". The programming world moves in hype cycles around new methods. One time OOP was all the rage.
Yeah, any time there's a trade-off, you'll find a constant tug-of-war over the years. The strategies presented in the talk are great if you're ready to optimize, but it's now harder to maintain the code. As people migrate their code to optimize the bytes over the next decade there will be a resurgence of people arguing that maintainability is paramount.
And each time we hopefully get better
@@Galakyllzdata oriented design is not harder to maintain unless you only know how to maintain OOP code.
@@BoletusDeluxe That's what I mean: you will find coworkers who are not as capable as you, so changing the data structures in ways that reduce the memory required will just result in your coworkers making mistakes when adding new fields. Then they become frustrated, then they avoid tickets involving that part of the project altogether. Now you are the only person maintaining that part of the project. Eventually your coworkers will say "Well, BoletusDeluxe's changes a few months ago make that process unmaintainable. It's more efficient but it's harder to maintain, so that's why we haven't gotten around to adding all of these other features." You might think "well, the less-experienced coworkers will just get fired/replaced then", but that is painfully unlikely in my experience. You become the anomaly, the senior developer no one is as capable as, so they don't try to hold everyone to your standard. If they wanted that, they'd only ever hire senior engineers. So to allow the entire team to work on that project, they revert your changes.
@@BoletusDeluxe"maintaining" OOP code = duct taping bleeding wounds
there are a few good intuitions, but there are a lot of mistakes in this talk, things that are left undefined ("natural alignment" of a struct? what is the definition of that?), very arch dependent, confusion between language and CPU...
why is the size 24 and later 16 when the struct variables were rearranged?
because of how compilers align struct members in memory. The size decrease comes from removing the padding that was needed to keep each member aligned
A struct needs to begin and end on a multiple of its largest member's alignment. The first u32 takes 4 bytes. The u64 starts 4 bytes after that, because it needs to start on a multiple of 8, leaving a 4-byte gap. Then the second u32 takes a further 4 bytes, and the compiler gives the whole struct another 4 empties to round it up to a multiple of 8, for 8 useless bytes total. The second version takes the last u32 and puts it in that space, making it all nice and neat.
Meanwhile, web developers send 30mb over the network just to show a simple text blog
not all of us. the day I'm sending more than 300KiB of JS for a simple web app is the day I need to be taken out back
@@SamualNI'm glad you exist. As an occasional web developer, I'm constantly disgusted by what I work with.
@@SamualNbased
why think when i can just write attribute packed on everything