Some time ago I used something like this in a project for a microcontroller, but instead of the PoolObject struct that you used in the video, I used an union of a pointer to my PoolObject, and the data struct intself, and then used a single-link list to store the pool using that pointer to link each memory block. Since the pointer is needed only while the block is in the list, but its value doesn't matter when it is borrowed, an union is perfect. Getting an object was as simple as returning the first element in the pool, and returning it was also as simple as adding it to the top of the pool. Both operations used constant time, no matter how many elements I had borrowed. Also, I sorted the elements in my structure based on their size (first, 64 bit integers and doubles; then 32 bit ints and floats; then pointers; then 16 bit ints; then chars and 8 bit ints and bools). That ensured that the struct was as compact as possible and allowed me to take maximum advantage of the available memory.
I have done that before with creating a union to a block of memory, then referencing it by the union member that matches up with the code that is trying to access it. Works well.
As someone new to C and currently learning the language, your videos are really useful in helping me become a better programmer 👍 C is definitely my favorite programming language because it gives you some basic tools which is all you need to literally make anything
There is a slight oversight in the second version of the return-function: if the user tries to return a pointer that lies outside the pool, "i" might become negative or larger than the pool and the write access to "allocated" would possibly corrupt other memory. You should check the bounds for "i" to be 0...(NUM_OBJECTS-1).
The variable 'i' is declared 'unsigned', so negative is not an issue. It's also an 'unsigned int' which means there might be truncation of a wider 'pointer' datatype happening. That would open the door to multiple (wrong) collision possibilities of the calculated value if the pointer (address) provided by the caller is garbled or wildly wrong. If memory is tightly constrained, expending a 'bool' (16bits??) as a simple "yes/no" flag for each object is not conservation... Perhaps the 'opaque' librarian (object pool driver) could be allowed to use a reserved "sentinel value" inside the data record to mark "available" vs "occupied"?? When dispensing objects, the data region should be zeroed-out to ensure buggy application code can reproduce its own bugs for finding and squelching.
IMO, we could manage the free objects as a singly linked list (as a free list), instead of the allocated flag, now the allocation would be O(1). Also now the pool looks more like Jeff Bonwick's slab allocator.
Other advantage of pools/arenas: They make debugging easier, as the code has way less points where the memory bugs could come from, and the same goes for errors.
Thank you for the interesting video! I would love to see how an object pool can be implemented more efficiently. And I would definitely be interested to see a different underlying data structure being used for the pool, including its advantages and drawbacks
An added bonus of using object pools when used by applications running on modern -- let's say at least a decade old -- CPUs (in non-embedded platforms) is the notion of data locality, which, on those CPUs, equates to being more friendly with the CPU cache. This yields a more efficient (faster) application when applied correctly. Look into Data-Oriented Design for more details about effective cache use.
there is no bonus. there in no data locality. fastest place in memory where you can allocate smth is stack, slowest - global memory. read about memory paging
is assert at line 48 21:35 really neccesary? isnt it like reverse of what we just computed? is it possible to differ? maybe if user provides invalid ptr as vector ptr, if offset is invalid, we would catch it.
Nicely put. It reminds me of code I wrote 30 years ago for processing HL7 messages at line speed. I think I called that a "free list with pre-allocation" rather than a "pool", but the concept is useful in lots of areas as you said. From an "object" perspective, this could be a good thing to hide behind a Factory pattern. Cheers!
Hey, great video as usual :) Just a feedback from an audio engineers perspective: Go easy on the de-reverb or noise supression. Also reduce some compression. Its a bit exhausting for the ears to listen to this unnnatural processing. Keep it up!
Would it help to get his mic closer or just make sure that the mic is a cardiod so that it's not picking up outside noise? Just a touch of de-essing might help.
Thanks. I was trying to address an annoying echo, but I definitely don't consider myself an audio engineer. I'll play around with it for next week's video. Please let me know if that one sounds any better to you.
@@JacobSorber will do! Moving the mic closer is the best way to counteract annoying room acoustics. Maybe consider investing in a lavalier mic. Otherwise just dial down the de-reverb and compression a bit. A bit of reverb sounds way more natural to our ears than the heavily processed audio.
Okay. Might have been a good idea to emphasize for beginners that their application won't be able to get an array of adjacent objects this way... "But, malloc() can do an entire array all at once!!??" Understanding memory layout and use is a hurdle to be cleared before further progress can be made. That "pointer arithmetic" gives me shivers. CrowdStrike taught us to ALWAYS test for NULL before a function can tentatively trust a pointer... Might be a good idea for `release()` to zero-out the data bytes, too. Consistent buggy behaviour is easier to hunt-down than random buggy behaviour...
Another way to do this is to create the array of objects, then create an array of pointers to the objects. Then manage the array of pointers as if it was a stack. Or said another way, create a stack of pointers to the array of objects. An advantage of this is that you can have bool 'empty' and 'full' flags, which saves you having to do pointer arithmetic. It also saves you having to have the 'allocated' flag. If the pointer to the object is on the stack, it is not allocated. If it is not on the stack, it is 'allocated' The care that needs to happen here is if an ISR can push or pop while you are updating the 'push' & 'pop' pointers in the stack.
@@user-sl6gn1ss8p I never figured out a really good way handle concurrency. Turning off interrupts globally can work, but the risk is that you hold off an interrupt that needs to be serviced quickly. Sometimes I would use queues to pass data from an ISR to a process running in my while(1){} loop. But you still run into the problem of handling concurrency when modifying the 'empty' flag on each queue. (Use a queue to pass data from (eg a UART Rx ISR) and to (eq a UART Tx ISR).) One way I tried that seemed to work was for both the ISR & the main process have counters that are initialized at the same time. When the ISR increments its copy of the counter, the main process sees that the main process' counter no longer matches the ISR, which means it needs to get something out of the queue. Going from the main process to the ISR is easier: just have main process sending data trigger the ISR by setting a flag. That is particularly easy in an architecture that has a single cycle bit set instruction. Handling concurrency would be a great topic for a video?
Happy new year! In the second implementation of the ReturnVector3 you are computing the I (index) with the actual address of the given pointer of *v. Yet the object pool consists of PoolObject which means we need to calculate the offset with this in mind I guess. unsigned int I = ((uintptr_t)v - sizeof(bool) - (uintptr_t)object_pool) / sizeof(PoolObject); right?
Could you do a video on handling concurrency in c, particularly in 8-bit or 16-bit micros that are passing information / data between an ISR and a background process?
Initialize an vector with integers ranging from 0..num_objects. When you allocate an object pop a index from that vector for the memory address. When you free an object push its index to that vector.
I like it. Using a stack to determine whether something is allocated or not is very clean. Please see my comment about using a stack of pointers to the elements in the array. Same idea, just implemented in a different way?
About that pointer arithmetic in the ReturnVector3 function, why don't you get the index just using (v - object_pool)? If the types are both Vector3*, it will give you the index directly, right?
In this case, they point to different objects. One is pointing to a Pool Object (object_pool) and the other to a Vector3 (v). I would have to double-check, but I think subtraction of pointers to objects of different type produces undefined behavior.
Thanks Dr Sorber for the great video. Could you also do one on intrusive linked list at some point. With reference to your video, I was think if we had an intrusive linked list inside Pool object to point to Free objects we could make the Borrow function faster as well.
Would it work to include the index of the object in the struct? This would save having to do the pointer arithmetic in the ReturnVector3? As a retired Embedded Engineer, any time I see a division, I worry about execution speed. Especially because the div operation is usually implemented in code in a library somewhere and also may even have unpredictable latency. typedef struct { bool allocated; uint16 indexOfThis; Vector3 obj; } PoolObject; This would require an initialization routine (for loop) to initialize the member "indexOfThis". I also understand that this is storing the same piece of information in effect in two places in my solution. The position in the array that you can get from doing pointer arithmetic is the same as the index I am proposing adding to PoolObject. This is considered crude by some programmers. However, I am trying to trade the usage of a small amount of extra memory (the indexOfThis) for a lower and predictable, latency? BTW - some people advocate for placing the smaller members of the struct into the end so that the complier can 'pack' the struct. Some compilers will align each member of the struct on 4 bytes boundaries to take advantage of architectures which have 32 bit wide data buses, which will cause offsets of: 0x00 allocated; 0x04 indexOfThis; 0x08 Vector3.x; 0x0c Vector3.y; 0x10 Vector3.z; Contrast that with: typedef struct { Vector3 obj; unit16OfThis; bool allocated; } PoolObject; 0x00 Vector3.x; 0x04 Vector3.y; 0x08 Vector3.z; 0x0c indexOfThis; 0x0e allocated; This problem of inefficiently packing structs gets worse as the data bus width gets larger (8 bytes for a 64 bit data bus) and as the struct has more elements. Perhaps I am not giving modern compilers enough credit for doing a good job of optimizing and packing?
Great! basically it is in a gross way it is how a memory allocator works but without the option of allocate dynamic sizes, memory allocators also add an control struct containing the allocated space, for returning the object I probably will use the container_of idea to get a pointer to the control struct instead getting the index, but it would added additional complications like be sure that the pointer returned is part of the object poll, some allocators use a magic number as a member of the control struct to be sure of that.
whats the point of pools? we have arenas, we have stack. you said you don't want to use stack for example, ok, but whats the point of not using stack instead of pool then? using global variables is a bad practise too, so you are basically trying to provide "better" practice forgetting about others. and it's not about "trying to keep example very simple", good programmer writes good code no matter what
Aren't structs basically "interfaces" for making multiple objects with the same structure as the struct? That sounds very OOP, that sounds almost like a class.
no. structures are memory layout. bunch of offsets that compiler uses instead of .member notation. oop is about inheritance mostly. to achieve inheritance you need a vtable, basically an array of methods of the class. you can make oop in c, it's not that hard
PS. The "faster" returner has a math issue. "v" doesn't point to the beginning of the PoolObject struct, so if your pool is big enough, the math will become wrong for higher values of "i"...
PPS. Wrote some code that proved I was wrong about that. The rounding of the integer division doesn't seem to screw up at least up to tens of thousands of small objects...
I'm just wondering if this is a response to Casey Muratori | Smart-Pointers, RAII, ZII? Becoming an N+2 programmer th-cam.com/video/xt1KNDmOYqA/w-d-xo.html Thank you for posting this. I like the use of an array for this, but I'd also like the hashtable implementation.
Some time ago I used something like this in a project for a microcontroller, but instead of the PoolObject struct that you used in the video, I used an union of a pointer to my PoolObject, and the data struct intself, and then used a single-link list to store the pool using that pointer to link each memory block. Since the pointer is needed only while the block is in the list, but its value doesn't matter when it is borrowed, an union is perfect. Getting an object was as simple as returning the first element in the pool, and returning it was also as simple as adding it to the top of the pool. Both operations used constant time, no matter how many elements I had borrowed. Also, I sorted the elements in my structure based on their size (first, 64 bit integers and doubles; then 32 bit ints and floats; then pointers; then 16 bit ints; then chars and 8 bit ints and bools). That ensured that the struct was as compact as possible and allowed me to take maximum advantage of the available memory.
I have done that before with creating a union to a block of memory, then referencing it by the union member that matches up with the code that is trying to access it. Works well.
Interesting, could you provide a brief example so I can better understand/visualize what you’re saying? Thanks.
@@Cambeast123 I tried to put a pastebin, but youtube removed the comment. It probably think is spam 😞 I'll copy the code here...
#include
typedef struct {
int x, y, z;
} OBJECT;
typedef union {
OBJECT object;
void *pointer;
} OBJECTPOOL;
#define N_OBJECTS_POOL 10
OBJECTPOOL pool_block[N_OBJECTS_POOL] = {0};
OBJECTPOOL *pool = pool_block;
void init_pool(void) {
for(int i=0; ipointer;
return obj;
}
void return_object(OBJECT *obj) {
OBJECTPOOL *p = (OBJECTPOOL*) obj;
p->pointer = pool;
pool = p;
}
int main() {
init_pool();
printf("%p
", pool);
OBJECT *obj = borrow_object();
printf("%p %p
", pool, obj);
OBJECT *obj2 = borrow_object();
printf("%p %p
", pool, obj2);
return_object(obj);
printf("%p
", pool);
return_object(obj2);
printf("%p
", pool);
return 0;
}
(I know that the initialization isn't very elegant... I just preferred to make the code easier to understand what is doing)
As someone new to C and currently learning the language, your videos are really useful in helping me become a better programmer 👍
C is definitely my favorite programming language because it gives you some basic tools which is all you need to literally make anything
Object pools are the way to go. Thanks for sharing.
Nice one! I would like to see a video about other allocators!
There is a slight oversight in the second version of the return-function: if the user tries to return a pointer that lies outside the pool, "i" might become negative or larger than the pool and the write access to "allocated" would possibly corrupt other memory. You should check the bounds for "i" to be 0...(NUM_OBJECTS-1).
Good point. Thanks!
The variable 'i' is declared 'unsigned', so negative is not an issue.
It's also an 'unsigned int' which means there might be truncation of a wider 'pointer' datatype happening.
That would open the door to multiple (wrong) collision possibilities of the calculated value if the pointer (address) provided by the caller is garbled or wildly wrong.
If memory is tightly constrained, expending a 'bool' (16bits??) as a simple "yes/no" flag for each object is not conservation...
Perhaps the 'opaque' librarian (object pool driver) could be allowed to use a reserved "sentinel value" inside the data record to mark "available" vs "occupied"??
When dispensing objects, the data region should be zeroed-out to ensure buggy application code can reproduce its own bugs for finding and squelching.
IMO, we could manage the free objects as a singly linked list (as a free list), instead of the allocated flag, now the allocation would be O(1).
Also now the pool looks more like Jeff Bonwick's slab allocator.
Other advantage of pools/arenas:
They make debugging easier, as the code has way less points where the memory bugs could come from, and the same goes for errors.
2 minutes in -- yes please, keen to see some videos on those topics/things!
Thank you for the interesting video! I would love to see how an object pool can be implemented more efficiently. And I would definitely be interested to see a different underlying data structure being used for the pool, including its advantages and drawbacks
You could also do `PoolObject *pool_obj = (PoolObject*)((unsigned char*)v - offsetof(PoolObject,obj));` in the return function.
Very nice! I think part 2 could explore more about ObjPool because it's very useful! Tks.
An added bonus of using object pools when used by applications running on modern -- let's say at least a decade old -- CPUs (in non-embedded platforms) is the notion of data locality, which, on those CPUs, equates to being more friendly with the CPU cache. This yields a more efficient (faster) application when applied correctly. Look into Data-Oriented Design for more details about effective cache use.
there is no bonus. there in no data locality. fastest place in memory where you can allocate smth is stack, slowest - global memory. read about memory paging
This video is really helpful Jacob ❤
is assert at line 48 21:35 really neccesary? isnt it like reverse of what we just computed? is it possible to differ?
maybe if user provides invalid ptr as vector ptr, if offset is invalid, we would catch it.
Nicely put. It reminds me of code I wrote 30 years ago for processing HL7 messages at line speed. I think I called that a "free list with pre-allocation" rather than a "pool", but the concept is useful in lots of areas as you said. From an "object" perspective, this could be a good thing to hide behind a Factory pattern. Cheers!
Hey, great video as usual :) Just a feedback from an audio engineers perspective: Go easy on the de-reverb or noise supression. Also reduce some compression. Its a bit exhausting for the ears to listen to this unnnatural processing. Keep it up!
Would it help to get his mic closer or just make sure that the mic is a cardiod so that it's not picking up outside noise? Just a touch of de-essing might help.
Here I was wondering if my headphones were tweaking out lol
Thanks. I was trying to address an annoying echo, but I definitely don't consider myself an audio engineer. I'll play around with it for next week's video. Please let me know if that one sounds any better to you.
@@JacobSorber will do! Moving the mic closer is the best way to counteract annoying room acoustics. Maybe consider investing in a lavalier mic. Otherwise just dial down the de-reverb and compression a bit. A bit of reverb sounds way more natural to our ears than the heavily processed audio.
Happy New Year, Jacob.
Happy New Year to you too!
I wish a video about safe code in C, like common mistakes
Great explanation.
Okay. Might have been a good idea to emphasize for beginners that their application won't be able to get an array of adjacent objects this way...
"But, malloc() can do an entire array all at once!!??"
Understanding memory layout and use is a hurdle to be cleared before further progress can be made.
That "pointer arithmetic" gives me shivers. CrowdStrike taught us to ALWAYS test for NULL before a function can tentatively trust a pointer...
Might be a good idea for `release()` to zero-out the data bytes, too. Consistent buggy behaviour is easier to hunt-down than random buggy behaviour...
Another way to do this is to create the array of objects, then create an array of pointers to the objects. Then manage the array of pointers as if it was a stack. Or said another way, create a stack of pointers to the array of objects. An advantage of this is that you can have bool 'empty' and 'full' flags, which saves you having to do pointer arithmetic. It also saves you having to have the 'allocated' flag. If the pointer to the object is on the stack, it is not allocated. If it is not on the stack, it is 'allocated'
The care that needs to happen here is if an ISR can push or pop while you are updating the 'push' & 'pop' pointers in the stack.
How would you go about taking care of that case?
@@user-sl6gn1ss8p I never figured out a really good way handle concurrency. Turning off interrupts globally can work, but the risk is that you hold off an interrupt that needs to be serviced quickly. Sometimes I would use queues to pass data from an ISR to a process running in my while(1){} loop. But you still run into the problem of handling concurrency when modifying the 'empty' flag on each queue. (Use a queue to pass data from (eg a UART Rx ISR) and to (eq a UART Tx ISR).) One way I tried that seemed to work was for both the ISR & the main process have counters that are initialized at the same time. When the ISR increments its copy of the counter, the main process sees that the main process' counter no longer matches the ISR, which means it needs to get something out of the queue. Going from the main process to the ISR is easier: just have main process sending data trigger the ISR by setting a flag. That is particularly easy in an architecture that has a single cycle bit set instruction.
Handling concurrency would be a great topic for a video?
Very nice video.. thanks 🙏
Nice video.I feel motivated to do some c
Happy new year!
In the second implementation of the ReturnVector3 you are computing the I (index) with the actual address of the given pointer of *v. Yet the object pool consists of PoolObject which means we need to calculate the offset with this in mind I guess.
unsigned int I = ((uintptr_t)v - sizeof(bool) - (uintptr_t)object_pool) / sizeof(PoolObject);
right?
returnvector function could take in double ptr to vector so that we could null out the user ptr
Could you do a video on handling concurrency in c, particularly in 8-bit or 16-bit micros that are passing information / data between an ISR and a background process?
It would be really nice if you can also mention how other mallocs can decrease memory fragmantation
Great video
Great video as always Jacob. Isn't this fixed size linear allocator?
Is this good for real time applications like flight control or audio processing?
Initialize an vector with integers ranging from 0..num_objects. When you allocate an object pop a index from that vector for the memory address. When you free an object push its index to that vector.
I like it. Using a stack to determine whether something is allocated or not is very clean. Please see my comment about using a stack of pointers to the elements in the array. Same idea, just implemented in a different way?
About that pointer arithmetic in the ReturnVector3 function, why don't you get the index just using (v - object_pool)? If the types are both Vector3*, it will give you the index directly, right?
In this case, they point to different objects. One is pointing to a Pool Object (object_pool) and the other to a Vector3 (v). I would have to double-check, but I think subtraction of pointers to objects of different type produces undefined behavior.
@JacobSorber Oh, sorry about that. I thought object_pool was simply a pointer to an array of Vector3
As someone who doesn't know a whole lot about alternate allocation schemas, I would love to hear an explainer on them.
Thanks Dr Sorber for the great video. Could you also do one on intrusive linked list at some point. With reference to your video, I was think if we had an intrusive linked list inside Pool object to point to Free objects we could make the Borrow function faster as well.
Next week. 😀
Would it work to include the index of the object in the struct? This would save having to do the pointer arithmetic in the ReturnVector3? As a retired Embedded Engineer, any time I see a division, I worry about execution speed. Especially because the div operation is usually implemented in code in a library somewhere and also may even have unpredictable latency.
typedef struct {
bool allocated;
uint16 indexOfThis;
Vector3 obj;
} PoolObject;
This would require an initialization routine (for loop) to initialize the member "indexOfThis".
I also understand that this is storing the same piece of information in effect in two places in my solution. The position in the array that you can get from doing pointer arithmetic is the same as the index I am proposing adding to PoolObject. This is considered crude by some programmers. However, I am trying to trade the usage of a small amount of extra memory (the indexOfThis) for a lower and predictable, latency?
BTW - some people advocate for placing the smaller members of the struct into the end so that the complier can 'pack' the struct. Some compilers will align each member of the struct on 4 bytes boundaries to take advantage of architectures which have 32 bit wide data buses, which will cause offsets of:
0x00 allocated;
0x04 indexOfThis;
0x08 Vector3.x;
0x0c Vector3.y;
0x10 Vector3.z;
Contrast that with:
typedef struct {
Vector3 obj;
unit16OfThis;
bool allocated;
} PoolObject;
0x00 Vector3.x;
0x04 Vector3.y;
0x08 Vector3.z;
0x0c indexOfThis;
0x0e allocated;
This problem of inefficiently packing structs gets worse as the data bus width gets larger (8 bytes for a 64 bit data bus) and as the struct has more elements.
Perhaps I am not giving modern compilers enough credit for doing a good job of optimizing and packing?
I always seem to have to implement stuff you make videos on a week before
Great! basically it is in a gross way it is how a memory allocator works but without the option of allocate dynamic sizes, memory allocators also add an control struct containing the allocated space, for returning the object I probably will use the container_of idea to get a pointer to the control struct instead getting the index, but it would added additional complications like be sure that the pointer returned is part of the object poll, some allocators use a magic number as a member of the control struct to be sure of that.
Slab/slob/slub video would be great
The book I'm learning C from is so old, pools of any kind aren't mentioned at all
Did this back in the day with a file handles on an phone auto attendant dating app because only had 512mb of RAM and every byte counted
I guess the only downside of this is allocing objects is O(n) where n is the number of objects allocated
I guess, we can trade off memory for performance by introducing a free list
whats the point of pools? we have arenas, we have stack. you said you don't want to use stack for example, ok, but whats the point of not using stack instead of pool then? using global variables is a bad practise too, so you are basically trying to provide "better" practice forgetting about others. and it's not about "trying to keep example very simple", good programmer writes good code no matter what
Aren't structs basically "interfaces" for making multiple objects with the same structure as the struct? That sounds very OOP, that sounds almost like a class.
no. structures are memory layout. bunch of offsets that compiler uses instead of .member notation. oop is about inheritance mostly. to achieve inheritance you need a vtable, basically an array of methods of the class. you can make oop in c, it's not that hard
No one cares but... I am a true Code Wizard when it comes to object pools 🙂
There is the terminology of objects in c, it’s just not the same as c++
20:12 Would've been better to just
PoolObject *pool = &(object_pool[0]);
PoolObject *node = (PoolObject*)(((bool*)v)-1);
size_t i = (((uintptr_t)node) - ((uintptr_t)pool)) / sizeof(PoolObject);
assert( node >= pool && i < NUM_OBJECTS );
node->allocated = false;
PS. The "faster" returner has a math issue. "v" doesn't point to the beginning of the PoolObject struct, so if your pool is big enough, the math will become wrong for higher values of "i"...
PPS. Wrote some code that proved I was wrong about that. The rounding of the integer division doesn't seem to screw up at least up to tens of thousands of small objects...
I'm just wondering if this is a response to Casey Muratori | Smart-Pointers, RAII, ZII? Becoming an N+2 programmer th-cam.com/video/xt1KNDmOYqA/w-d-xo.html
Thank you for posting this. I like the use of an array for this, but I'd also like the hashtable implementation.
Audio is awful. Please reprocess and reupload. Unwatchable in present form.