Some days, I just don't feel like writing code. Then a video like this pops up in my feed and I find myself back at the keyboard, refactoring something. Haha ... thanks for the C content!
I think this is exactly why C is one of my favorite languages. Reinventing some stuff keeps my mind active and sane when diving into languages with lots of abstractions. I think this makes me much more aware of the performance trade-offs hidden in language or library goodies.
@@hbobenicio doing the same thing in very high level languages like Java is also pretty fun. It's thrilling using the cracks in the system to poke with a stick into the inner workings of an overcomplicated machine. Also bringing Java code to segfault is really entertaining 😂
I don't write C any more (and never did any heavy lifting with it), but it's nice to see how development is evolving. I've always liked how much space C gives us to do things our own way (including how to screw things up royally, but I don't see the latter happening here!) I'll definitely give this another view. I'm in awe of the folks who really know how to do this stuff. C is still my favorite language; I never learned to hate it, even though I've had my share of awful bugs just like everyone else, but not as awful as the ones we get when we really know how to code. Now we've learned all about robustness. Cheers!
@@BboyKeny Whatever gets it done fastest, and that tends to be Python. I'm not working on computationally large problems these days. There's a soft spot in my heart for SML (or maybe it's a soft spot in my head!) I wish I had time to learn something about Haskell, but life is short.
I have done templates in C using macros. It's not pretty. Some of the reasons why: confusing error messages, your code linter will have trouble telling you where errors are, jump to function definition will likely be broken, your code will be less readable, and the formatting gets confusing. That was the reason I ultimately switched to D.
You gained a sub. I use a couple of these tricks in my projects (except the arena allocator), and find them very useful. Thanks for documenting such neat tricks.
Thank you! I am a CS PhD student and I definitely learned something new from this video: arena/bump allocators. Arenas feel pretty useful for freeing memory correctly when an error forces an early exit from a function. We do not need to remember all the stuff we have to free when quitting the function early. I like to code in the Nim language, for its templates and metaprogramming features, which are much easier to debug (and much more powerful). There are plenty of features that I wish were there in C (default values, some automatic type inference for function callbacks, etc.) but the amount of optimisations in Clang/GCC, the reliability of debugging tools like Valgrind, the structural typing and the simplicity of the language (no absurd edge cases to learn like in JS) make it an incredible language ;)
Nice concise explanation of bump arena allocators. I have a feeling you like Zig haha. Also the explanation of how to use the pre processor to make templates is good. I just wish there were a way to do that without using the pre processor or meta programming, but oh well
@@voxelrifts same thing popped in my mind, sounds like how zig handles memory allocation, and meta programming can be done in the same sourcefile with a comptime keyword
The lack of templates is really what's dragging c down, and it's the main reason I use c++, only for the templates though, not for the OOP stuff. Great video! Arena allocators are wonderful.
@@captainfordo1 It's a compromise between static correctness and runtime correctness. Templates avoid type erasure with void*, which leads to needing to use the debugger more often, if you're lucky to find the error before you segfault. It's been proven time and time again that generics are the way to go (literally, even go has them), without generics you'll spend time reinventing the wheel or dealing with type erasure. I meant C having "templates" as a way to do generics, not specifically to have the same implementation as in C++, which is what you're saying is hard to debug. Also it is not clear if you meant hard to debug as in "the compiler error messages when using templates are hard to understand", or if you literally mean debug template code, which are very different things. If you can't understand compiler errors then ask chatgpt, it's great help, better than having no compiler errors but a few bugs.
7:30 Count-based strings have their benefits, but they come with tradeoffs. You're trading a 1-byte null terminator for 4-8 bytes of size for each string reference. It's a balance of pros and cons.
The "takes 3 bytes more" argument isn't that good though, now that we have gigabytes upon gigabytes of memory. There isn't any balance really, count based strings win at literally everything else I would say
@@voxelrifts In memory-constrained environments like Microcontrollers, wearables, and automotive systems, those 3-7 extra bytes per string reference can significantly impact resources & performance. It's crucial to choose the right tool for the job.
@@heapninja yeah, this video is definitely not going to be applicable for embedded systems. Arenas would also be unnecessary there, so would massive data structure templates.
For jank templates I prefer the header file variation where you #define an argument (usually the type, but any template parameter) for the template and include the template header. The template header uses the template arg to generate a bunch of functions and structs, then undefines all its args. This way you can debug the code as you would normally because the function isn't created in a macro expansion.
one problem i have faced with this method that i dont know if there exists any fix for it is that if you want to for example include the template multiple times in your program simply for the declarations to use in other headers, include guards wont save you from multiple inclusion because the generated code is not guarded. Which means that you'd have to make a custom implementation header file with your own guards for one or multiple template specializations of the header file. If you know of any workarounds that avoid needing to make an extra file, i would be extremely thankful to hear of them.
@@tiranobanderas5655 There's an attribute you can use in GCC/clang for weak linkage. This basically makes the linker de-duplicate multiple instances of the same symbol across linked object files. This is basically what C++ does with template generated code (and is part of why C++ compilation is slow). Alternatively you can put prototypes in the header and ifdef guard the function implementations so that you can define a symbol to prevent the function implementations from being generated.
@@slayerxyz0 yeah, the second alternative is what i want to know how to do, but im not sure if im doing it wrong, because even if i make an implementation macro so that i can include only the definitions, the struct causes a redefinition of type problem. for example, if i have a generic dynamic array template header which i would use like this:
    //if i want to include the definitions as well
    #define T float
    #define IMPL
    #include "template.h"
    //if i only want the declarations
    #define T float
    #include "template.h"
then, inside of the template.h file i would have a struct defined to use the data type defined for T.
    typedef struct { T *data; size_t len, cap; } STRUCT_NAME;
then i would get a struct redefinition problem because the struct declaration itself is not within the ifdef guards.
An arena allocator is not necessarily a bump allocator. I've seen arenas implemented as a linked list of memory pages, which works like this:
- does the allocation fit in the current page? bump the current page
- else create a new page and allocate there
Then arena_free traverses the list of pages and frees them. Supposedly you don't need this if you reserve (not allocate) a large chunk of virtual address space and put your allocations in there, but I've yet to find a reference implementation online that actually does this.
As I said, everyone seems to have different names. I heard many people call arenas bump allocators, which is why I included them in the video. As for the large virtual memory reserve thing, that's exactly what I do for my arena implementation. I have 2 pointers instead of just one, a commit pos and an alloc pos.
If you have two pointers (one for the filename and one for the extension) you can calculate the length of the filename just by using pointer arithmetic. long long difference_in_elements = ptr2 - ptr1; long long difference_in_bytes = (ptr2 - ptr1) * (long long)sizeof(*ptr1); Note: The difference between two pointers is not in bytes, but in elements, which amounts to the same thing for the char type on most systems, because on most systems sizeof(char) is equal to 1.
@@thebatchicle3429 No, the difference between two pointers is not in bytes, but in the number of elements (of the corresponding type). The result has the signed type ptrdiff_t, which is usually long or long long on 64-bit systems and int on 32-bit ones. The result can also be negative.
    int array[2] = { 0 };
    int *ptr1 = array;
    int *ptr2 = array + 1;
    printf("%p %p %td %td %zu", (void*)ptr1, (void*)ptr2, (ptr2 - ptr1), (ptr1 - ptr2), sizeof(ptr2 - ptr1));
    //Out: 000000000027FB9C 000000000027FBA0 1 -1 8
You must never implicitly assume that the size of a char corresponds to 1 byte. This may be the case in the vast majority of cases, but there are also systems on which this is not the case. That's why my example said "difference_in_bytes" and not "difference_in_characters". But your hint is still good, because in this context you would probably expect the difference in characters. Thanks for that, I will amend my comment accordingly.
@@thebatchicle3429 The difference between two pointers is not in bytes, but in elements. That's why I wrote "difference_in_bytes" and not "difference_in_characters". However, your hint is still good, as in this context you would expect the difference to be in characters and not in bytes. Thanks for that, I've amended my comment. Cheers
The standard says that sizeof(char) is always 1. However, it does not guarantee how many bits a char has, only that there are at least 8. On most systems CHAR_BIT == 8, but on some it might be 16, for example.
Nice, I am just now starting to lay out the basics of a c style compiled language I'm planning to implement. I was looking around what allocation strategies to support but couldn't decide. In the past it was pretty easy either you garbage collect or use a malloc free system but nowadays there are so many niche competitors trying to revolutionize the game. Thanks to this video I might consider making arenas my primary allocation strategy. We will see ourselves again in 5 years when the language is mature enough to even work 😂
Nice! Good luck on your language. I would recommend having arenas for temporary or scoped dynamic allocations, and providing a malloc-like interface anyways because it is necessary in a few cases like dynamic arrays and such (unless you leverage virtual memory that is)
@@voxelrifts thanks. I was planning on allowing access to malloc and free anyway for potential c interop. Dynamic datastructures should ideally be part of the standard but I will see.
I have some questions about how to use that string struct. 1. How do you print a string that's not null-terminated? 2. How do you get the length of the string? strlen (but that won't work if it's not null-terminated)? I assume you don't manually count characters.
Instead of using %s for printing a string, you can use %.*s which allows you to give it a size. I have a macro for str_expand(the_str), which just expands to `(int)the_str.size, the_str.str`. so what I can do is simply do printf("%.*s", str_expand(my_string)); Length of the string is stored right within the struct so there's no need for strlen or counting. If you mean how I convert a string literal to a string struct initially, then I use a macro called str_lit which uses the sizeof() operator which returns the size for the string including the null terminator, then subtracting 1 gives me length of the string. github.com/PixelRifts/c-codebase/blob/master/source/base/str.h Lines 42 and 43
@@yogxoth1959 OK one thing I forgot to mention which is important: the str_lit macro only works on string literals, not char*s. If you want to convert a char* to a string type you have to use strlen. (This is because sizeof on a string literal gives the size of the whole array including the null terminator, while sizeof on a char* only gives the size of the pointer.)
Some template functionality can be emulated in C with function pointer parameters. As an example, I have abused this approach to essentially create 25 versions of the same function in a single TU and it has worked really well. If I used C++ templates it wouldn't save me any LOCs there, and the syntax is simpler. A function pointer sounds like an indirect call, and an indirect call doesn't speak performance, but don't worry. The compiler can and will optimize out hundreds of lines of code beyond comprehension if you give it a chance to do so. The compiler (tested with GCC) can be easily encouraged and even forced to generate several versions of the same function and inline functions specified as function pointers, or even inline the entire thing, as long as definitions of both functions are visible in a single TU. This will generate essentially the same code as the respective C++ templates or macro templates. You trade some flexibility, as function pointers can't replace everything, but you can still do a lot with those. You also get strict type safety and the same debugging experience as with your regular code.
One metaprogramming option that I'm exploring is using python to generate the C code, and add that C code as a target to a makefile (so that every time I modify the python script, the C files get regenerated). Another option would be to use SCons as a build system, since it's already Python code it would integrate more seamlessly
i really liked C, until i saw Zig. zig was a very refreshing view on low-level programming: by default, the zig standard library works with explicit allocators (c_allocator, heap_allocator, and many more...) and you can use any of them. i also really like zig's syntax, because it's a mix between C and OOP, but in a C-style: you have structs, structs can have fields, and they can have methods (which are stored within the type, and not the object). every file is more or less a giant struct. the only thing i don't really like in zig is that there are too many builtin functions. I really like the comptime keyword though. anyway, if you like(d) C, you'll surely like zig.
The special advantage of zig's comptime when it comes to templating is that you can use it for things that otherwise would require a separate 'templating part' of the language (often an entire different language), like we have in C, C++ and Rust. In Zig, you use the same exact language both for the code you want to ultimately compile and for the templates. In fact, you use the same language even in your build script! There's no gnu-make, nmake, cmake, autotools, meson, and so on. There's only Zig.
I just started learning C a week ago (I already work as a developer so I know how to code but never did anything serious with C), and this video literally answered the most important questions that I had regarding this language one by one, the youtube algorithm really nailed it this time.
You should keep in mind that different types in C have different alignments when allocating memory, and that even though your code will work with or without taking alignment into account, misaligned memory accesses can degrade performance. Awesome video tho
This is certainly an interesting technique, though I would suggest using pre-existing libraries when possible. For memory allocation, unless you absolutely need it to be as fast as possible, using a GC library for those that can't or don't want to manually manage memory, would be my recommendation. Hans Boehm wrote one that you might consider looking into. For string handling, it's a good idea to do the same, finding a library that suits your style but that stores the length. All that said, data structures are the real crux for people new to the language. Most people either don't learn them properly or at all in school, and not just the implementation, but the selection of, can have a huge impact on performance and memory usage. There's plenty of videos that go over some of the vagaries of selection, but they're often too general purpose and truthfully, I often mix and match to make hybrid structures anyway. It'd be great if someone could make a video series on implementing some of the more obscure data structures and how to mix and match them for more effective design.
@@justadude8716 It doesn't really take long and does help to get better at understanding memory and other important things if you do do things yourself sooo ¯\_(ツ)_/¯
A lot of these are why I love using Rust. Explicit lifetime management, proper memory management, etc. all in a world where you don't "hope" that a library works as you expect.
My C++ learning was stagnant. In order to learn C/C++ in depth, I turned to learning Rust. Now, after watching your video, I found that these best practices are used by default in the Rust community. I think this is the benefit of learning a modern language.
What I don't understand is how do you get a value from the arena? Especially if you have multiple things inside it - how do you keep track of all of the things inside the allocator to retrieve the correct one when you need it?
@@BiskitSlippers allocators are stand ins for stuff like malloc or mmap. You always store pointers to elements within them, they don't provide a retrieval mechanism for specific data for you.
@@voxelrifts Oh I think I understand now. So an allocator is really like your own personal area to store data and one you have a little more control over? But at the end of the day you need to keep track of your own data manually as you usually would?
I used pretty much the same approach for my data structures except that I used generic selection. Usually in C people would just use void pointers for the data and not bother with the macros, but I wanted to see what it's like. I like the type safety I get, and I imagine the compiler can do more optimizations knowing the type. I do wonder though, what if I use void pointer and generic selection based on type to get _some_ level of type safety and less generated code and potentially easier to debug. I need to experiment more.
i really like the idea of arena allocation, one thing though is after reading the articles you linked is that i still dont understand how youre allocating your arenas, if the arena is stored on the heap or the stack edit: i misread your codebase; correct me if im wrong but you allocate an arena on the heap instead of the stack, so would it be wrong of me to just always allocate arenas on the heap just to save me a headache on managing the stack also thank you for the great resources!
@@1nilusnilus Yes, I am allocating arenas on the heap. For my codebase I use virtual memory allocation here so I don't run out. You could allocate arenas on the stack as well; the allocator strategy stays the same, you just have to be careful of stack overflows.
I'm trying to implement memory arenas right now and I asked ChatGPT what it thought about my arena_alloc function. It mentioned that the function would probably not handle arbitrary structs very well, because of alignment. I don't think you mentioned that in this video. Is this something I have to worry about?
The program everything search by void was said to be all written in C it is so fast at finding my files. I use blender a lot and model things and make cut files. C is used for robotics. Blender has Python. I am interested in writing code and want to understand what I am doing so I am starting with C
Could you provide some more direction towards metaprogramming in C? Let's say I want to make something equivalent to template <typename T> T vector_get(int index); this function would return the data stored at index in a vector. So my metafile would generate vector_get multiple times for different data types? And after having my metafile generate the types, I would use the appropriate type where I need it? Do I understand this correctly?
How do you handle struct redefinition errors? I was trying to remake a std::pair in c, for C23 everything seems fine, for older versions i came up with additional define macro like this:
    #define Pair_define(T1, T2) \
        struct Pair_##T1##_##T2 { \
            T1 first; \
            T2 second; \
        };
    #define Pair(T1, T2) struct Pair_##T1##_##T2
But i also get an error when define macro is used more than once.
Hello , I write C and people call me insane . Am I doing it wrong ? Grouping allocations is definitely helpful . Any strategy to make sure those malloc & free calls are balanced !
Is it possible to reserve 1 tb ram ( virtual memory address space) size arena in cpp on windows system. With malloc i am limited by size of physical ram plus swap space. How to do it right. Edit: I was able to reserve 127 TB virtual address space with virtual alloc on windows 11 pro. I read the docs, the limit is 128 TB per user space process.
I wouldn’t say that you have to free memory all at once. When you have a segment of memory you want to “free” you can add that memory to a free list. Essentially if you free memory you will almost immediately end up reusing that block of memory in the arena by popping it off the free list. This is really only if you want to deallocate in the middle of an arena. If you want whatever you’re allocating to have a group lifetime then you wouldn’t do this. Just a way to add more flexibility to an arena. Pretty good video though.
The problem isn't *finding* the dot, it's splitting the string at the dot. After finding the dot if you need a separate string representing just the filename, you'd have to replace the dot with a null terminator, which will break the string representing the entire filename with extension
Am I mistaken in that arenas still need a backing memory buffer, which you'd need to either have in static memory or still allocate on the heap, the latter requiring either using malloc anyway or using a system specific function?
You will call malloc, but it will be only once for all the small objects that you will use, so it can be faster than several separate calls to malloc. It also helps to reduce memory fragmentation, as you can allocate arenas of a fixed size, regardless of the size of your smaller objects. And finally, you also reduce the number of times you call free(), which can also be costly and problematic. This helps both with memory management and performance, as you free all the objects from that arena at once, at one place. It's not ideal for every case, but it's very useful for the use cases he mentioned in the video.
I read K&R and I know a little bit of C. Now I hear about zig and odin and wonder if they are a better version of C, or should I learn C and then move on to zig?
i didnt watch the whole video yet, but gotta agree with the relatively small library, especially string.h. at this point i actually decided to write my own stdlib-like library (but focusing on adding better string functions and advanced data types like linked lists)
I think you just reinvented objects? 😂 If you put function and data pointers in the arena, deallocating would be object.destroy(), and the routine for allocating it is the creator?
There is a weird way to do strings, but it works:
    struct string { size_t len; char data[]; };
Now if you want to allocate a string, you can do the following:
    #define STR_LENGTH 5
    /* one extra byte for the terminator */
    struct string* greeting = (struct string *)malloc(sizeof(struct string) + STR_LENGTH + 1);
    greeting->len = STR_LENGTH;
    /* NUL-terminate end of string */
    greeting->data[STR_LENGTH] = 0;
    /* TODO: Fill in data with your actual string */
Don't forget to free that memory with free(greeting) and create a macro template for this special kind of string ;)
This also seems to have the same issue of needing new allocations for simple "views" into the string, since the count and string data is right next to each other.
If it's a view you want, you'd just need a tag struct such that: size_t len; const char* const data; Now we made it clear that we can't change either the characters in the data or the pointer to the character array so we can init the data struct in an initializer list instead of direct assignments. Can't believe that's the monstrosity we need for simple "views".
@@SimGunther Right. The point of the string struct in the video was to not have to have two separate structures for regular strings and views :). Quite helpful in many many cases I've found
I'm only learning C for CS50x and to help with learning C++. Though I might come back to this video in the future, I'm gonna walk away for now. (Rn, I can't stop sneezing while this video is playing. Send help!!!)
Hello, bro. Would you be able to share the roadmap and the topics you have Learned to master C programming and graphics design using C ?. It would be so helpful for students like me 🙏🏼.
I don't have a strict roadmap per-se to be honest and I am still learning. For C I would recommend handmade hero's intro for starting off and just doing random projects in C to understand how to use it efficiently. For graphics learnopengl.com/ is an excellent resource for learning graphics programming, but more specifically OpenGL. Once you understand those, it's not hard to extrapolate to other graphics APIs. But an important tip is DON'T DO BOTH AT ONCE. Either learn C first, or learn graphics programming in your preferred language. Both topics have a lot of concepts you need to understand so mixing them together can be confusing.
N strings or counter based strings don't need a structure. You just store the size in the first byte. This limits string size to 255 but is simple and lightning fast.
This has the same problem as with null-terminated strings, where you have to make a new allocation for strings that are already there. If you follow through with my example, you'd have to allocate for the extension string instead of the filename string, which is what the struct avoids.
@@voxelrifts if you're receiving null-terminated strings, you just offset by one to receive and send null-terminated strings, then manipulate everything with your own libraries using the n count. It's not the same, as it adds another facet. In many ways it's like using a structure except without the pointers needed to manipulate, as it's all in one character array. In fact you can chain them that way too.
@@stolenlaptop Firstly, I don't know what you mean by facet. Secondly, I think you missed my point :) Circling back to the example I gave in the video, we had the full filename with extension as an allocated string and wanted strings that are just the filename without extension and just the extension without filename. The problem with placing the count right before the characters of the string is the same as having a null terminator at the end of the string. In fact, having the count right before the allocation is much worse, because the counts for filename without extension and filename with extension have different values but would have to be stored in the same location if you want to refer to the same allocation. The structure solves this because you're keeping sizes on the stack itself rather than alongside the string data.
@@voxelrifts I took it as, you can encode the length of the view in the lowest byte of the pointer, which is a more interesting approach. Doing so in the lowest byte isn't as good as your strings probably aren't 256 byte aligned and you can kiss goodbye having a different offset, but you could totally use the upper 24bits of the pointer as they are for the most part irrelevant in the userspace. Of course this necessitates having functions or at least a macro that would encode/decode the pointer for use.
What are your thoughts about GNU AutoGen? I think it's a good solution, if you are not in a position to roll your own metaprogramming helper. And a lot of the times macros/codegen aren't even necessary considering how far link-time optimizations have come. The compiler can, in a way, generate the code for you, if it sees it fit.
3:45, why do you need to "register" the pointers? Once they are != NULL, you can presume they have memory allocated.
3:56, I guess you are trying to detach a,b,c from the struct, maybe throwing them in global space? This is a terrible idea!
4:14, so this "arena" won't ever be used just for reading/writing, but only for allocations?
4:37, I don't use C anymore, so I don't recognize this 'string' type for it. But I didn't get in which way the arena helped here?
6:58, just create a tiny f() for the extension: const char *get_ext (const char *FileNameAndExtNoDot) { return FileNameAndExtNoDot + strlen (FileNameAndExtNoDot) + 1; } //It'll point after the dot. So the printf would be: printf ("%s.%s", name, get_ext (name));
9:33, it's a nice idea to not put the last ';' in a macro, to be forced by the compiler to put it on each call. It feels more like a normal cmd. And there's a bug: you can't put the last \, because the macro ended before it. I didn't find this ERROR macro(?). Anyway, you could use assert, from <assert.h>, instead of if, and you would be spared from having to write an error message. It would just be shown as a failure of that if-logic.
The pointers are not going in global space at all. They're going in an arena which has a lifetime that lasts from the init function to the free function. We get rid of the need to call free on those pointers manually. It might not be a big win just for random integers, but for things like linked lists it is very helpful since you don't need to traverse the list freeing every node. Basically removing the dependency on the pointer for memory freeing. About 4:37, it's a custom type, as you probably saw from later in the video. About 6:58, yes, in this case you can print it piecewise, but it's not an actual allocated string anywhere, which is necessary many times. ERROR was just a placeholder since I was showing pseudocode here. You're free to use whatever error handling you want!
@@voxelrifts But you still need to write at least 1 time a free f() for this arena. And if it deals with linked list, the same thing about traversing the nodes, right?
@@MrAbrazildo yes you free the arena. But if you allocate all the nodes in the arena, you don't have to iterate through the list and free all pointers one by one. Instead you free the entire arena and all the nodes allocated within it are automatically freed
@@voxelrifts If I would spend time making such thing, I would build a stack version of it: a struct carrying an array and several pointers, 1 for each part of it. And an internal control, compiled in a separated file.
As a 23 year old with no college education so far and no skills other than a basic High School education, how would you recommend someone go about learning programming and Mathematics?
I don't think I'm qualified to give advice to people to be honest, but I think doing projects that interest you is the best way to learn programming. For both maths and programming, the only way one can get better is with practice :)
In practicality i think finding some in-person courses is the safest path to get into the business, but if you want to learn for free: programming is a huge field, it really boils down to "what" you want to program, i would say your first step is to identify what area you want to target, like web, back end, embedded, etc. and then i would recommend watching getting started videos on youtube, once you got the basics of your language i would recommend to work on and learn code etiquette (software development is about making clear and understandable code, not just code that works), then get a portfolio going and hope it is enough to get a job
@@ariabk Well not exactly, an arena is an arena if you free the memory all at once. If you want to pop off memory it wouldn't be an arena, but instead it'd be a stack allocator. Also I wouldn't call push/pop instructions "allocator functions" at all.
Thumbs up, because the Video itself is good but the sad truth is I never made something in Rust yet, but I program since 20 years, just randomly saw this and i want to say: If you know C already, then you shouldn't need this Video and if you don't, learn Rust.
I love C because it's not bloated and handcrafting everything sometimes feels fun. But I'm unable to build anything complex because I can't wrap my head around header inclusion organization. Setting up makefiles is tedious, and 3rd party library inclusion is a pain in the ass to set up.
Arena and bump allocators are completely different. Bump allocator is just stack. Arena allocator can use any allocation strategy within itself, but its point is that you destroy the whole block and all allocations within it in one call.
To be fair, these terms are not that rigidly defined, in fact you even used the term "stack" that implies a LIFO behavior that is not the case with a Linear/Bump/Arena allocator.
For allocating bulk memory, in one talk the speaker mentioned a "game level": once you complete the level you don't need anything from it, so free it all! But it's bad practice to show a black screen and talk, please put some code
Instead of generating code within a function-like macro body, use generator headers instead. Much easier to read and maintain, and no need to spam \\\\ until your head explodes. Example of generator header:
#define _CAT3(a,b,c) a ## b ## c
#define CAT3(a, b, c) _CAT3(a, b, c)
#define FN(name) CAT3(GEN_ARG_NAME, _, name)
#include
typedef struct GEN_ARG_NAME { GEN_ARG_TYPE* data; size_t size; } GEN_ARG_NAME;
int FN(push)(GEN_ARG_NAME *o, GEN_ARG_TYPE v) { ... }
#undef _CAT3
#undef CAT3
#undef FN
#undef GEN_ARG_NAME
#undef GEN_ARG_TYPE
Later you can instantiate the template by setting a few macros and including header, like this:
#define GEN_ARG_NAME intvec
#define GEN_ARG_TYPE int
#include ...
intvec iv = {0};
intvec_push(&iv, 42);
....
Get the idea?
sorry, but instead of having to allocate 8 bytes for a "string", you could just know where the string ends by knowing where the extension is, and just extract one from it.
These things are basically the stuff i do all the time when i write in c, while being super good knowledge, most people would probably just use a garbage collected, probably interpreted language with a large set of standard libraries and call it a day
And that's completely fine! I just wanted to share these techniques because they're never really explained/told for some reason, and because they make a huge difference from my experience.
And, of course, everything you show here undoes the main benefit of C - it's low memory footprint. That string implementation adds 8 bytes to every string. So now "HELLO" goes from consuming 8 bytes (5 char, 1 null, 2 pad) to 16 bytes. You've doubled the memory footprint for no real benefit.
"No real benefit"? I did show in the video how that reduces the memory footprint effectively by you not having to make separate allocations per substring. Also 1 byte is not a lot of memory to "waste" if you even call this wastage.
C++ offers a lot of inconveniences disguised as language features, as an alternative to writing simple data oriented code design in c, where everything is clear and in front of you, and you know what is going on in terms of what is executed and how it gets optimized by the compiler, you are offered objects with automatic operators obfuscating code clarity, incomprehensible inheritance hierarchies obfuscating code clarity, runtime template related errors that is going to take you forever to solve, namespaces where you either have to choose between insanely long lines of code or risk not knowing what library a generic function name comes from. My point is that c does exactly what you want it to, is easy to write and read due to less syntax complexity and limits you in terms of features so that you cant write an overly complex solution to a simple problem. C++ is not inherently evil, it is just a language that has seen so much feature creep that understanding how to use all the tools efficiently is impossible.
I got tired of getting error messages from the compiler that looked like an ancient eldritch language. I'm mostly interested in making games (maybe with like a custom software renderer or something like that) and C's features have been just enough.
There's a few ways to do this. A) Assume a size, like 1 megabyte or something. No complications. B) Make arenas hold a linked list of malloc-ed chunks of memory. So if you exceed one chunk, you alloc a new one and use that. Or C) the one I use, using virtual memory. This allows me to VirtualAlloc a large chunk of memory to reserve that address space (like 1 gigabyte). This doesn't actually allocate that memory though. When space is required I commit part of that reserved memory which actually allocates it to my process like malloc does. This keeps things contiguous in address space and I don't have to worry about passing sizes around since 1 gigabyte is quite a bit of space. Of course, I have additional functions that can allocate an arena with a specified size but that is rarely required.
@@voxelrifts Ah, very cool, never realized I could do that. But is virtual allocation part of the standard library or is it Windows-specific? To put it another way, can I achieve "cross-platform" virtual allocation?
@@ghostbusterz It's not in the standard library no, but mmap is the way to do so in any posix compliant OS. so I just have helper functions for reserve, commit, decommit and free, which call either mmap/munmap for posix OSes, and VirtualAlloc/VirtualFree for windows.
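As a rough sketch of the POSIX half of that reserve/commit scheme: the helper names (vm_reserve, vm_commit, vm_decommit, vm_release) and the VArena struct are hypothetical, not taken from the author's codebase. It assumes MAP_ANONYMOUS is available, which is true on Linux, macOS and the BSDs even though it is not in base POSIX. On Windows the analogous calls would be VirtualAlloc with MEM_RESERVE and MEM_COMMIT, plus VirtualFree.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and madvise on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve address space only: PROT_NONE pages are not usable yet. */
static void *vm_reserve(size_t size) {
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make a page-aligned range readable and writable. */
static int vm_commit(void *ptr, size_t size) {
    return mprotect(ptr, size, PROT_READ | PROT_WRITE) == 0;
}

/* Hand the pages back to the OS but keep the address range reserved. */
static int vm_decommit(void *ptr, size_t size) {
    if (madvise(ptr, size, MADV_DONTNEED) != 0) return 0;
    return mprotect(ptr, size, PROT_NONE) == 0;
}

static void vm_release(void *ptr, size_t size) { munmap(ptr, size); }

/* Arena with two positions: how much is committed, how much is handed out. */
typedef struct {
    unsigned char *base;
    size_t reserved, committed, pos;
} VArena;

static int varena_init(VArena *a, size_t reserve_size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    reserve_size = (reserve_size + page - 1) / page * page;
    a->base = vm_reserve(reserve_size);
    a->reserved = a->base ? reserve_size : 0;
    a->committed = 0;
    a->pos = 0;
    return a->base != NULL;
}

static void *varena_alloc(VArena *a, size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = (a->pos + 15) & ~(size_t)15; /* 16-byte alignment */
    if (start > a->reserved || size > a->reserved - start) return NULL;
    size_t end = start + size;
    if (end > a->committed) {
        /* Commit whole pages, only as far as this allocation needs. */
        size_t new_commit = (end + page - 1) / page * page;
        if (!vm_commit(a->base + a->committed, new_commit - a->committed))
            return NULL;
        a->committed = new_commit;
    }
    a->pos = end;
    return a->base + start;
}

static void varena_release(VArena *a) {
    vm_release(a->base, a->reserved);
    a->base = NULL;
    a->reserved = a->committed = a->pos = 0;
}
```

Because the reservation never moves, pointers handed out earlier stay valid as the arena grows, which is the main advantage over a realloc-style growable buffer.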
// vec_MyType.h
#define VecT MyType
#include "vec.h"
#undef VecT
// vec_MyType.c
#define VecT MyType
#include "vec.c"
#undef VecT
And use VecT as the type in the vec code. At least we don't dig through macro compiler errors.
Some days, I just don't feel like writing code. Then a video like this pops up in my feed and I find myself back at the keyboard, refactoring something. Haha ... thanks for the C content!
Most days I don't feel like writing any code. It's boring and unnecessary
@@illegalsmirf Gotta take a break, sounds like burnout, brother.
Comment of the day after watching tons of videos. This is so real 👏👏
Great video!
For me, the fact that you need to make everything yourself in C is fun : )
I think this is exactly why C is one of my favorite languages. Reinventing some stuff keeps my mind active and sane when diving into languages with lots of abstractions. I think this makes me much more aware about performance trade-off's hidden in language or library goodies.
@@hbobenicio doing the same thing in very high level languages like Java is also pretty fun. It's thrilling using the cracks in the system to poke with a stick into the inner workings of an overcomplicated machine.
Also bringing Java code to segfault is really entertaining 😂
This is the second time that you’re making a video about exactly the same thing I’m working on. Thanks again for the videos, super helpful!
I don't write C any more (and never did any heavy lifting with it), but it's nice to see how development is evolving. I've always liked how much space C gives us to do things our own way (including how to screw things up royally, but I don't see the latter happening here!) I'll definitely give this another view. I'm in awe of the folks who really know how to do this stuff. C is still my favorite language; I never learned to hate it, even though I've had my share of awful bugs just like everyone else, but not as awful as the ones we get when we really know how to code. Now we've learned all about robustness. Cheers!
What do you write in nowadays?
@@BboyKeny Whatever gets it done fastest, and that tends to be Python. I'm not working on computationally large problems these days. There's a soft spot in my heart for SML (or maybe it's a soft spot in my head!) I wish I had time to learn something about Haskell, but life is short.
If you like C and want to get things done fast like python, maybe give Go a try?
I have done templates in C using macros. It's not pretty. Some of the reasons why are: confusing error messages, the fact that your code linter will have trouble telling you where errors are, jump to function definition will likely be broken, your code will be less readable, confusing formatting. That was the reason I ultimately switched to D.
You gained a sub. I use a couple of these tricks in my projects (except the arena allocator), and find them very useful. Thanks for documenting such neat tricks.
Lovely video. Your points completely explain why I switched to zig for nearly all of my performance critical code.
Thank you! I am a CS PhD student and I definitely learned something new: arenas/bump allocator with this video.
Arenas feel pretty useful to correctly free memory when raising exceptions during the function lifespan. We do not need to remember all the stuff we have to free when quitting the function early on.
I like to code in the Nim language, for its templates and meta programming features which are well easier to debug (and much more powerful).
There are plenty of features that I wish were there in C (default values, some automatic type inference for function callbacks, etc...) but the amount of optimisations in Clang/GCC, the reliability of debugging tools like Valgrind, the structural typing and the simplicity of the language (no absurd edge cases to learn like in JS) makes it an incredible language ;)
No offense: how did you get into a CS PhD program without ever having heard about arenas? What exactly are they teaching in schools? :/
@@pyrus2814 I studied CS and we didn't have to program at all. It's all academical nonsense.
Nice concise explanation of bump arena allocators.
I have a feeling you like Zig haha.
Also the explanation of how to use the pre processor to make templates is good.
I just wish there were a way to do that without using the pre processor or meta programming, but oh well
Haven't used zig at all actually! But have heard good stuff about it
@@voxelrifts same thing popped in my mind, sounds like how zig handles memory allocation, and meta programming can be done in the same sourcefile with a comptime keyword
The lack of templates is really what's dragging c down, and it's the main reason I use c++, only for the templates though, not for the OOP stuff. Great video! Arena allocators are wonderful.
C21 has a macro to aid with that ...
I have 30 K total so far, templates are not good for C that is just your personal preference.
Yeah I'd much rather void* everything
I'm very glad that C doesn't have templates. They make debugging a mess, and are generally not fun to use in my experience.
@@captainfordo1 It's a compromise between static correctness and runtime correctness. Templates avoid type erasure with void*, which leads to needing to use the debugger more often, if you're lucky to find the error before you segfault. It's been proven time and time again that generics are the way to go (literally, even go has them), without generics you'll spend time reinventing the wheel or dealing with type erasure. I meant C having "templates" as a way to do generics, not specifically to have the same implementation as in C++, which is what you're saying is hard to debug.
Also it is not clear if you meant hard to debug as in "the compiler error messages when using templates are hard to understand", or if you literally mean debug template code, which are very different things. If you can't understand compiler errors then ask chatgpt, it's great help, better than having no compiler errors but a few bugs.
@@user-hk3ej4hk7m You lose out on real performance benefits. There's a reason std::sort is way faster than qsort
Thank you SO MUCH, this is amazing and so helpful!! Arenas in special feel genuinely life-changing to learn about, thank you so much
7:30
Count-based strings have their benefits, but they come with tradeoffs. You're trading a 1-byte null terminator for 4-8 bytes of size for each string reference. It's a balance of pros and cons.
The "takes 3 bytes more" argument isn't that good though, now that we have gigabytes upon gigabytes of memory. There isn't any balance really, count based strings win at literally everything else I would say
@@voxelrifts In memory-constrained environments like Microcontrollers, wearables, and automotive systems, those 3-7 extra bytes per string reference can significantly impact resources & performance. It's crucial to choose the right tool for the job.
@@heapninja yeah, this video is definitely not going to be applicable for embedded systems. Arenas would also be unnecessary there, so would massive data structure templates.
That's the beauty of C. You can decide about those tradeoffs according to your specific constraints and needs.
You can also do the std::string approach and squeeze a short string directly into the size field if you find a good way to distinguish both cases.
For jank templates I prefer the header file variation where you #define an argument (usually the type but any template parameter) for the template and include the template header. The template header uses the template arg to generate a bunch of functions and structs, then undefines all its args. This way you can debug the code as you would normally because the function isn't created in a macro expansion.
one problem i have faced with this method that i dont know if there exists any fix for it is that if you want to for example include the template multiple times in your program simply for the declarations to use in other headers, include guards wont save you from multiple inclusion because the generated code is not guarded. Which means that you'd have to make a custom implementation header file with your own guards for one or multiple template specializations of the header file.
If you know of any workarounds that avoid needing to make an extra file, i would be extremely thankful to hear of them.
@@tiranobanderas5655 There's an attribute you can use in GCC/clang for weak linkage. This basically makes the linker de-duplicate multiple instances of the same symbol across linked object files. This is basically what C++ does with template generated code (and is part of why C++ compilation is slow). Alternatively you can put prototypes in the header and ifdef guard the function implementations so that you can define a symbol to prevent the function implementations from being generated.
@@slayerxyz0 yeah, the second alternative is what i want to know how to do, but im not sure if im doing it wrong, because even if i make an implementation macro so that i can include only the definitions, the struct causes a redefinition of type problem.
for example, if i have a generic dynamic array template header which i would use like this:
//if i want to include the definitions as well
#define T float
#define IMPL
#include "template.h"
//if i only want the declarations
#define T float
#include "template.h"
then, inside of the template.h file i would have a struct defined to use the data type defined for T.
typedef struct {
T *data;
size_t len, cap;
} STRUCT_NAME;
then i would get a struct redefinition problem because the struct declaration itself is not within the ifdef guards.
Thank you man - this was super helpful for me
Really nice video! Got This In My Recommendation.
This video explains the ideas very clearly, thanks
Fantastic video! Great work
thank you gangsta, this is perfect for my project
An arena allocator is not necessarily a bump allocator. I've seen arenas implemented as a linked list of memory pages which works like this:
- does the allocation fit in the page? bump current page
- else create new page and allocate there
then the arena_free traverses the list of pages and frees them
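The page-list scheme described above can be sketched roughly as follows. This is an illustration under assumptions, not a reference implementation: the names Chunk, ChunkArena, chunk_arena_alloc and chunk_arena_free are hypothetical, pages are plain malloc blocks, and every allocation is rounded to 16 bytes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Each page starts with this header; the payload follows it. */
typedef struct Chunk {
    struct Chunk *next;
    size_t cap;
    size_t pos;
} Chunk;

/* Header size rounded up so the payload stays 16-byte aligned. */
#define CHUNK_HDR ((sizeof(Chunk) + 15) & ~(size_t)15)

typedef struct {
    Chunk *head;       /* current page; older pages hang off ->next */
    size_t chunk_size; /* default payload size of a new page */
} ChunkArena;

static void *chunk_arena_alloc(ChunkArena *a, size_t size) {
    size = (size + 15) & ~(size_t)15;
    Chunk *c = a->head;
    if (!c || size > c->cap - c->pos) {
        /* Does not fit: create a new page (oversized requests get their own). */
        size_t cap = size > a->chunk_size ? size : a->chunk_size;
        c = malloc(CHUNK_HDR + cap);
        if (!c) return NULL;
        c->next = a->head;
        c->cap = cap;
        c->pos = 0;
        a->head = c;
    }
    /* Fits: bump the current page. */
    void *p = (unsigned char *)c + CHUNK_HDR + c->pos;
    c->pos += size;
    return p;
}

/* Walk the page list and free every page; returns how many were freed. */
static size_t chunk_arena_free(ChunkArena *a) {
    size_t n = 0;
    for (Chunk *c = a->head; c;) {
        Chunk *next = c->next;
        free(c);
        c = next;
        n++;
    }
    a->head = NULL;
    return n;
}
```

Freeing still walks a list, but it is a list of pages rather than of individual allocations, so the cost scales with the number of pages.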
supposedly you don't need this if you reserve (not allocate) a large chunk of virtual address space and put your allocations in there but I've yet to find a reference implementation online that actually does this
As I said, everyone seems to have different names. I heard many people call arenas bump allocators which is why I included them in the video. As for the large virtual memory reserve thing, that's exactly what I do for my arena implementation. I have 2 pointers instead of just one, a commit pos and an alloc pos
If you have two pointers (one for the filename and one for the extension) you can calculate the length of the filename just by using pointer arithmetic.
long long difference_in_elements = ptr2 - ptr1;
long long difference_in_bytes = (ptr2 - ptr1) * (long long)sizeof(*ptr1);
Note:
The difference between two pointers is not in bytes, but in elements.
Which amounts to the same thing for the char type on most systems. Because on most systems sizeof(char) is equal to 1.
This is wrong. The length of the filename is calculated like this:
ptrdiff_t len = ptr2 - ptr1;
The length is already in bytes.
@@thebatchicle3429
No, the difference between two pointers is not in bytes, but in the number of elements (of the corresponding type).
The result has the signed type ptrdiff_t (typically long long or long underneath), so it can also be negative.
int array[2] = { 0 };
int *ptr1 = array;
int *ptr2 = array + 1;
printf("%p %p %td %td %zu", (void*)ptr1, (void*)ptr2, (ptr2 - ptr1), (ptr1 - ptr2), sizeof(ptr2 - ptr1));
//Out: 000000000027FB9C 000000000027FBA0 1 -1 8
You must never implicitly assume that the size of a char corresponds to 1 byte. This may be the case in the vast majority of cases, but there are also systems on which this is not the case.
That's why my example said "difference_in_bytes" and not "difference_in_characters".
But your hint is still good, because in this context you would probably expect the difference in characters. Thanks for that, I will amend my comment accordingly.
@@thebatchicle3429
The difference between two pointers is not in bytes, but in elements.
That's why I wrote "difference_in_bytes" and not "difference_in_characters".
However, your hint is still good, as in this context you would expect the difference to be in characters and not in bytes. Thanks for that, I've amended my comment.
Cheers
The standard says that sizeof(char) is always 1. However, it does not guarantee how many bits a char has, only that there are at least 8. On most systems CHAR_BIT == 8, but on some it might be 16, for example.
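To summarize the pointer-difference discussion in code, here is a small sketch. The helper names are made up for illustration. Subtracting pointers yields a ptrdiff_t counted in elements, and casting to char pointers yields the distance in bytes, because sizeof(char) is 1 by definition.

```c
#include <limits.h>
#include <stddef.h>

/* The two guarantees the thread converges on. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
_Static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");

/* Distance in elements between two pointers into the same int array. */
static ptrdiff_t elems_between(const int *a, const int *b) {
    return b - a;
}

/* The same distance measured in bytes (units of char). */
static ptrdiff_t bytes_between(const int *a, const int *b) {
    return (const char *)b - (const char *)a;
}

/* For char pointers, elements and bytes coincide. */
static ptrdiff_t name_length(const char *name, const char *dot) {
    return dot - name;
}
```

When printing such values, %td is the printf conversion that matches ptrdiff_t.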
FUCKING DELICIOUS VIDEO! JUST LOVE C CONTENT!!!!!!!!!!!!
THANK U, KEEP DOING IT
Great video mate, videos like these keep my spark up, thanks for this ❤
Nice, I am just now starting to lay out the basics of a c style compiled language I'm planning to implement. I was looking around what allocation strategies to support but couldn't decide.
In the past it was pretty easy either you garbage collect or use a malloc free system but nowadays there are so many niche competitors trying to revolutionize the game.
Thanks to this video I might consider making arenas my primary allocation strategy.
We will see ourselves again in 5 years when the language is mature enough to even work 😂
Nice! Good luck on your language. I would recommend having arenas for temporary or scoped dynamic allocations, and providing a malloc-like interface anyways because it is necessary in a few cases like dynamic arrays and such (unless you leverage virtual memory that is)
@@voxelrifts thanks.
I was planning on allowing access to malloc and free anyway for potential c interop. Dynamic datastructures should ideally be part of the standard but I will see.
I have some questions about how to use that string struct.
1. How do you print a string that's not null-terminated?
2. How do you get the length of the string? strlen (but that won't work if it's not null-terminated)? I assume you don't manually count characters.
Instead of using %s for printing a string, you can use %.*s which allows you to give it a size. I have a macro for str_expand(the_str), which just expands to `(int)the_str.size, the_str.str`.
so what I can do is simply do printf("%.*s", str_expand(my_string));
Length of the string is stored right within the struct so there's no need for strlen or counting.
If you mean how I convert a string literal to a string struct initially, then I use a macro called str_lit which uses the sizeof() operator which returns the size for the string including the null terminator, then subtracting 1 gives me length of the string.
github.com/PixelRifts/c-codebase/blob/master/source/base/str.h Lines 42 and 43
@@voxelrifts Exactly what I wanted to know. Thank you very much!
@@yogxoth1959 OK one thing I forgot to mention which is important, the str_lit macro only works on string literals, not char*s. If you want to convert a char* to a string type you have to use strlen. (This is because sizeof works differently specifically for string literals)
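Put together, the macros described above might look like this sketch. The names str_lit and str_expand and the fields str and size come from the replies above. The exact struct layout, the helpers str_from_cstr and str_print, and the macro bodies are assumptions, not a copy of the linked header.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout: pointer plus byte count, no terminator required. */
typedef struct {
    const char *str;
    size_t size;
} string;

/* Only valid for string literals: sizeof counts the trailing '\0'. */
#define str_lit(s) ((string){ (s), sizeof(s) - 1 })

/* Expands to the two arguments that "%.*s" expects. */
#define str_expand(s) (int)(s).size, (s).str

/* For a plain char*, sizeof would give the pointer size, so use strlen. */
static string str_from_cstr(const char *s) {
    return (string){ s, strlen(s) };
}

/* Print exactly s.size characters, terminated or not. */
static int str_print(FILE *out, string s) {
    return fprintf(out, "%.*s", str_expand(s));
}
```

Usage would be along the lines of `string s = str_lit("hello"); str_print(stdout, s);`. Since str_expand mentions its argument twice, it should only be given expressions without side effects.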
Some template functionality can be emulated in C with function pointer parameters. As an example I have abused this approach to essentially create 25 versions of the same function in a single TU and it has worked really well. If I used C++ templates it wouldn't save me any LOCs there and the syntax is simpler.
Function pointer sounds like an indirect call and indirect call doesn't speak performance, but don't worry. Compiler can and will optimize out hundreds of lines of code beyond comprehension if you give it a chance to do so.
The compiler (Tested with GCC) can be easily encouraged and even forced to generate several versions of the same function and inline functions specified as function pointers or even inline the entire thing as long as definitions of both functions are visible in a single TU. This will generate essentially the same code as respective C++ templates or macro templates. You trade some flexibility as function pointers can't replace everything, but you can still do alot with those. You also get strict type safety and the same debugging experience as with your regular code.
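No godbolt link is given in the thread, so here is only a guess at the shape of the technique: a static inline "template" that takes its operation as a function pointer, plus thin wrappers that pass a visible function. The names fold_ints, op_add, op_max, sum_ints and max_ints are hypothetical. Whether the indirect call is actually removed depends on the compiler and optimization level. GCC and Clang also offer __attribute__((always_inline)) for forcing the issue.

```c
#include <stddef.h>

typedef int (*binop_fn)(int acc, int x);

/* The generic part: written once, parameterized by a function pointer. */
static inline int fold_ints(const int *xs, size_t n, int init, binop_fn op) {
    int acc = init;
    for (size_t i = 0; i < n; i++) acc = op(acc, xs[i]);
    return acc;
}

static inline int op_add(int a, int b) { return a + b; }
static inline int op_max(int a, int b) { return a > b ? a : b; }

/* Each wrapper is one "instantiation". With both definitions visible in
   the same translation unit, an optimizing compiler can inline op into
   the loop and drop the indirect call. */
int sum_ints(const int *xs, size_t n) {
    return fold_ints(xs, n, 0, op_add);
}

int max_ints(const int *xs, size_t n) {
    return n ? fold_ints(xs + 1, n - 1, xs[0], op_max) : 0;
}
```

Everything here is ordinary typed C, so debugger stepping and compiler diagnostics behave as they do for any other function.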
Very interested to see this if you have a godbolt link or something
"...well it's all pain and suffering" :-D :-D. Great video, thanks.
Great video, you have my subscription.
One metaprogramming option that I'm exploring is using python to generate the C code, and add that C code as a target to a makefile (so that every time I modify the python script, the C files get regenerated). Another option would be to use SCons as a build system, since it's already Python code it would integrate more seamlessly
I really liked C, until I saw Zig. Zig was a very refreshing view on low-level programming: the Zig standard library uses allocators by default (c_allocator, heap_allocator, and many more), and you can use any of them. I also really like Zig's syntax, because it's a mix between C and OOP, but in a C style: you have structs, structs can have fields, and they can have methods (which are stored within the type, and not the object). Every file is more or less a giant struct. The only thing I don't really like in Zig is that there are too many builtin functions. I really like the comptime keyword though. Anyway, if you like(d) C, you'll surely like Zig.
Sounds like some c++ features
The special advantage of zig's comptime when it comes to templating is that you can use it for things that otherwise would require a separate 'templating part' of the language (often an entire different language), like we have in C, C++ and Rust. In Zig, you use the same exact language both for the code you want to ultimately compile and for the templates. In fact, you use the same language even in your build script! There's no gnu-make, nmake, cmake, autotools, meson, and so on. There's only Zig.
Great video
Cool stuff
I just started learning C a week ago (I already work as a developer so I know how to code but never did anything serious with C), and this video literally answered the most important questions that I had regarding this language one by one, the youtube algorithm really nailed it this time.
You should keep in mind that different types in C have different alignments when allocating memory, and that even though your code will work with or without taking alignment into account, misaligned memory accesses can degrade performance. Awesome video tho
Awesome video:)
This is certainly an interesting technique, though I would suggest using pre-existing libraries when possible. For memory allocation, unless you absolutely need it to be as fast as possible, using a GC library for those that can't or don't want to manually manage memory, would be my recommendation. Hans Boehm wrote one that you might consider looking into. For string handling, it's a good idea to do the same, finding a library that suits your style but that stores the length.
All that said, data structures are the real crux for people new to the language. Most people either don't learn them properly or at all in school, and not just the implementation, but the selection of, can have a huge impact on performance and memory usage. There's plenty of videos that go over some of the vagaries of selection, but they're often too general purpose and truthfully, I often mix and match to make hybrid structures anyway. It'd be great if someone could make a video series on implementing some of the more obscure data structures and how to mix and match them for more effective design.
Why find a library if writing one yourself is just as fine?
@@voxelrifts Laziness.
@@anon_y_mousse lol
Also because I enjoy actually making stuff rather than laying bricks for hours
@@justadude8716 It doesn't really take long and does help to get better at understanding memory and other important things if you do do things yourself sooo ¯\_(ツ)_/¯
A lot of these are why I love using Rust. Explicit lifetime management, proper memory management, etc. all in a world where you don't "hope" that a library works as you expect.
My C++ learning was stagnant. In order to learn C/C++ in depth, I turned to learning Rust. Now after watching your video, I found that these best practices are used by default in Rust community. I think This is the benefit of learning a modern language
Rust is an entire language where a linter wouldve been sufficient lol
What I don't understand is how do you get a value from the arena? Especially if you have multiple things inside it - how do you keep track of all of the things inside the allocator to retrieve the correct one when you need it?
@@BiskitSlippers allocators are stand ins for stuff like malloc or mmap. You always store pointers to elements within them, they don't provide a retrieval mechanism for specific data for you.
@@voxelrifts Oh I think I understand now. So an allocator is really like your own personal area to store data and one you have a little more control over? But at the end of the day you need to keep track of your own data manually as you usually would?
@@BiskitSlippers correct!
@@voxelrifts Got it, thank you for helping me out 👍
I used pretty much the same approach for my data structures except that I used generic selection. Usually in C people would just use void pointers for the data and not bother with the macros, but I wanted to see what it's like. I like the type safety I get, and I imagine the compiler can do more optimizations knowing the type. I do wonder though, what if I use void pointer and generic selection based on type to get _some_ level of type safety and less generated code and potentially easier to debug. I need to experiment more.
i really like the idea of arena allocation, one thing though is after reading the articles you linked is that i still dont understand how youre allocating your arenas, if the arena is stored on the heap or the stack
edit: i misread your codebase; correct me if im wrong but you allocate an arena on the heap instead of the stack, so would it be wrong of me to just always allocate arenas on the heap just to save me a headache on managing the stack
also thank you for the great resources!
@@1nilusnilus Yes I am allocating arenas on the heap. For my codebase I use virtual memory allocation here so I don't run out. You could allocate arenas on the stack as well, the allocator strategy stays the same, you just have to be careful of stack overflows
Good video 👍 thank you
Thanks for the great video! What debugger are you using in that video, is it a VSCode extension?
I'm trying to implement memory arenas right now and I asked ChatGPT what it thought about my arena_alloc function. It mentioned that the function would probably not handle arbitrary structs very well, because of alignment. I don't think you mentioned that in this video. Is this something I have to worry about?
Yes, but alignment in the allocator is also not hard. Just round up required allocated space to a multiple of a power of 2! Like 8
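A sketch of that rounding, assuming the alignment is a power of two. The names align_up and arena_alloc_aligned and the Arena struct are hypothetical. The first helper rounds a size. The second aligns the actual address, which also covers backing buffers that do not start on an aligned boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of align (align must be a power of 2). */
static size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

typedef struct {
    unsigned char *buf;
    size_t cap;
    size_t pos;
} Arena;

/* Bump allocation whose result address is a multiple of align. */
static void *arena_alloc_aligned(Arena *a, size_t size, size_t align) {
    uintptr_t cur = (uintptr_t)a->buf + a->pos;
    uintptr_t aligned = (cur + align - 1) & ~(uintptr_t)(align - 1);
    size_t start = (size_t)(aligned - (uintptr_t)a->buf);
    if (start > a->cap || size > a->cap - start) return NULL;
    a->pos = start + size;
    return a->buf + start;
}
```

A call such as `arena_alloc_aligned(&a, sizeof(double), _Alignof(double))` then returns memory that is safe to use as a double, whatever was allocated before it.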
The search program Everything by voidtools is said to be written entirely in C, and it is so fast at finding my files. I use Blender a lot to model things and make cut files. C is used for robotics. Blender has Python. I am interested in writing code and want to understand what I am doing, so I am starting with C
Could you provide some more direction towards metaprogramming in C?
Lets say I want to make something equivalent to
template
T vector_get(int index);
this function would return the data stored at index in a vector
So my metafile would generate vector_get multiple times for different data types?
and after having my metafile generate the types, I would use the appropriate type where I need it? Do I understand this correctly?
Correct, the metaprogram is nothing but an additional layer used to generate code, which basic old C can understand
How do you handle struct redefinition errors? I was trying to remake a std::pair in c, for C23 everything seems fine, for older versions i came up with additional define macro like this:
#define Pair_define(T1, T2) \
struct Pair_##T1##_##T2 { \
T1 first; \
T2 second; \
};
#define Pair(T1, T2) struct Pair_##T1##_##T2
But i also get an error when define macro is used more than once.
@@ban5176 yeah, you only want to call Pair_define once in some header
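A small sketch of that advice, using the macro names from the question. Here Pair_define is written without the trailing semicolon so the call site supplies it, which is a slight variation on the original. The helper functions are hypothetical. Token pasting also means the type arguments must be single tokens: a type like unsigned int would need a typedef first.

```c
/* Defines the struct; invoke exactly once per (T1, T2) combination,
   for example inside one include-guarded header. */
#define Pair_define(T1, T2) \
    struct Pair_##T1##_##T2 { \
        T1 first; \
        T2 second; \
    }

/* Names the struct; can be used as often as needed. */
#define Pair(T1, T2) struct Pair_##T1##_##T2

Pair_define(int, double);

static Pair(int, double) make_pair_int_double(int a, double b) {
    Pair(int, double) p = { a, b };
    return p;
}

static double pair_sum(Pair(int, double) p) {
    return p.first + p.second;
}
```

Splitting "define" from "name" is what keeps pre-C23 compilers happy: the struct body appears once, while Pair(int, double) is only a reference to its tag.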
Hello , I write C and people call me insane . Am I doing it wrong ?
Grouping allocations is definitely helpful. Any strategy to make sure those malloc & free calls are balanced?
Is it possible to reserve 1 tb ram ( virtual memory address space) size arena in cpp on windows system. With malloc i am limited by size of physical ram plus swap space. How to do it right.
Edit: I was able to reserve 127 TB virtual address space with virtual alloc on windows 11 pro. I read the docs, the limit is 128 TB per user space process.
stb has a lot of useful header files for reading images, etc, you may find it useful
9:35 `void type##_slice_subslice(...);\`
"void"? Are you sure about that? [0]
9:50 `type##_slice_subslice(slice, idx)`
Are you sure about that? [1]
Yep, yep, that was a mistype
I wouldn’t say that you have to free memory all at once. When you have a segment of memory you want to “free” you can add that memory to a free list. Essentially if you free memory you will almost immediately end up reusing that block of memory in the arena by popping it off the free list. This is really only if you want to deallocate in the middle of an arena. If you want whatever you’re allocating to have a group lifetime then you wouldn’t do this. Just a way to add more flexibility to an arena. Pretty good video though.
Can't you just iterate through the filename comparing to the ascii code for the . ?
The problem isn't *finding* the dot, it's splitting the string at the dot. After finding the dot if you need a separate string representing just the filename, you'd have to replace the dot with a null terminator, which will break the string representing the entire filename with extension
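With a counted string the split needs no writes and no allocation. In the sketch below, StrView and split_ext are hypothetical names and the pointer-plus-size layout is an assumption. Both results are views into the original buffer, which stays intact.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *str;
    size_t size;
} StrView;

/* Split at the last '.', writing views for both halves.
   Returns 1 if a dot was found, 0 otherwise (ext is then empty). */
static int split_ext(StrView path, StrView *name, StrView *ext) {
    for (size_t i = path.size; i > 0; i--) {
        if (path.str[i - 1] == '.') {
            *name = (StrView){ path.str, i - 1 };
            *ext = (StrView){ path.str + i, path.size - i };
            return 1;
        }
    }
    *name = path;
    *ext = (StrView){ path.str + path.size, 0 };
    return 0;
}

/* Compare a view against a NUL-terminated string. */
static int view_eq(StrView v, const char *s) {
    return v.size == strlen(s) && memcmp(v.str, s, v.size) == 0;
}
```

The full path, the name and the extension can all be used at the same time, which is exactly what overwriting the dot with a terminator would prevent.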
Am I mistaken in that arenas still need a backing memory buffer, which you'd need to either have in static memory or still allocate on the heap, the latter requiring either using malloc anyway or using a system specific function?
You are correct, you need to allocate the backing buffer, either with malloc or with os specific calls like VirtualAlloc or mmap.
You will call malloc, but it will be only once for all the small objects that you will use, so it can be faster than several separate calls to malloc. It also helps to reduce memory fragmentation, as you can allocate arenas of a fixed size, regardless of the size of your smaller objects. And finally, you also reduce the number of times you call free(), which can also be costly and problematic. This helps both with memory management and performance, as you free all the objects from that arena at once, at one place. It's not ideal for every case, but it's very useful for the use cases he mentioned in the video.
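For anyone who wants to see the shape of it, here is a rough sketch of a malloc-backed arena. The names (`arena`, `arena_init`, `arena_alloc`, `arena_free`) and the fixed capacity are assumptions made for illustration, not the implementation from the video:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base; /* backing buffer, obtained with a single malloc */
    size_t cap;          /* total bytes in the buffer */
    size_t used;         /* bytes handed out so far */
} arena;

/* Returns 0 on success, -1 if the backing allocation fails. */
static int arena_init(arena *a, size_t cap) {
    assert(a);
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Bump allocation. `align` must be a power of two.
 * Returns NULL when the arena does not have enough room left. */
static void *arena_alloc(arena *a, size_t size, size_t align) {
    assert(a && align && (align & (align - 1)) == 0);
    if (a->base == NULL) return NULL;
    uintptr_t cur = (uintptr_t)a->base + a->used;
    size_t pad = (align - (cur & (align - 1))) & (align - 1);
    size_t left = a->cap - a->used;
    if (pad > left || size > left - pad) return NULL;
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

/* Releases every allocation made from the arena in one call. */
static void arena_free(arena *a) {
    assert(a);
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

There is one malloc and one free no matter how many objects are carved out of the buffer in between, and each allocation is padded to the alignment the caller asks for.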
Sounds like Rust with extra added steps
I read K&R and I know a little bit of C. Now I hear about Zig and Odin and wonder if they are a better version of C, or should I learn C and then move on to Zig?
Is arena aligned? If no, then it's definitely useful for implementing most communication protocols
I didn't watch the whole video yet, but gotta agree about the relatively small library, especially string.h. At this point I actually decided to write my own stdlib-like library (but focusing on adding better string functions and advanced data types like linked lists)
I think you just reinvented objects? 😂 If you put function and data pointers in the arena, deallocating would be object.destroy(), and the routine for allocating it is the constructor?
There is a weird way to do strings, but it works:
struct string {
    size_t len;
    char data[1];
};
Now if you want to allocate a string, you can do the following:
#define STR_LENGTH 5
struct string* greeting = (struct string *)malloc(sizeof(struct string) + STR_LENGTH);
greeting->len = STR_LENGTH;
/* NULL terminate end of string */
greeting->data[STR_LENGTH] = 0;
/* TODO: Fill in data with your actual string */
Don't forget to free that memory with free(greeting) and create a macro template for this special kind of string ;)
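For comparison, a sketch of the same single-allocation layout using a C99 flexible array member instead of `data[1]`. The names `fstring` and `fstring_new` are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct fstring {
    size_t len;
    char data[]; /* flexible array member (C99): adds no bytes to sizeof */
};

/* Copies `src` into one heap block holding the header, the characters and
 * a terminator. Returns NULL on allocation failure; release with free(). */
static struct fstring *fstring_new(const char *src) {
    assert(src);
    size_t len = strlen(src);
    struct fstring *s = malloc(sizeof *s + len + 1);
    if (s == NULL) return NULL;
    s->len = len;
    memcpy(s->data, src, len + 1); /* copies the terminator as well */
    return s;
}
```

Because the length and the characters live in the same block, a substring still needs its own allocation with this layout.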
This also seems to have the same issue of needing new allocations for simple "views" into the string, since the count and string data are right next to each other.
If it's a view you want, you'd just need a tag struct such that:
size_t len;
const char* const data;
Now we made it clear that we can't change either the characters in the data or the pointer to the character array so we can init the data struct in an initializer list instead of direct assignments. Can't believe that's the monstrosity we need for simple "views".
@@SimGunther Right. The point of the string struct in the video was to not have to have two separate structures for regular strings and views :). Quite helpful in many many cases I've found
I'm only learning C for CS50x and to help with learning C++. Though I might come back to this video in the future, I'm gonna walk away for now. (Rn, I can't stop sneezing while this video is playing. Send help!!!)
cool video
Hello, bro. Would you be able to share the roadmap and the topics you have Learned to master C programming and graphics design using C ?. It would be so helpful for students like me 🙏🏼.
I don't have a strict roadmap per-se to be honest and I am still learning.
For C I would recommend handmade hero's intro for starting off and just doing random projects in C to understand how to use it efficiently.
For graphics learnopengl.com/ is an excellent resource for learning graphics programming, but more specifically OpenGL. Once you understand those, it's not hard to extrapolate to other graphics APIs.
But an important tip is DON'T DO BOTH AT ONCE. Either learn C first, or learn graphics programming in your preferred language. Both topics have a lot of concepts you need to understand so mixing them together can be confusing.
@@voxelrifts Shout out Casey Muratori :)
Shouldn't you be passing a size to the arena? Seems like a small string is gonna take a different size than the full thing.
N strings or counter based strings don't need a structure. You just store the size in the first byte. This limits string size to 255 but is simple and lightning fast.
This has the same problem as with nullterminated strings where you have to make a new allocation for strings that are already there. If you follow through with my example, you'd have to allocate for the extension string instead of filename string which is what the struct avoids
@@voxelrifts if you're receiving null-terminated strings, you just offset by one to receive and send null-terminated strings, then use your own libraries to manipulate everything using the n count. It's not the same, as it adds another facet. In many ways it's like using a structure, except without the pointers needed to manipulate it, as it's all in one character array. In fact you can chain them that way too.
@@stolenlaptop Firstly, I don't know what you mean by facet. Secondly I think you missed my point :)
Circling back to the example I gave in the video: we had the full filename with extension as an allocated string, and wanted strings that are just the filename without the extension and just the extension without the filename. The problem with placing the count right before the characters of the string is the same as having a null terminator at the end of the string. In fact, having the count right before the allocation is much worse, because the filename without extension and the filename with extension have different counts, but those counts would have to be stored in the same location if you want to refer to the same allocation. The structure solves this because you're keeping the sizes on the stack itself rather than alongside the string data.
@@voxelrifts I took it as, you can encode the length of the view in the lowest byte of the pointer, which is a more interesting approach. Doing so in the lowest byte isn't as good as your strings probably aren't 256 byte aligned and you can kiss goodbye having a different offset, but you could totally use the upper 24bits of the pointer as they are for the most part irrelevant in the userspace. Of course this necessitates having functions or at least a macro that would encode/decode the pointer for use.
thx mate
What are your thoughts about GNU AutoGen? I think it's a good solution, if you are not in a position to roll your own metaprogramming helper.
And a lot of the times macros/codegen aren't even necessary considering how far link-time optimizations have come. The compiler can, in a way, generate the code for you, if it sees it fit.
I haven't heard of this actually, I should take a look. I used metadesk for making the tablegen program which was quite easy to set up and work with
3:45, why do you need to "register" the pointers? Once they are != NULL, you can presume they have memory allocated. 3:56, I guess you are trying to detach a,b,c from the struct, maybe throwing them in global space? This is a terrible idea!
4:14, so this "arena" won't ever be used just for reading/writing, but only for allocations?
4:37, I don't use C anymore, so I don't recognize this 'string' type for it. But I didn't get in which way arena helped here?
6:58, just create a tiny f() for the extension:
const char *get_ext (const char *FileNameAndExtNoDot)
{ return FileNameAndExtNoDot + strlen (FileNameAndExtNoDot) + 1; } //It'll point after the dot.
So the printf would be:
printf ("%s.%s", name, get_ext (name));
9:33, it's a nice idea to not put the last ';' in a macro, to be forced by the compiler to put it on each call. It feels more like a normal cmd.
And there's a bug: you can't put the last \, because the macro ended before it.
I didn't find this ERROR macro(?). Anyway, you could use assert, from <assert.h>, instead of the if, and you wouldn't have to write an error message. It would just be reported as a failure of that condition.
The pointers are not going in global space at all. They're going in an arena which has a lifetime that lasts from the init function to the free function. We get rid of the need to call free on those pointers manually. It might not be a big win just for random integers, but for things like linked lists it is very helpful, since you don't need to traverse the list freeing every node. Basically it removes the dependency on the pointer for freeing memory
About 4:37 it's a custom type, as you probably saw from later in the video
About 6:58 yes in this case you can print it piecewise but it's not an actual allocated string anywhere which is necessary many times.
ERROR was just a placeholder since I was showing pseudocode here. You're free to use whatever error handling you want!
@@voxelrifts But you still need to write at least 1 time a free f() for this arena. And if it deals with linked list, the same thing about traversing the nodes, right?
@@MrAbrazildo yes you free the arena. But if you allocate all the nodes in the arena, you don't have to iterate through the list and free all pointers one by one. Instead you free the entire arena and all the nodes allocated within it are automatically freed
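A small sketch of the linked-list case, assuming a hypothetical node-only arena (`node_arena`, `list_push` and friends are illustrative names, not code from the video). All nodes come out of one malloc'd block, so releasing the list is a single free() with no traversal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Hypothetical node-only arena: one malloc'd block handed out slot by slot. */
typedef struct {
    node *slots;
    size_t cap;
    size_t used;
} node_arena;

/* Returns 0 on success, -1 if the backing allocation fails. */
static int node_arena_init(node_arena *a, size_t cap) {
    assert(a);
    a->slots = malloc(cap * sizeof *a->slots);
    a->cap = a->slots ? cap : 0;
    a->used = 0;
    return a->slots ? 0 : -1;
}

/* Prepends a node taken from the arena.
 * Returns the new head, or NULL if the arena is full. */
static node *list_push(node_arena *a, node *head, int value) {
    assert(a);
    if (a->used == a->cap) return NULL;
    node *n = &a->slots[a->used++];
    n->value = value;
    n->next = head;
    return n;
}

/* One free() releases every node at once. */
static void node_arena_free(node_arena *a) {
    assert(a);
    free(a->slots);
    a->slots = NULL;
    a->cap = a->used = 0;
}
```

After `node_arena_free`, every `node *` that pointed into the arena is dangling, which is the trade-off for not walking the list.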
@@voxelrifts If I would spend time making such thing, I would build a stack version of it: a struct carrying an array and several pointers, 1 for each part of it. And an internal control, compiled in a separated file.
@@MrAbrazildo I don't think I get what you're describing
quick question, what was that link about the metaprogramming?
Shoot I forgot to add that link to the description. I'll add it now, but here you go: www.rfleury.com/p/table-driven-code-generation
@@voxelrifts Thanks, great video.
I'm a 23 year old with no college education so far and no skills other than a basic high school education.
How would you recommend someone go about learning programming and mathematics?
I don't think I'm qualified to give advice to people to be honest, but I think doing projects that interest you is the best way to learn programming. For both maths and programming, the only way one can get better is with practice :)
@@voxelrifts Thank you for taking the time to write this.
@@voxelrifts will be looking for your content in the future. Keep up the good work.
In practice I think finding some in-person courses is the safest path to get into the business, but if you want to learn for free: programming is a huge field, and it really boils down to "what" you want to program. I would say your first step is to identify what area you want to target, like web, back end, embedded, etc. Then I would recommend watching getting-started videos on YouTube. Once you've got the basics of your language, I would recommend working on and learning code etiquette (software development is about making clear and understandable code, not just code that works), then get a portfolio going and hope it is enough to get a job
Macro templates are really wacky.
I prefer to use void* and size_t, even if it gets really hard.
That big macro for templates is *extremely* scuffed. Just use void ptr or use inheritance
Is a linear allocator… just a stack?
Can be implemented as one easily yeah!
@@voxelrifts wait i dont get it. is the normal rsp thing with a push instruction that most CPUs give you by default an arena allocator?
@@ariabk Well not exactly, an arena is an arena if you free the memory all at once. If you want to pop off memory it wouldn't be an arena, but instead it'd be a stack allocator. Also I wouldn't call push/pop instructions "allocator functions" at all.
zig my friends zig
ugly language
zig indeed
lol, the problem with zig is that u only get what LLVM generates for u
hopefully, it will change in the near future
@@airbus5717 yeah well, better than whatever this is
Thumbs up, because the video itself is good, but the sad truth is I've never made anything in Rust yet, even though I've been programming for 20 years. I just randomly saw this and I want to say:
If you know C already, then you shouldn't need this Video and if you don't, learn Rust.
I love C because it's not bloated and handcrafting everything sometimes feels fun. But I'm unable to build anything complex because I can't wrap my head around header inclusion organization. Setting up makefiles is tedious, and 3rd party library inclusion is a pain in the ass to set up.
You might want to watch my latest video about compilation and how that whole system works!
super cool channel, but only thing that is not cool is having atom editor(made in javascript) logo.
Lol, it is probably time to change it
Arena and bump allocators are completely different. A bump allocator is just a stack. An arena allocator can use any allocation strategy within itself, but its point is that you destroy the whole block and all allocations within it in one call.
To be fair, these terms are not that rigidly defined. In fact you even used the term "stack", which implies a LIFO behavior that is not the case with a Linear/Bump/Arena allocator.
For allocating bulk memory, in one talk the speaker mentioned a "game level": once you've completed the level you don't need any of it, so free it all!
But bad practice to show black screen and talk, please put some code
Leave it to a C programmer to reinvent what's already implemented and solved in C++
That's actually the biggest problem with C++
It solves problems by adding more features into it
Any plans on re-recording this with a decent mic?
Instead of generating code within a function-like macro body, use generator headers instead. Much easier to read and maintain, and no need to spam \\\\ until your head explodes. Example of a generator header:
#define _CAT3(a,b,c) a ## b ## c
#define CAT3(a, b, c) _CAT3(a, b, c)
#define FN(name) CAT3(GEN_ARG_NAME, _, name)
#include <stddef.h>
struct GEN_ARG_NAME {
    GEN_ARG_TYPE* data;
    size_t size;
};
int FN(push)(struct GEN_ARG_NAME *o, GEN_ARG_TYPE v) { ... }
#undef _CAT3
#undef CAT3
#undef FN
#undef GEN_ARG_NAME
#undef GEN_ARG_TYPE
Later you can instantiate the template by setting a few macros and including header, like this:
#define GEN_ARG_NAME intvec
#define GEN_ARG_TYPE int
#include
...
struct intvec iv = {0};
intvec_push(&iv, 42);
....
Get the idea?
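To make the expansion concrete, here is a single-file sketch of the same idea. In real use the middle part would live in its own header and be pulled in with `#include`; it is inlined here only so the example compiles on its own. The `push` body (a plain realloc-by-one) and the helper name `CAT3_` are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Instantiation parameters, normally set right before the #include. */
#define GEN_ARG_NAME intvec
#define GEN_ARG_TYPE int

/* ---- start of what would be the generator header ---- */
#define CAT3_(a, b, c) a##b##c
#define CAT3(a, b, c) CAT3_(a, b, c) /* extra level so arguments expand first */
#define FN(name) CAT3(GEN_ARG_NAME, _, name)

struct GEN_ARG_NAME {
    GEN_ARG_TYPE *data;
    size_t size;
};

/* Appends v. Returns 0 on success, -1 if realloc fails. */
static int FN(push)(struct GEN_ARG_NAME *o, GEN_ARG_TYPE v) {
    assert(o != NULL);
    GEN_ARG_TYPE *p = realloc(o->data, (o->size + 1) * sizeof *p);
    if (p == NULL) return -1;
    p[o->size++] = v;
    o->data = p;
    return 0;
}

#undef CAT3_
#undef CAT3
#undef FN
#undef GEN_ARG_NAME
#undef GEN_ARG_TYPE
/* ---- end of the generator header ---- */
```

After this, `struct intvec` and `intvec_push` exist as ordinary declarations, so compiler errors point at real lines of code rather than at one giant macro invocation.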
sorry, but instead of having to allocate 8 bytes for a "string", you could just know where the string ends by knowing where the extension is, and just extract one from it.
How do you extract part of a string without allocation? You *have* to allocate it somewhere else for null termination to work
@@voxelrifts you dont
@@seethekek4647 uhh you're talking about null terminated strings correct?
@@voxelrifts yes
@@seethekek4647 then I need more explanation here because I'm pretty sure it's not possible to not have to allocate the substring separately
What a wonderful nightmare
These things are basically the stuff i do all the time when i write in c, while being super good knowledge, most people would probably just use a garbage collected, probably interpreted language with a large set of standard libraries and call it a day
And that's completely fine! I just wanted to share these techniques because they're never really explained/told for some reason, and because they make a huge difference from my experience.
"...not having a good standard library" sure... Even languages aiming to replace C, like Rust, still have to call down to C's malloc() lol
Using macros without ()😮
You lost me at the #define. I'm not sure what it is about them, but C/C++ macros just make me give up on these languages every time.
And, of course, everything you show here undoes the main benefit of C - its low memory footprint. That string implementation adds 8 bytes to every string. So now "HELLO" goes from consuming 8 bytes (5 char, 1 null, 2 pad) to 16 bytes. You've doubled the memory footprint for no real benefit.
"No real benefit"? I did show in the video how that reduces the memory footprint effectively by you not having to make separate allocations per substring. Also 1 byte is not a lot of memory to "waste" if you even call this wastage.
Standard question for all C users: why aren't you using C++?
Compile times and too many features that don't need to be there. The features that I do need, I can write in C so I don't really mind
because C works good enough to not bother with C++
C++ offers a lot of inconveniences disguised as language features, as an alternative to writing simple data oriented code design in c, where everything is clear and in front of you, and you know what is going on in terms of what is executed and how it gets optimized by the compiler, you are offered objects with automatic operators obfuscating code clarity, incomprehensible inheritance hierarchies obfuscating code clarity, runtime template related errors that is going to take you forever to solve, namespaces where you either have to choose between insanely long lines of code or risk not knowing what library a generic function name comes from.
My point is that C does exactly what you want it to, is easy to write and read due to less syntax complexity, and limits you in terms of features so that you can't write an overly complex solution to a simple problem.
C++ is not inherently evil, it is just a language that has seen so much feature creep that understanding how to use all the tools efficiently is impossible.
Guys, guys! It is a trap... oh no.
I got tired of getting error messages from the compiler that looked like an ancient eldritch language. I'm mostly interested in making games (maybe with like a custom software renderer or something like that) and C's features have been just enough.
Why is a size not specified when making an arena? How is that even possible since malloc requires a size?
There's a few ways to do this.
A) Assume a size, like 1 megabyte or something. No complications.
B) Make arenas hold a linked list of malloc-ed chunks of memory. So if you exceed one chunk, you alloc a new one and use that.
Or C) the one I use, using virtual memory. This allows me to VirtualAlloc a large chunk of memory to reserve that address space (like 1 gigabyte). This doesn't actually allocate that memory though. When space is required I commit part of that reserved memory which actually allocates it to my process like malloc does. This keeps things contiguous in address space and I don't have to worry about passing sizes around since 1 gigabyte is quite a bit of space. Of course, I have additional functions that can allocate an arena with a specified size but that is rarely required.
@@voxelrifts Ah, very cool, never realized I could do that. But is virtual allocation part of the standard library or is it Windows-specific? To put it another way, can I achieve "cross-platform" virtual allocation?
@@ghostbusterz It's not in the standard library, no, but mmap is the way to do so on any POSIX-compliant OS. So I just have helper functions for reserve, commit, decommit and free, which call either mmap/munmap on POSIX OSes, or VirtualAlloc/VirtualFree on Windows.
@@voxelrifts Got it. I use SDL2 so I imagine it's very likely it has something of the sort.
// vec_MyType.h
#define VecT MyType
#include "vec.h"
#undef VecT
// vec_MyType.c
#define VecT MyType
#include "vec.c"
#undef VecT
And use VecT as the type in the vec code. At least we don't dig through macro compiler errors.