well son, when a cpu loves some data in its cache very much...
Never seen brk or sbrk before after 7 years in the embedded world. Really interesting, thanks
You have probably never seen them because a) they belong to the Unix/Linux API (#include <unistd.h>) and b) they are not meant to be used by end users.
I'd only ever seen sbrk in my second year of college. I had to use it to implement malloc 😫
@@Conenion I've used unistd.h many times but never seen this one before. Although where I work we are quickly moving from C to modern C++ where using "new" doesn't make much sense since e.g. std::make_unique exists etc
@@metal571
> never seen this one before.
Again, because they shouldn't be used, even when using C. Run
man sbrk
on your Linux box and read the "NOTES" section.
@@gabemcguire2463
👍
Very good task for students to learn an important topic. Very good teacher.
The addresses returned by calling sbrk(0) immediately followed by sbrk(0x1000) should actually be exactly the same if no other procedures like printf are allocating memory behind the scenes. What sbrk returns is actually the previous break, not the new break as suggested in the video. The returned address of sbrk(0x1000) can be used as newly allocated memory. Then calling a sbrk(0) after sbrk(0x1000) would actually show the new break after it was incremented by 4096 (0x1000) bytes.
Actually it is not guaranteed that printf does not change the program break or use malloc
I've heard of writing your own memory allocators, but it never occurred to me that there is something below malloc, which uses heap pointers directly. Very interesting
Repeat after me: There is no such thing as free space
The heap is not really separate - the underlying memory returned by a `malloc` call is itself allocated using `sbrk` or `mmap` (these days more often the latter). It's an abstraction layer on top of those system calls, and it also permits resizing and deallocating, which is why it's more complicated and slower. A particular disadvantage of using `sbrk` is that it's not possible to reclaim that memory once allocated - it becomes a permanent part of the process's virtual memory size (VSZ).
You should probably also mention `alloca`, which can allocate a variable amount of memory on the stack but only for the duration of the current function - it's automatically released when the function terminates when the stack frame is reverted to its pre-function-call state.
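For anyone curious, here's a minimal sketch of what that looks like in practice (assuming glibc's <alloca.h>; the buffer lives only until the function returns):

#include <alloca.h>
#include <stdio.h>
#include <string.h>

static void shout(const char *s)
{
    size_t n = strlen(s);
    char *tmp = alloca(n + 1);                 /* allocated on the stack, no free() needed */
    for (size_t i = 0; i < n; i++)
        tmp[i] = (s[i] >= 'a' && s[i] <= 'z') ? s[i] - 32 : s[i];
    tmp[n] = '\0';
    printf("%s\n", tmp);
}                                              /* tmp vanishes with the stack frame */

int main(void)
{
    shout("alloca lives on the stack");
    return 0;
}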
Yea, there's no way in hell modern allocators are using sbrk over mmap or better platform-specific APIs to reserve uncommitted page tables. I'd also question the heap notion. There is no such thing as a heap anymore. It's all just floating reserved heap objects controlled by platform runtimes, possibly wrapped with canaries, and obscured with runtime heap ASLR. We don't live in old POSIX land where you might find yourself with a defined map of memory blocks and you can only grow down and up until you collide (heap and stack position being defined by the platform's ABI) anymore. This is BS. We can freely reserve and release pages (MmAllocateNonCachedMemory, alloc_pages) as well as reserve address space easily (mmap, file views/sections, NtAllocateVirtualMemory, et al). Funny thing about the description, `We talk about how an ELF gets processed and loaded into memory, and how the memory is mapped between the user and kernel space`, he doesn't even mention binfmt. This dude stinks of larp. Whatever helps the guy boost his CV I guess?
@@reecesx why you gotta be like that
@@reecesx You stink of elitism.
@@reecesx How did you learn all of this
@@FelixHdez I can't speak for them but this stuff you either pick up very slowly over time just programming in these environments (eventually if you code in C long enough you start to wonder how malloc works and start poking around source code and manpages) or through an actual course in topics related to "low level" and "systems" programming.
mmap actually returns MAP_FAILED (-1) on an unsuccessful allocation, so a NULL check won't catch it like with malloc. Learned that the hard way!
Did not know about sbrk though.
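To illustrate the check (a minimal anonymous mapping; the flags are the standard Linux ones):

#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {                     /* (void *)-1, so a NULL check misses it */
        perror("mmap");
        return 1;
    }
    ((char *)p)[0] = 'x';                      /* the mapping is usable right away */
    munmap(p, 4096);
    return 0;
}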
I've always wondered what NULL really means (as an assembly programmer I have no clue how it's actually stored)
@@williamdrum9899 in C it's just the number 0
@@williamdrum9899 null doesn't really exist; what it's actually represented as depends on the language. In C it's pretty much 0, in other languages it's some type that the compiler then turns into whatever it represents null as
@@williamdrum9899 I think it depends on which encoding is used; for the most common ones, ASCII and UTF-8, NULL is the value 0x00
Awesome explanations. I never thought about using system calls to allocate memory.
Glad it was helpful!
Awesome video! Been a C programmer for 4 yrs and haven't ever heard of sbrk or mmap!
Couple of questions. At the 5:00 mark:
- The program break was incremented by 0x21000 (in the terminal output) and not by 0x1000 (as seen in line 11, in the editor). Why is that?
- When you use that new allocated space as an array you start indexing at position 100, is there a particular reason why?
BTW I really enjoy your videos, keep up the great work!
The printf() call is likely allocating memory behind the scenes.
The sbrk() system call is documented as returning the _previous_ break value, not the position the break just got moved to. That way, if the break is increased (i.e. memory is allocated), the return value is a pointer into the newly available memory region. (If the adjustment is zero, then the previous break value is the same as the current value.)
If no allocations are made in between the two calls, the sbrk(0) and the sbrk(0x1000) should actually return the same value.
You can easily test this by taking this code, removing anything that accesses the new memory, and changing the second sbrk() into another sbrk(0). You'll see that the break has adjusted, even when you didn't ask it to.
But if you move BOTH sbrk() calls to BEFORE the printf() statements, they'll be the same.
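A quick way to see that for yourself (a tiny sketch, assuming Linux/glibc where sbrk is available; the printed addresses will vary from run to run):

#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* No printf before these calls, so nothing else moves the break first. */
    void *a = sbrk(0);        /* current break */
    void *b = sbrk(0x1000);   /* returns the PREVIOUS break, so b == a */
    void *c = sbrk(0);        /* now reports the break 0x1000 bytes higher */
    printf("a=%p b=%p c=%p\n", a, b, c);
    return 0;
}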
Memory allocation was such a mystery to me in college, and this helps me get a better picture. Thanks!
So, can you also say that the 3 ways of getting memory are from the Heap, Stack, and OS Virtual memory, respectively?
OS virtual memory encapsulates heap and stack to begin with. Heap and stack are just special cases of OS virtual memory
You have to differentiate between user land and what the kernel does to manage a process' memory. A process is a running program. A program is a file on disk, it is what the compiler + linker spits out.
A running process "sees" only the user land. In an OS with memory mapping, a process has no idea where in memory the code of the program (called text segment) resides. The process "sees" only linear addresses starting from zero. Mapping is done using 4k large pages and managed by the OS, with support from the hardware, called memory management unit (MMU). This unit also has a translation lookaside buffer (TLB) which is a cache for recent translations of virtual to physical memory.
Heap and stack are managed by the user land (support libraries like the glibc in Linux contain the code for malloc/free to manage the heap).
To answer your question: "Getting memory" is calling malloc (or similar depending on your programming language); that call is then handled by the glibc (in user land) making the appropriate calls to the OS, which then does the memory management for the process, including mapping from virtual to physical addresses. The stack is handled automagically. It simply grows from the highest address possible downwards, toward the end of the heap.
Keep this going. I don't use it, but your explanations are really good, so I just like to watch and learn how it works.
And I really like to see more rust.
Great to hear!
The topic is interesting. When I interview people I almost always ask them exactly this question.
However I have one complaint, as your material may be confusing to people - you seem to suggest that malloc is a *separate* allocation method from sbrk/mmap, which is not the case. Malloc is just a function which does one of those two syscalls under the hood (which one it will use depends on the size of the block you are requesting). If you strace your malloc example you will see exactly that.
Yup! To get memory malloc definitely has to invoke mmap to generate new arenas. I just wanted to get the point across about the different APIs one has at their disposal, even if one relies on the other. Thanks for watching!
@@LowLevelTV well it can (and will) use brk() in addition to mmap.
What kind of work do you do that this is relevant? I've done a bit of performance sensitive code, but I've never had to touch that, remotely.
And considering what I've read from others, that memory allocated with sbrk is permanent, I think I'll keep my hands off that anyway... It's also never going to be portable to Windows, and probably not Mac either.
I would really think that in the scenarios, at least concerning performance, where malloc is not desirable, other approaches are probably superior, like preallocation, your own allocator for a manual memory pool, alloca, etc...
@@9SMTM6 I do all sorts of stuff related to low level Linux programming on the boundary of kernel and userspace. I didn't say that knowing how malloc is implemented is helpful in my work, but many people who work in my area would know that simply because they are interested in the inner workings of unix-like operating systems. They know that because they were curious enough, or they learned that by stracing their programs, it doesn't matter. What matters is if the job candidate is interested in those areas, if he goes into details, or just takes everything for granted.
I'm not saying you have to know how malloc works to be positively evaluated by me but it is a good starting point for a discussion and further questions.
I couldn't care less about Windows and Mac, but you are wrong, Mac does support brk()/sbrk() calls. It has unix roots.
@@krzysztofadamski2884 frankly, while what I've learned from this video is not uninteresting, it's not really what I'd hoped for.
Like yeah obviously malloc is using syscalls to request additional memory if it runs out. Duh.
This video DOES describe different syscalls and how they behave, yeah.
But what it doesn't do is explain WHY these calls behave like that, which is what the first part of the title implies it does, and what I was really looking for. It's just deferring to the OS.
I know that there is some kind of virtual address translation, but how that works, and why it's designed the way it is is unclear to me.
Why is the upper border 0x7FFF[..]? And not 0xFFF[..]? What's with the addresses between the stack and the heap? I'm pretty sure they are forbidden when not mapped manually, but no one says that explicitly? How the hell does an expansion of the bss section not fuck up every heap pointer?
If you know of good material regarding that I'd welcome it.
Just discovered your channel this week, it's my fav find of the year kudos
Gemini 1.5 Pro: This video is about how a program gets more memory at runtime on Linux C.
The video starts with explaining two regions of memory, kernel space and user space memory. User space memory is the region that a process can use during execution. When a program runs, it gets loaded into the user space memory. There are three sections that get loaded: text section, data section, and bss section. The question is then where does data come from if it's not loaded in from the beginning.
The answer is that there are three locations we can get memory from: user space allocator, system break, and mmap system call.
User space allocator is the easiest to use. The most common user space allocator is the glibc malloc. It creates a heap that grows upwards towards the stack. Calling malloc means asking for a certain amount of bytes from the allocator.
The second way to get memory is the system break or break system calls. This method is more performant than the glibc allocator but it's less granular. By using the system break system call, we can actually increase the break value to create more room for us to put variables.
The last way to get memory is the mmap system call. This is the most performant allocator but it's also the most complex to use. The mmap system call literally says hey kernel give me any memory you have access to. The kernel will decide where to put the memory. The good thing is that you can control the permissions of the memory that you get back.
In conclusion, the video talks about three ways to get more memory at runtime on Linux C: user space allocator, system break, and mmap system call. User space allocator is the easiest to use but the least performant. System break is more performant than user space allocator but it's less granular. Mmap system call is the most performant but it's also the most complex to use.
So the three types of allocation are:
malloc: the portable userland function
mmap: the weird syscall one
sbrk: you are not expected to understand this
BTW in C you should not explicitly cast the void pointer. It is not needed and may be dangerous as it turns compiler warnings off for this assignment.
If you want to access the memory pointed to by the void ptr, you have to cast it to some type. How otherwise would you access it as ints or structs or whatever?
@@gabiold yes but you don't have to do this cast *explicitly*. In C (contrary to C++), you can assign void pointer to any pointer type without a cast. That is why you don't have to (and should not) cast the return value of malloc() in C.
You are correct that you can't access (dereference) the value of void pointer as compiler needs to know the type. But when you assign the pointer, the type will be known.
In other words, it should be:
int *myNewArray = firstEnd;
The cast is not needed.
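For example, the idiomatic malloc call in C has no cast at all (a trivial sketch):

#include <stdlib.h>

int main(void)
{
    int *arr = malloc(16 * sizeof *arr);       /* the void* converts implicitly in C */
    if (arr == NULL)
        return 1;
    arr[0] = 42;
    free(arr);
    return 0;
}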
@@krzysztofadamski2884 You got me! I have been using C for decades and didn't actually know that. 😱
I do embedded work in C though, malloc and pointers are evil on microcontrollers, so I do not really use them.
The biggest annoyance by the way is when you assign the arithmetic result of some uint8 operands to a uint16 variable, and it is not automatically cast to uint16 before the operation, so it will give a wrong result...
@@gabiold C in general is full of traps but I wouldn't call pointers "evil". In fact, the only part of C language I would consider evil are undefined and implementation-defined behaviours.
The rules of automatic type promotions are also hard, though.
@@krzysztofadamski2884 Yes, I don't mean pointers are evil in general, I am confident with them on proper systems. But they are evil on small (8-bit or so) microcontrollers. They can't point just anywhere; some architectures have 8-bit pointers which can't point very far. Some compilers generate code for you with bigger pointers, but when you check the disassembly you'll see that it requires a ton of workaround code, because the arch has no direct support for what you wrote. Another problem is that many microcontrollers have a Harvard architecture and very little RAM, so compilers may place string literals (for example) as true constants in the .text section, not in initialized data. Thus, if you try to index it with a pointer made for the RAM, it will fail. There are RAM ptrs, ROM ptrs, far, near... These are quite implementation-defined behaviours on MCUs, and you can't even be sure whether your code will work with another compiler. (hint: it won't... usually...)
I'd rather avoid pointer arithmetic altogether on 8-bit MCUs. Either I solve the problem another way, or I explicitly program the series of instructions the arch can do instead (for example sequentially reading FLASH content from an address, with the core instruction available for this purpose).
glibc malloc doesn't just allocate memory, it does tons of optimisations to reduce the number of calls to mmap, avoid useless reallocations, etc., that you don't want to deal with. So if you are working on a production build or just a large program, please use malloc; it will almost always be faster than raw mmap. The only cases I can think of for using mmap directly are memory-mapped files and reimplementing your own allocator
This is too advanced a concept for me, but it is so interesting 🤔. Thanks ❤️ for introducing these new concepts.
It’s been well over a decade since I’ve used mmap but I think it’s possible to use it for Inter process communication.
If you map the memory to a file, multiple processes can use the same file to share the memory.
That’s about all I recall about the usage at this point. I wouldn’t even be surprised that I’m misremembering some other means of shared memory. I’m pretty sure I used mmap for that.
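You're remembering it right as far as I know; roughly, the file-backed flavour looks like this (the path /tmp/shared_demo is just a placeholder, and error handling is minimal). A second process that maps the same file with MAP_SHARED sees the same bytes:

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    int fd = open("/tmp/shared_demo", O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return 1;
    if (ftruncate(fd, 4096) < 0)               /* give the file a size worth mapping */
        return 1;
    char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (shared == MAP_FAILED)
        return 1;
    strcpy(shared, "hello from process A");    /* visible to any other process mapping the file */
    munmap(shared, 4096);
    close(fd);
    return 0;
}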
Quality content as always. Thank you!
Thank you!
Thank you for the great content. Can I cast the newly allocated space to a void pointer or would that create an error?
While I don't understand x86 / x64 memory dynamics, coming from ARM and how Acorn memory management worked, I have some understanding of memory management in a virtual memory space.
When ur kid asked where memory came from you missed an excellent opportunity to say something like
"Someone told me once, but I don't remember"
nice! I love this kind of tutorial video
FYI: sbrk() seems to accept negative values to shrink the memory again. But this must be a nightmare to maintain when using multiple different types of allocators...
That's all handled by the OS, absolutely no problem.
@@44r0n-9 The OS can't possibly handle that!
Imagine allocator A increasing the system break by 1024 bytes. Then allocator B does the same. If allocator A now wants to release the 1024 bytes allocated by it, it can't do that without making sure allocator B has already released its section of memory. The best A can do is ignore the deallocation, which ultimately leaks memory. Hence: a nightmare to maintain. There is a reason sbrk() is a mostly legacy API.
@@roboticbrain2027 I would not go as far as calling sbrk a legacy API, as it is much faster than mmap and these are the only ways to allocate memory in Linux apart from the stack. Last time I checked the glibc source code, whenever malloc needed a new chunk to extend the heap it actually tried sbrk() first and only resorted to mmap() once it had reached the point where it couldn't. So it is still a very important system call and it is still used internally, although indeed I don't see much reason to use it manually unless you want to make your own version of malloc() for some reason.
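As a toy illustration of why that "own version of malloc" is more work than it looks, the simplest possible sbrk-based allocator is just a bump pointer that can never free anything (a sketch, assuming glibc where sbrk is exposed):

#include <unistd.h>
#include <stdint.h>
#include <stdio.h>

/* Toy bump allocator: grow the break, hand out the old break, never free. */
static void *bump_alloc(size_t size)
{
    size = (size + 15) & ~(size_t)15;          /* keep 16-byte alignment */
    void *p = sbrk((intptr_t)size);            /* old break = start of the new block */
    return (p == (void *)-1) ? NULL : p;
}

int main(void)
{
    int *nums = bump_alloc(100 * sizeof *nums);
    if (nums == NULL)
        return 1;
    nums[0] = 1;
    printf("%p holds %d\n", (void *)nums, nums[0]);
    return 0;
}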
.data stores global variables, not const vars. All global vars are initialized to 0 unless specified otherwise. Also, between the heap and the stack there is an address space (memory page) used for shared libraries. The stack grows to lower memory addresses (x86, x64), toward the shared-lib memory page, not toward the heap address space
If a global variable is set to zero, it will go to .bss
If you really need your own allocator, use glibc with it. If you really can't, use mmap and manage the memory yourself; sbrk and brk make things a lot harder.
But let's be real, you can probably get access to a glibc that is compatible with your system. Use that instead. Don't make things harder for yourself. This level of "low level" computing stuff is implemented better by professionals that have spent half of their lives on it. Don't make your day a nightmare, use malloc
I'm used to NES which basically just says "You have 2 KB of RAM from 0x0000 to 0x07FF, it's your job to figure out how you want to use it"
Hi LLL, I am just a beginner, but your code not checking the return values of malloc and printf, and using strcpy instead of strncpy, felt very weird!
Hi, can you make a video explaining UEFI and device trees?
Cool stuff. Now how do we allocate memory in aarch64 assembly? :)))
Teehee ;D
The same as you would from any other language - you call a syscall (brk or mmap) for that. Or, if you link to some library, you can call malloc function that will do that for you.
Manage memory by hand before you code rather than writing bloatware. As a novice, I created an algorithm that reduced the size of "the expert's" code by a factor of eight. He called my code inefficient, so I showed it had no difference in execution. He called my code confusing, so I added a page of comments in the source. He had my code removed because he was the expert, and I was the novice. Nobody cares.
Please suggest some books for me to get a better understanding of the OS, memory, and low-level stuff
could you make a new video on arena allocators?
Strange there's no mention of mmap(2) being able to map files and devices, not just anonymous regions of memory.
Great video thanks 👍
Thanks for sharing
BTW mmap returns -1 on error, not NULL.
Such a cool channel
Aren't you possibly dereferencing a NULL pointer on line 13 if malloc fails in your first example?
Hi LLL... Why is the first addr from sbrk 0x...06000 and the last 0x...27000? Shouldn't the last addr equal 0x...07000?
Nice catch! Most likely this happened because the printf call allocated some memory (0x20000 bytes) for its buffer using malloc internally, and malloc uses (s)brk itself to fulfil the allocation.
Run this program via strace and you will see more sbrk() syscalls than those explicitly called from the code.
@@krzysztofadamski2884 Wow! Thanks! I'll do it.
@@fusca14tube Yup!, What he said^ glibc allocated memory in the backend to make room for the first printf.
Also his explanation of what sbrk does was wrong.
From the man-pages: "On success, sbrk() returns the previous program break. (If the break was increased, then this value is a pointer to the start of the newly allocated memory)."
What we see is the increase due to printf - NOT the increase from his call to sbrk.
If you had
void* ptr1 = sbrk(0);
void* ptr2 = sbrk(1000);
then both would point to the same address.
@@ABaumstumpf Thanks!
Casual C programmer :
Memory comes from malloc
Casual C++/Java/C# programmer :
Memory comes from new
😂
very nice
Your sbrk example code seg faults when you try to assign to any element in the array.
If the memory allocated by sbrk is inside the ELF, does sbrk increase the size of the final executable?
I've heard some people say that Linux was bad because it was a monolithic kernel? What do they mean by that? Does it mean that your drivers were supposed to run as a separate kernel from your OS?
Yes, it means that a lot more stuff is part of the kernel (such as drivers). If one of these programs crashes, the entire kernel will crash, which is an issue.
There's also microkernels, which do *a lot* more in userspace (so not in the kernel), meaning that if something crashes, the kernel will most likely keep running. The downside of running things in userspace is that it usually requires more syscalls to the kernel to do specific tasks, which can be slow.
First things first: Linux isn't bad. :-)
It is a monolithic kernel, yes, because the entire kernel is one large executable in the sense that everything in the kernel runs in kernel mode, i.e. in the same context, including privileges. So, for example, if a driver hangs in an endless loop, the whole system hangs (except I believe there is a watchdog mechanism which prevents this). As you can see from my explanation, Linux is a monolithic OS, even though it has modules (*.ko in /usr/lib/modules). This is only a feature to load/unload drivers instead of building one large executable by linking them all in.
The microkernel fanboys said this is bad, because microkernels do better by only having a very small kernel actually running in kernel mode. Drivers (and everything else that does not need it) run in a less privileged mode. Beginning in the 90s the academic OS crowd was mainly in favor of microkernels, believing no new OS should follow the, in their view, antiquated monolithic kernel approach. As it turns out, though, microkernels are much harder to develop and to debug. Also, they introduce a communication overhead, which can be problematic in a kernel, since it affects the entire system's performance. Torvalds, being pragmatic, wanted to develop something that works; he wasn't interested in beauty as perceived by academia.
See the "Tanenbaum vs Torvalds" debate. It even has its own Wikipedia page.
What is stack memory? Is there something special about it? Could you malloc some RAM, put your stack pointer there, and never return?
If you're running on an operating system, the stack pointer is part of the C "runtime" - i.e. the mechanics of how it works are assumed to be taken care of for you. There's not any way to mess about with the stack pointer in the standard library without going beyond C and into assembly. So basically doing stuff like that puts you into undefined behavior territory as far as the C standard goes.
(Note the "running on an operating system" part. If you're in a freestanding environment like a bare metal microcontroller, all bets are off.)
Some operating systems give you some additional tools that let you add additional stacks and jump between them without dropping to assembly. During university I remember implementing a simple coroutine-based consumer-producer program that used two stacks on a FreeBSD or Linux box using (I believe) sigaltstack() to create the second stack. But I mostly copied the stack creating code and that was years ago, so I don't remember the exact technique.
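If anyone wants to play with that today without dropping to assembly, glibc's old (officially deprecated, but still shipped) ucontext API is probably the most direct route rather than sigaltstack; a rough sketch of running code on a second, malloc'd stack:

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;

static void coroutine(void)
{
    printf("running on the second stack\n");
    /* returning resumes main_ctx because of uc_link below */
}

int main(void)
{
    char *stack = malloc(64 * 1024);           /* backing memory for stack #2 */
    if (stack == NULL || getcontext(&co_ctx) == -1)
        return 1;
    co_ctx.uc_stack.ss_sp   = stack;
    co_ctx.uc_stack.ss_size = 64 * 1024;
    co_ctx.uc_link          = &main_ctx;       /* where to go when coroutine returns */
    makecontext(&co_ctx, coroutine, 0);
    swapcontext(&main_ctx, &co_ctx);           /* hop onto the new stack and back */
    printf("back on the original stack\n");
    free(stack);
    return 0;
}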
When two nand gates love each other very much...
Yeah, I think it would be easier to manage memory with a couple of wrappers around both mmap and malloc (I say malloc because I don't like collisions in handling). I'm imagining something like the following:
#include <stdlib.h>

typedef struct _PAGE PAGE;
struct _PAGE
{
    size_t size;
    PAGE *prev;
    PAGE *next;
    void* (*palloc)( void *addr, size_t size );   /* backend: realloc, an mmap wrapper, ... */
};

PAGE stdc_page = { 0, NULL, NULL, realloc };

void* palloc( void *addr, size_t size )
{
    PAGE *page = ((PAGE*)addr) - 1;               /* step back to the header stored just before the data */
    page = page->palloc( page, size + sizeof(PAGE) );
    return page ? (void*)(page + 1) : NULL;       /* hand back the data area just past the header */
}
Of course with a little more fault checking and declarations but the above is the simplest form I could think of to get across how I would map both types of functions into one
Which compiler do you use?
Does C++ new use malloc behind the scenes?
Yes. In any case, they have to use the same memory management code, at some point, as C++ can call C code.
I don’t think your estimates of the different performance of malloc vs mmap vs sbrk are correct.
Yes, sbrk is probably the fastest. After all, it's doing not much more than asking the operating system to move the border of mapped pages.
But glibc's malloc is way more complicated than most people think. It's not just managing some blocks of memory. It's doing tons of optimizations, it's filling gaps that came from freeing, and it's using the empty space as an internal linked list threaded only through the free blocks, to traverse the list faster when it's searching for a fitting free block. For big allocations it even uses mmap internally
malloc() actually does even more! At some point it becomes impossible to extend the last segment any further, as it hits some other already allocated virtual memory and brk() just returns -1. That is when malloc uses mmap to get more memory chunks, and each time it calls mmap it may get a random place within the virtual address space. So it has to manage multiple chunks of various sizes which may or may not be connected together. And it must try its best to squeeze a newly requested allocation into one of the existing chunks before trying to mmap a new chunk for the thing plus some spare. And don't forget that mmap-ed chunks can be resized unless they get way too big or hit some other already allocated memory, so it is beneficial to try that first too instead of just getting new chunks mmap-ed whenever needed. And it is best to get some spare memory for next time. The logic is not super complex, but to do malloc() efficiently really takes much more than you might expect at first glance.
Why does the stack grow backwards?
sometime a pointer and malloc love each others a lot... so they get married and have a lot of memory buffers together...
Cool!
Do the memory addresses in the memory map and the program's output correspond to x86, or are they not specific to any CPU?
It depends on the CPU and the hardware. The example he gave is most likely based on x86 but I wouldn't take the address ranges too literally. This was more of a general overview
The computer fairies
How can I tell where a line of my code is stored? Can I execute it from inside the program itself or pass it as a string to some function?
If you have a function in that code you can take its function pointer and pass that around to some other functions. However you can't get the original source code at runtime once compiled. The C compiler turns your C code into a binary file of native machine code. What you can do is cast the function pointer into a char pointer and read some of that machine code.
But keep in mind that that string isn't zero terminated and you can't know the length of your function's machine code.
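Something like this shows the idea (not strictly portable C, since casting a function pointer to a data pointer is not sanctioned by the standard, but it typically works on Linux where the text segment is readable):

#include <stdio.h>

static int add(int a, int b) { return a + b; }

int main(void)
{
    const unsigned char *code = (const unsigned char *)add;   /* peek at add's machine code */
    for (int i = 0; i < 16; i++)                              /* 16 bytes is an arbitrary guess */
        printf("%02x ", code[i]);
    printf("\n");
    printf("add(2, 3) = %d\n", add(2, 3));
    return 0;
}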
Technically yes, but it's not a good idea, since most modern machines use virtual memory and address space layout randomization, which means there's no guarantee that, say, &main == 0xDEADBEEF every time. This is something you can only really do on the Game Boy or most game consoles made before the year 2000 or so
I think the closest you can get is printf("%p",&main); or something like that. As for how many bytes your function takes up, you pretty much need assembly for that. Pity there's no sizeof() for functions. Would be useful on embedded hardware
mmap only allocates in multiples of the page size
By the way, whichever method you use, you aren't REALLY allocating any memory in RAM. The real allocation in the real physical memory happens when you try to write to a memory page. When you try to write the first time to a newly created page, the MMU in the CPU will generate a page fault. After the page fault is processed you will have a real page (4kb) sitting inside of RAM.
Out of curiosity, why did you write "(NULL != myHeap)"? It's backwards.
Getting in the habit of typing comparisons to constants backwards is good because, in the event that you accidentally type = instead of ==, for example (0 = x) will throw a compiler error and force you to check the problem, where as (x = 0) is valid syntax that returns value, and you may miss it.
@@LowLevelTV interesting, and a good point.
ok but how does it work on windows?
TIL about sbrk/brk.
Thank you.
It's kind of missing the part I find really relevant, while repeating much that I already know.
And actually the first part of the title is at the very least misleading regarding that.
Yeah, of course there's some syscall behind additional memory, if the heap is running out.
And it's neat to see how these syscalls behave differently.
But what I'm wondering is related to WHY they're behaving like that. How do these system calls do it?
I don't really know how this works, what the OS does. All I know there's some kind of memory address translation going on.
I would've loved to see that explained, and based on that you could explain why the different syscalls behave the way they do.
I can't give you a complete and detailed answer, but before the Pentium processors memory was accessed using absolute addresses, segment:offset pairs. Since then there is the MMU (memory management unit), and segment registers inside the CPU are actually descriptor table indexes, and this descriptor table holds information on where that memory block in the contiguous address space is. There is a related thing, the TLB (translation lookaside buffer). It is a cache helping the virtual->physical memory address translation.
Like I said, I have gaping holes in the detailed knowledge. I just started writing an operating system more than 20 years ago just for fun, and was very interested in the internals back then. Of course I never finished it, but at least it was bootable and it printed a welcome prompt. 😃
FIRST
I knew it! Elfs had to be involved in this shit! Hah! And you scientists deny magic! (ROFL)
It's not very useful to do a munmap in the example.
Evil syscalls 😂
I really wish people would stop pronouncing abbreviations in a manner inconsistent with the word of which they're an abbreviation. For instance, "strcpy()" is not "stir copy" but rather "string copy". It's like when people pronounce "char" as though you're going to flame-broil some meat. It should be "care" as it is an abbreviation for character.
I can really visualize you sitting in a room with a dozen people talking about Care pointers and no one knowing wtf you're talking about lmao.
I get the idea, but it's just not intuitive.
@@44r0n-9 Well, people could just be unlazy and say character in full, as that's what it's supposed to represent. Supposed because it doesn't actually, but that was the original intent.