This video is a gem!!!
50:14
-fpic = enforce memory limits on the size of the GOT.
-fPIC = no size limit for the GOT
Around 8:50, the mov eax, 0 before the call is not about main returning zero if you don’t specify anything; it is part of the C ABI. The functions here are written with nothing in the parentheses, as is normal in C++, but in C empty parentheses do not mean no arguments. They declare a K&R-style function with an unknown number of arguments of unknown types. The actual definition would need to specify args in order to use them, but callers could just write extern void whatever() to declare them, since K&R function calls are not type checked. What the 0 specifically represents is the number of vector registers (usually carrying floating point values) used to pass arguments to the function. Variadic functions need to know how many registers to save, and that value gives them an upper bound. If the empty parens were replaced with (void), the mov eax, 0 would go away even at -O0; without that change it will persist even at higher levels of optimization (assuming the two functions are actually in two translation units, so the function doesn’t get inlined).
Thanks for correcting that! I had forgotten about this difference between C and C++. I'm mostly writing C++, I guess it shows.:)
Hah weird, I thought the compiler just makes sure if the function doesn't actually return something (aka write to eax), we still return a 0. This makes more sense.
I learned so much from this talk.
Thank you!
The part about function calling near the end was the most interesting
I can also recommend James McNellis’ “Everything you wanted to know about DLLs” on this topic.
Excellent talk, the clearest I've seen on this topic!
Great to hear! Thank you for your comment.
Thank you! I'm very happy to hear that.
Nice talk! However, lazy binding is disabled by default in many modern Linux distributions as an attack mitigation technique: the GOT entries that the PLT jumps through are resolved at load time and then made read-only for the rest of program execution. This is known as full RELRO.
Thanks, I didn't know that.
18:05 Actually, the compiler does know which compute() function you are calling in this example, since compute() is in the same file. In fact, even at -O1 it will remove the call and inline compute(). The compiler wouldn't know that if compute() were not available in the same translation unit.
Awesome talk! I'd be interested to know what Link Time Code Generation and other optimizations like COMDAT folding and /OPT:REF do.
He didn't explain why it says "call 21" (at 11:42), nor did he explain why a relocation is necessary at this point. He might explain it later, but it's super confusing that he didn't explain WHY at the beginning, especially since 'compute()' is in the same translation unit as 'main'.
@6:21 Middle line on the right has "48 89 e5" (mov rbp, rsp), which is part of the start of your compute function; bottom left has "b8 01 00 00 00", which moves 1 into eax, followed by 5d (pop rbp) and c3 (ret).
Yeah, that's what I'm trying to point out at @7:25 too.
Boy, this explains a lot of nitty-gritty details. We had an application that required several separate processes to have access to a large block of common memory (about 64 KB). We did this by defining a large int array in a shared object and initializing it to non-zero. This was 20-some years ago; it might not still work, I don't know. But by initializing it, the array was put in the shared .data segment, so each process had access to the same large array and one process could 'see' what another process wrote. (Yes, there were other concerns about collisions and such, but the gist of it was that the DLL and its .data segment were shared by all.)
Well, I suppose you could put a lock on that shared memory to ensure concurrent integrity.
53:20 (on lazy-loading): I understand the concept as partially preparing data when declared, and only loading the full extent when used (or not even then), or as disguising the access to some data as actually fetching it first: it is declared and you can reference it, but it's not actually there. Instead, the instructions to get it there are.
What you're describing sounds to me like caching - i.e. storing the output of some functionality in an easily accessible way so that you do not have to invoke said functionality again. But I might be off here (also I come from a very much not systems/compiler background so I completely understand if "lazy-loading" is the term used for it in your field).
Side question: Would this mean that you could have runtime dynamic linking if you implemented cache invalidation for this step of the process? (i.e. be able to change bits of the machine code as stored on disk, which when the invalidation occurs, would take effect?)
Yes, lazy loading is a good way to describe this!
I guess you could do some sort of runtime dynamic linking if you had some way of resetting the GOT to point back into the PLT stub and then convincing the dynamic linker/loader to load something different next time, provided that you have prepared GOT/PLT entries for everything during compilation. It depends on what you mean by runtime dynamic linking, of course; I'm just replying very generally here.
Note, btw, that we never change any *machine code* here, we only change data. It's just pointers in the GOT that are updated, from pointing at the stubs in the PLT to pointing at the real functions.
Thank you for the interesting talk!
Question about the last chapter: will the loader copy ("load") the function from the shared object into the .got section? I am confused how the state (if the function has any) is differentiated between the processes using the same shared object.
What kind of state are you thinking of? If you're thinking of function arguments and local variables, these go on the stack or in registers, which are unique to each process and in fact each invocation in that process. If you're thinking of local static variables, these go in data sections like `.data`, which each process gets a unique copy of. Only the read-only segments are shared between processes.