1 Handmade Linux x86 executables: ELF header

  • Published on 17 Dec 2024

Comments • 62

  • @xKaihatsu
    @xKaihatsu 3 years ago +45

    For anyone wanting to create an ELF-64 executable that just exits, here is how I did it.
    ; ELF Header
    7F 45 4C 46 ; magic number
    02 ; ELF-64
    01 ; little endian
    01 ; ELF version
    00 ; System V ABI
    00 ; ABI version
    00 00 00 00 00 00 00 ; unused bytes
    02 00 ; executable object file
    3E 00 ; x86-64 (AMD 64)
    01 00 00 00 ; ELF version
    78 00 40 00 00 00 00 00 ; entry point
    40 00 00 00 00 00 00 00 ; program header offset
    00 00 00 00 00 00 00 00 ; section header table offset
    00 00 00 00 ; flags
    40 00 ; ELF header size
    38 00 ; program header entry size
    01 00 ; program header entry count
    00 00 ; section header table entry size
    00 00 ; section header table entry count
    00 00 ; string table index
    ; Program Header (.text = 0x400000 compared to 0x8048000 on x86)
    01 00 00 00 ; loadable program
    05 00 00 00 ; permissions (read & execute flags)
    78 00 00 00 00 00 00 00 ; program offset (ELF header size + this program header size)
    78 00 40 00 00 00 00 00 ; program virtual address (0x400000 + offset)
    00 00 00 00 00 00 00 00 ; physical address (irrelevant for x86-64)
    10 00 00 00 00 00 00 00 ; file size (just count the bytes for your machine instructions)
    10 00 00 00 00 00 00 00 ; memory size (if this is greater than file size, then it zeros out the extra memory)
    00 10 00 00 00 00 00 00 ; alignment
    ; Program
    ; Entry = 0x400078
    48 C7 C0 3C 00 00 00 ; mov rax, 60
    48 C7 C7 2A 00 00 00 ; mov rdi, 42
    0F 05 ; syscall (the x86-64 instruction that replaces int 0x80 on 32-bit x86)
    Use this assembler whenever you get stuck trying to decode instruction opcodes: defuse.ca/online-x86-assembler.htm
    I'll admit that using the 64-bit registers was unnecessary, because writing a 32-bit register zero-extends into the full 64-bit register. I want to get used to encoding them, though.
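For anyone who would rather not type the bytes into a hex editor, here is a minimal Python sketch that packs the same fields with `struct` and writes the file. The function name `build_elf` and the output file name `tiny_exit` are made up for illustration; the field values are the ones from the listing above.

```python
import os
import struct

# Machine code from the listing above: exit(42) via the x86-64 syscall ABI.
CODE = bytes.fromhex(
    "48C7C03C000000"  # mov rax, 60 (exit)
    "48C7C72A000000"  # mov rdi, 42 (exit status)
    "0F05"            # syscall
)

EHDR_SIZE = 0x40                      # ELF-64 header size
PHDR_SIZE = 0x38                      # ELF-64 program header entry size
BASE = 0x400000                       # load address used in the listing
CODE_OFFSET = EHDR_SIZE + PHDR_SIZE   # 0x78


def build_elf(code: bytes = CODE) -> bytes:
    """Return the ELF header, one program header, and the code as bytes."""
    # magic, ELF-64, little endian, version 1, System V ABI, ABI version 0, padding
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0, 0]) + bytes(7)
    ehdr = e_ident + struct.pack(
        "<HHIQQQIHHHHHH",
        2,                   # e_type: executable
        0x3E,                # e_machine: x86-64
        1,                   # e_version
        BASE + CODE_OFFSET,  # e_entry
        EHDR_SIZE,           # e_phoff: program header follows the ELF header
        0,                   # e_shoff: no section header table
        0,                   # e_flags
        EHDR_SIZE,           # e_ehsize
        PHDR_SIZE,           # e_phentsize
        1,                   # e_phnum
        0, 0, 0,             # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,                   # p_type: loadable
        5,                   # p_flags: read + execute
        CODE_OFFSET,         # p_offset
        BASE + CODE_OFFSET,  # p_vaddr
        0,                   # p_paddr
        len(code),           # p_filesz
        len(code),           # p_memsz
        0x1000,              # p_align
    )
    return ehdr + phdr + code


if __name__ == "__main__":
    with open("tiny_exit", "wb") as f:   # hypothetical output name
        f.write(build_elf())
    os.chmod("tiny_exit", 0o755)
```

On an x86-64 Linux machine, running `./tiny_exit; echo $?` afterwards should print the exit status chosen in the listing.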

    • @davidsmith7791
      @davidsmith7791 3 years ago +6

      Very nice! It runs. The file command reports a corrupted section header size in the binary. You can deal with this by changing the section header entry size (e_shentsize) from 00 00 to 40 00.
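A small sketch of that patch in Python, assuming a little-endian ELF-64 file like the one above. The helper name `patch_shentsize` is invented for illustration; the offset 0x3A is where e_shentsize sits in a 64-byte ELF-64 header.

```python
import struct

E_SHENTSIZE_OFFSET = 0x3A  # offset of e_shentsize in an ELF-64 header
SHDR_SIZE = 0x40           # size of one ELF-64 section header entry


def patch_shentsize(elf: bytes) -> bytes:
    """Return a copy of elf with e_shentsize set to 0x40 (bytes 40 00)."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2:
        raise ValueError("not an ELF-64 file")
    patched = bytearray(elf)
    # Little-endian 16-bit store, matching the byte order of the listing.
    struct.pack_into("<H", patched, E_SHENTSIZE_OFFSET, SHDR_SIZE)
    return bytes(patched)
```

Only two bytes change, so the entry point and the program header stay exactly as they were.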

    • @Cowboy8625
      @Cowboy8625 2 years ago

      That disassembler is priceless for learning this! That was an extreme help! Thanks for sharing!

  • @i4xa
    @i4xa 3 years ago +68

    This explanation is amazing. I watched the whole video and only afterwards I realized it has 66 views atm. This deserves much more attention.

    • @mattiviljanen8109
      @mattiviljanen8109 3 years ago +1

      I'm watching this now with 5555 views.

    • @godnyx117
      @godnyx117 2 years ago

      15k now, but still too few for this gem!

  • @FoxLivestreams
    @FoxLivestreams 3 years ago +28

    Ok man, you are hella underrated.
    1. Descriptive
    2. Great word flow
    3. Great video flow
    4. Interesting content
    Keep it up!

  • @cj_ayho
    @cj_ayho 3 years ago +22

    maybe gcc binary will be smaller if you strip it?

    • @davidsmith7791
      @davidsmith7791 3 years ago

      Brian Raiter goes down that path in this nice lecture: www.muppetlabs.com/~breadbox/software/tiny/techtalk.html

  • @lucianoosinaga2980
    @lucianoosinaga2980 3 years ago +13

    this reminds me of that old meme picture of a guy programming with a keyboard with just two big 1 and 0 keys lol Thanks for this video, please do more of these!

  • @1vader
    @1vader 3 years ago +25

    Awesome video! Though you should be able to shave off a few more bytes from the instructions by using xor etc. instead of loading numerical constants. But I guess that might be a little bit harder to understand and those 5 or so bytes probably don't matter.
    For those interested, I think this is the shortest x86 assembly to set eax=1 and ebx=0:
    0: 31 c0 xor eax, eax
    2: 40 inc eax
    3: 31 db xor ebx, ebx
    "xor reg, reg" is a classic trick to zero a register in very few bytes (since it doesn't require a 4-byte immediate constant zero), and it also used to be one of the quickest ways. I'd be surprised if it's actually faster than "mov reg, 0" on modern hardware, but compilers still use it when you do something like "int a = 0" in C code.
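To make the byte count concrete, here is a small Python sketch that only adds up encoding lengths. The xor/inc bytes are the ones quoted in the comment; the two mov encodings (`b8`/`bb` followed by a 4-byte immediate) are the standard 32-bit forms, supplied here as an assumption about what the original program used.

```python
# 32-bit x86 encodings as raw bytes.
MOV_VERSION = {
    "mov eax, 1": bytes.fromhex("b801000000"),  # opcode + 4-byte immediate
    "mov ebx, 0": bytes.fromhex("bb00000000"),  # opcode + 4-byte immediate
}
XOR_VERSION = {
    "xor eax, eax": bytes.fromhex("31c0"),
    "inc eax": bytes.fromhex("40"),             # one-byte form, 32-bit mode only
    "xor ebx, ebx": bytes.fromhex("31db"),
}


def total_size(encodings: dict) -> int:
    """Sum the encoded length, in bytes, of every instruction in the table."""
    return sum(len(code) for code in encodings.values())


if __name__ == "__main__":
    print("mov version:", total_size(MOV_VERSION), "bytes")
    print("xor version:", total_size(XOR_VERSION), "bytes")
    print("saved:", total_size(MOV_VERSION) - total_size(XOR_VERSION), "bytes")
```

Note that the one-byte `40` encoding is a REX prefix in 64-bit mode, so this count applies only to 32-bit code.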

    • @davidsmith7791
      @davidsmith7791 3 years ago +8

      Yes, good point. As you say, I am keeping our instruction set small for the benefit of beginners.

  • @ushiocheng
    @ushiocheng 3 years ago +16

    When someone goes to write pure mach in 2021, I would call you the master 👍

    • @daveshouldaine2520
      @daveshouldaine2520 3 years ago +2

      Sorry for the probably stupid question, but what does "mach" mean in this case?

    • @ushiocheng
      @ushiocheng 3 years ago +3

      @daveshouldaine2520 Abbreviation of machine, referring to machine code, which is what he is writing

    • @daveshouldaine2520
      @daveshouldaine2520 3 years ago +2

      @ushiocheng thank you very much!

  • @AnimeLover-su7jh
    @AnimeLover-su7jh 2 years ago +2

    All I need to know is that you read the technical specifications and provided the links to them. That is enough to give a like and a sub.

  • @pauloconci4196
    @pauloconci4196 3 years ago +42

    When assembly is too bloated lol

    • @gSys1337
      @gSys1337 5 months ago +1

      Assembly isn't bloated. Compilers are bloated.

  • @Jennn
    @Jennn 2 years ago +1

    Holy Crap. You made this so easy to understand and straightforward. My YouTube algorithm changed and is showing me 1 new channel per 8 feeds. I try to visit the new ones as much as possible, and I got lucky again today. Yay World~!

  • @deniismailov1782
    @deniismailov1782 3 years ago +7

    Great work! Keep it up man!

  • @AJMansfield1
    @AJMansfield1 3 years ago +8

    This is really cool! Also, there's at least 16, possibly 20 more bytes you can shave off that I've spotted:
    First, you can save 6 bytes by using more compact instruction opcodes in the program. Instead of the five-byte "mov eax, 1;" you can instead do "xor eax, eax; inc eax;" (with those instructions encoding as `31 c0` and `ff c0` respectively) and then likewise instead of the five-byte "mov ebx, 0;" you can use just "xor ebx, ebx;" (encoded as `31 db`). Note that xoring a register with itself is actually the preferred way to zero a register in x86 and those opcodes normally execute in zero cycles because internally they actually just trigger a register rename.
    Next, you can trim off another 8 bytes by packing two of those opcodes into the e_ident[E_PAD] region. Place the `31 c0 ff c0` starting at offset 0A, then add an additional two-byte relative jump encoded as `eb 49` to jump to the remaining `31 db cd 80` back in the "normal" program area at offset 59. (Adjust your entry point field and all that to match of course.)
    Two more bytes can then come off by aliasing e_shstrndx over the first two bytes of p_type: just set those offsets so that the two regions overlap. AFAIK the section name string table index doesn't actually do anything at runtime and can be whatever value you want.
    And I haven't spotted precisely where yet, but it's probably possible to save four more bytes if you can find another existing unused bit of header area to pack the last two opcodes `31 db cd 80` into. That way, you could have an ELF file with actually _no_ dedicated program section at all, just by reusing existing unused header regions. Maybe p_paddr could be used this way?

    • @protonjinx
      @protonjinx 3 years ago +3

      xor ebx, ebx
      lea eax, [ebx+1]

    • @AJMansfield1
      @AJMansfield1 3 years ago +4

      @protonjinx Oh, nice, that shaves off one more byte; now the entire program can fit into the padding region without needing to splice it with a jump!
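A rough Python sketch of that fit, counting bytes only. It assumes the padding starts at offset 9 of e_ident (EI_PAD in the current ELF specification), that the lea is encoded as `8d 43 01`, and that the program ends with the `cd 80` from the earlier comment. The helper name `pack_into_padding` is invented, and whether a given kernel accepts such a header is not tested here.

```python
EI_PAD = 9       # first padding byte of e_ident (assumption: current ELF spec)
EI_NIDENT = 16   # total size of e_ident

PROGRAM = bytes.fromhex(
    "31db"    # xor ebx, ebx
    "8d4301"  # lea eax, [ebx+1]
    "cd80"    # int 0x80
)


def pack_into_padding(e_ident: bytes, program: bytes) -> bytes:
    """Overwrite the e_ident padding with program, zero-filling any remainder."""
    if len(e_ident) != EI_NIDENT:
        raise ValueError("e_ident must be 16 bytes")
    room = EI_NIDENT - EI_PAD
    if len(program) > room:
        raise ValueError(f"program is {len(program)} bytes, padding holds {room}")
    return e_ident[:EI_PAD] + program + bytes(room - len(program))


if __name__ == "__main__":
    ident = b"\x7fELF" + bytes([1, 1, 1, 0, 0]) + bytes(7)  # ELF-32, little endian
    print(pack_into_padding(ident, PROGRAM).hex(" "))
```

With this layout the entry point field would have to point at the load address plus 9.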

  • @rathnec
    @rathnec 3 years ago +1

    I liked your approach of knowing it to the core!!

  • @2005kpboy
    @2005kpboy 3 years ago +1

    That's a brave, creative and novel attempt...

  • @jolex_nerd8132
    @jolex_nerd8132 7 months ago

    if you want to save some bytes,
    instead of:
    mov eax, 00000001h
    mov ebx, 00000001h
    int 80h
    you could use:
    mov eax, 00000001h
    mov bl, 01h
    int 80h
    which saves a few bytes of immediate operand, since exit codes are taken modulo 256 anyway.

  • @triularity
    @triularity 3 years ago +1

    I remember back, a few decades ago, I used a graphical text editor (which could mostly edit existing binary content without corrupting it, aside from adding a newline to the end of file) to merge several static GIF files together and create an animated GIF. I think I also used awk to generate custom binary bytes to copy/paste into the text editor. Luckily, the GIF format didn't seem to care about that erroneous newline tacked on the end of file.

  • @coolandsmartrr
    @coolandsmartrr 3 years ago +2

    Looking forward to the next video!

  • @t74devkw
    @t74devkw 3 years ago +2

    Damn you're underrated

  • @leonhrad
    @leonhrad 4 years ago +2

    really cool video :)

  • 3 years ago +6

    I only make executables by hand, just like my grandpa. /s

  • @isaackay5887
    @isaackay5887 3 years ago +3

    I feel like I just watched a StackOverflow explanation

  • @alik250
    @alik250 3 years ago +2

    This was so cool

  • @Borodinskyy
    @Borodinskyy 10 months ago

    i have been wanting to do something for a while, first time i have found what i was looking for since most times i search for it i just get assembly stuff

  • @Dude29
    @Dude29 3 years ago

    Very well done!

  • @happygimp0
    @happygimp0 3 years ago

    You can edit binary files with bvi, no need to use xxd

  • @jacquesquipere
    @jacquesquipere 2 years ago

    Ok but gcc hello.c fails immediately with studio.h: No such file or directory.

  • @TheJackal917
    @TheJackal917 3 years ago +3

    I never understood a word, but I think your vids are helpful, especially for someone like me, who is considering moving from Windows. Thanks! I wish you many subs.

    • @davidhusicka8440
      @davidhusicka8440 3 years ago

      This explains how "exe" files work internally on Linux. How is this useful to someone who wants to switch?

    • @TheJackal917
      @TheJackal917 3 years ago

      @davidhusicka8440 oh boi. Now how can I explain obvious things?

    • @totally_not_a_bot
      @totally_not_a_bot 3 years ago +1

      I mean, this video shows that vi is a thing? And some basic shell scripting? Other than that, that you can kinda do whatever you feel like on Linux without much restriction provided you have the knowledge. So yeah, Linux is nifty. If you're particularly attached to any Windows-exclusive software I'd give it a pass, but otherwise, load up a virtual machine and give it a spin. You might like it.

    • @TheJackal917
      @TheJackal917 3 years ago

      @totally_not_a_bot games, man. Games. But even more so, it's privacy and security.

  • @ryanhaart
    @ryanhaart 3 years ago +2

    How to link in libraries?

  • @623-x7b
    @623-x7b 3 years ago +1

    It would be amazing if you guys covered ultimate doom's .wad file format - I want to make a random level generator for Doom but lack the skills and the time.

  • @Jennn
    @Jennn 2 years ago

    Thank you boys

  • @rathnec
    @rathnec 3 years ago +1

    wow!!

  • @neilmeich
    @neilmeich 10 months ago

    nice

  • @mikolajkozakiewicz1070
    @mikolajkozakiewicz1070 3 years ago

    🥰

  • @der.Schtefan
    @der.Schtefan 3 years ago +4

    Don't you think that if somebody is interested in this video, he would know what a hex dump is?

    • @davidsmith7791
      @davidsmith7791 3 years ago +2

      I meant for the discussion beginning around 0:30 to be enough definition of hex dump for our purpose.

    • @iwikal
      @iwikal 3 years ago +1

      @davidsmith7791 I think OP meant the opposite; anyone who would potentially be interested in this video probably already knows what a hex dump is. I disagree with the sentiment, though. I think it's great that you gave this brief explanation, on the off chance that someone doesn't know.

    • @davidsmith7791
      @davidsmith7791 3 years ago +4

      @iwikal Oh, you are right. Thanks. I hope this video series is accessible even to those who know only a little about programming.

    • @aylen7062
      @aylen7062 3 years ago +1

      I'm interested in this video and a beginner who didn't know what it was. xD

    • @aylen7062
      @aylen7062 3 years ago +2

      @davidsmith7791 It was, for me. Thank you!

  • @albertvanderhorst4160
    @albertvanderhorst4160 1 year ago +1

    In ciforth (lina/wina/xina) I use a slightly different approach. The Forth is created by an assembler (fasm/gas). I compile a program (lina -c hello.frt) to a runnable binary that executes a word such as HELLO. That is an application of SAVE-SYSTEM, and I merely have to patch the header of the original lina. So I avoid generating an ELF header myself, leaving it to tools that know how to do it. I don't analyze all the fields, merely the fields I need. Fasm is ideal; unlike the gcc tools, it doesn't generate a plethora of sections that may help dbg. The -c option takes merely 2 screens in the library, including SAVE-SYSTEM.

    • @davidsmith7791
      @davidsmith7791 1 year ago

      Thank you for the FASM recommendation. That's new to me.
      In order to patch an existing ELF header, you have to know which fields to change. It is easy to imagine that I fail to change some field properly; and then something goes wrong when my data space grows over 1 million bytes. Or perhaps I never encounter such a problem, but every time a mysterious bug arises, I wonder whether I have written a bad ELF header. It is freeing to see a Linux executable defined in terms of the sequence of bytes of the file rather than in terms of tools people often use. I never get a good answer when I ask the tools, "what exactly are you doing?"

  • @l2ubio
    @l2ubio 3 years ago +1

    talking about compression...you compressed a lot of information in 11 minutes of video

  • @elementiro
    @elementiro 3 years ago

    𝕤 𝕖 𝕧 𝕖 𝕟