How is a binary executable organized? Let's explore it (2014)

As I've said in other threads https://news.ycombinator.com/item?id=38847750#38862450, I highly recommend writing an ELF by hand at least once. It's a great exercise to understand the basic parts of an executable. It's also helpful if you want to go the opposite direction of this article - bottom-up instead of top-down.

Lots of other great discussion in various threads on that other HN post.
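
For anyone who wants to try the by-hand exercise without a hex editor, here is a rough sketch in Python that assembles a tiny static executable byte by byte. It assumes x86-64 Linux and a conventional load address of 0x400000. The name build_tiny_elf and the exit status 42 are made up for illustration, and none of this is taken from the projects linked in this thread.

```python
import struct

LOAD_ADDR = 0x400000           # assumed load address (conventional for non-PIE x86-64)
EHDR_SIZE, PHDR_SIZE = 64, 56  # sizes of the ELF64 file header and one program header


def build_tiny_elf(exit_status: int = 42) -> bytes:
    """Return the bytes of a minimal ELF64 executable that only calls exit()."""
    # Machine code: mov eax, 60 (SYS_exit); mov edi, <status>; syscall
    code = (b"\xb8\x3c\x00\x00\x00"
            + b"\xbf" + struct.pack("<I", exit_status)
            + b"\x0f\x05")
    entry = LOAD_ADDR + EHDR_SIZE + PHDR_SIZE  # code sits right after the headers
    total = EHDR_SIZE + PHDR_SIZE + len(code)

    # Magic, then 64-bit class, little-endian data, version 1, System V ABI, padding
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident,
        2,          # e_type: ET_EXEC
        62,         # e_machine: EM_X86_64
        1,          # e_version
        entry,      # e_entry: address of the first instruction
        EHDR_SIZE,  # e_phoff: program headers follow the file header
        0,          # e_shoff: no section headers at all
        0,          # e_flags
        EHDR_SIZE,  # e_ehsize
        PHDR_SIZE,  # e_phentsize
        1,          # e_phnum
        0, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,          # p_type: PT_LOAD
        5,          # p_flags: PF_R | PF_X
        0,          # p_offset: map from the start of the file
        LOAD_ADDR,  # p_vaddr
        LOAD_ADDR,  # p_paddr
        total,      # p_filesz
        total,      # p_memsz
        0x1000,     # p_align
    )
    return ehdr + phdr + code
```

Writing the result to a file, marking it executable and running it on x86-64 Linux should exit with status 42. Running readelf -h on the file is a handy cross-check of each field.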

Writing an ELF file by hand is something I did recently: https://github.com/avik-das/garlic/blob/master/recursive/elf...

To explain the format to myself and others, I also created an interactive visualization for the bytes in the file. It helps me to click on a byte, see an explanation for it and see related bytes elsewhere in the file highlighted. https://scratchpad.avikdas.com/elf-explanation/elf-explanati...

That's an awesome interactive page! Did you write it by hand, or did you use some sort of generator/tool?

I agree it’s very nice. I’d also like to know how it was done.

Also if you click two bytes that are in the same caption group one after another then it bugs out.

Thanks for the feedback. I replied in a sibling comment about how I made it.

For the bug, feel free to email me at avik at avikdas dot com if you'd like. The behavior I verified just now (for me) is that if you click one byte to highlight it, then clicking any other byte in the same group will remove the highlighting.

I wrote it all by hand :)

Lately, I've been using Svelte for interactive visualizations (see my post on using a tool called Astro with Svelte: https://avikdas.com/2023/12/30/interactive-demos-using-astro... ). But this one is all hand-written JS!

You and your web site are huge inspirations for me.

Wow. Cool! I’m a CS teacher and definitely going to use this. Thanks for your work! (Anyone aware of a Windows or Mac equivalent?)

Wow! Very nice work. This is a super educational resource. Again, nice work.

I’ve had such fun making interactive educational visualisations like this. My life’s work is going into an interactive simulation of the USB protocol. Unfortunately I’m yet to bang it out over a weekend.

Similarly, I'd recommend writing a simple ELF loader. There's a fair bit of implementation complexity in dynamic linking, but if you only support static ELFs then it's straight-forward.
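
As a sketch of the first step of such a static loader: walk the program header table and collect the PT_LOAD segments that would need to be mapped. This assumes a 64-bit little-endian ELF. Segment and loadable_segments are illustrative names, not part of any existing loader.

```python
import struct
from typing import List, NamedTuple

PT_LOAD = 1  # program header type for segments that must be mapped into memory


class Segment(NamedTuple):
    file_offset: int
    vaddr: int
    filesz: int
    memsz: int
    flags: int  # bit 0 = execute, bit 1 = write, bit 2 = read


def loadable_segments(image: bytes) -> List[Segment]:
    """List the PT_LOAD segments of a 64-bit little-endian ELF image."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch only handles ELFCLASS64, little-endian")
    # e_phoff lives at offset 32; e_phentsize and e_phnum at offsets 54 and 56
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)

    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append(Segment(p_offset, p_vaddr, p_filesz, p_memsz, p_flags))
    return segments
```

A real loader would then map each segment at its vaddr with matching permissions, zero the memsz - filesz tail, set up a stack and jump to e_entry. None of that is shown here.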

Yes, likewise I wrote a reader that simply tried to parse every bit of a complex ELF binary to report its structure and quickly found myself in poorly documented territory. It’s an education if you want it.

I assume there's a better modern source (assuming for some reason you don't want to reference libbfd &c, but really, you at least want to cross-check with it) but BITD there was an AT&T System V book - I think it was https://books.google.com/books?id=mrImAAAAMAAJ but only at about 70% confidence, it's part of a series and might have been one of the others - that was "arcane but true" for ELF on existing platforms (at the time, which was the mid 1990s, which is why I hope there's a better starting point now...)

I had written a static ELF loader for reasons, but when I was no longer able to compile a static version of the binary I wanted to load, I found it wasn't too hard to load the system's dynamic loader instead. That's kind of the best of both worlds: I can run a dynamically linked binary, and I didn't have to do the linking and relocations.

I've seriously considered writing an ELF loader that uses a special symbol (like _resolve) where dynamic library resolving is done imperatively. The flexibility from libdl always feels underwhelming.

Any good resources on the matter? I'm gonna need to write a fully featured ELF loader for my language soon. I need to prepare.

Modifying existing ELFs can also be extremely educational and fun. It's a bit frustrating at first because it's more or less impossible to debug this stuff when it doesn't work but when things finally start working it's awesome. Turns out it's possible to patch ELFs in all sorts of interesting ways. With the auxiliary vector you can even have introspection at runtime: Linux gives us the address of the program header table and from there you can get to anywhere. Just gotta extend a LOAD segment to cover the whole binary.

For example I wrote tools to embed lisp modules and code right into my lisp's interpreter executable. The embedded segment is loaded from the ELF automatically, the interpreter just finds it and runs the code. I'm so proud of this little feature I wrote an article about it.

https://www.matheusmoreira.com/articles/self-contained-lone-...

Would be cool if mainstream languages adopted this method.

These tricks can be easily done simply and portably, without caring about the underlying executable format.

One trick is that you can reserve some global array in the executable, which is prefixed by a byte sequence that doesn't occur anywhere. A small utility can find that byte sequence and write custom data after it to create a customized executable.
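
A minimal sketch of what that small utility could look like, assuming the program was compiled with a global array consisting of the marker followed by a fixed number of spare bytes. The MARKER value, the capacity argument and the name patch_slot are all hypothetical.

```python
# Hypothetical marker; any byte sequence that occurs exactly once in the binary works.
MARKER = b"\x00EMBED-SLOT-7f3a9c\x00"


def patch_slot(image: bytes, payload: bytes, capacity: int) -> bytes:
    """Return a copy of image with payload written into the slot after MARKER."""
    start = image.find(MARKER)
    if start < 0:
        raise ValueError("marker not found")
    if image.find(MARKER, start + 1) >= 0:
        raise ValueError("marker is not unique, refusing to guess")
    data_start = start + len(MARKER)
    if data_start + capacity > len(image):
        raise ValueError("slot extends past the end of the file")
    if len(payload) > capacity:
        raise ValueError("payload does not fit in the reserved array")
    # Overwrite in place, zero-padded, so the file size and all offsets stay the same.
    padded = payload.ljust(capacity, b"\x00")
    return image[:data_start] + padded + image[data_start + capacity:]
```

Because the file length never changes, nothing in the executable format has to be parsed or fixed up, which is the whole appeal of the trick.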

I think that, also, many executable formats don't mind if something is appended to the executable. If the executable somehow knows its original size (you can write that size somewhere using the previous trick: no grotty executable format parsing required), it can open itself, seek to that offset and read the data.

I think this might be how CLISP creates an .exe file on Windows; I think it takes the base clisp.exe and combines it with the lispinit.mem image into one file.

Put the offset at the end! Famously, the table of contents of a .zip file is at the end rather than the beginning, which has many useful properties (such as being able to patch the contents by only appending to an existing file). And you can concatenate an executable and a .zip and get a file which is both.
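
Python's standard zipfile module happens to support this directly: opening a non-zip file in append mode adds a fresh archive after the existing bytes. A small in-memory sketch follows; the helper names append_zip and read_member are made up, and the "executable" in the usage below is just placeholder bytes.

```python
import io
import zipfile
from typing import Dict


def append_zip(executable: bytes, files: Dict[str, bytes]) -> bytes:
    """Return executable followed by a zip archive holding files."""
    buf = io.BytesIO(executable)
    # Mode "a" on data that is not a zip appends a new archive at the end.
    with zipfile.ZipFile(buf, "a") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_member(combined: bytes, name: str) -> bytes:
    """Read one archive member back out of the combined file."""
    # The central directory is located by scanning from the end of the file,
    # so the leading executable bytes do not get in the way.
    with zipfile.ZipFile(io.BytesIO(combined)) as zf:
        return zf.read(name)
```

For example, append_zip(b"\x7fELF" + bytes(60), {"init.lisp": b"(print 1)"}) yields bytes that still begin with the original prefix, and read_member on that result with "init.lisp" returns the payload again.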

Yes. Cosmopolitan libc has support for exactly this. It contains a lot of platform-specific hacks in order to open the executable though. I went through the implementation.

I think the problem is this notion of a "memory image". It would be so much easier if the kernel just copied the entire file into memory and called it a day.

reserve some global array in the executable

This is neat but has a limitation in that it cannot be expanded after the program is compiled and linked. Resizing the array would invalidate all pointers that follow as well as render incorrect any code that takes its size.

This can be solved with a layer of indirection: just append the data to the executable and write its size and file offset in the array. That way the data block can be freely resized. That's the solution people told me to use and indeed the one that I usually see in existing repositories. The problem is you run into some additional complexity later which results in the loss of portability and thus the main reason to choose this method.

it can open itself

That's the crux of the issue. How does an executable open itself? That's where portability goes out the window. I've seen source code that opens argv[0] which is under the control of the parent program and therefore unreliable. I've seen code which opens /proc/self/exe which is Linux specific. I've seen code that calls Win32 API functions to get the path to the executable. All this just so it can open and load into memory a file which the kernel has already loaded, just so it can read some additional data off of it.

My solution sidesteps that question entirely. It just adds a LOAD segment for the embedded data which instructs the kernel to map it in automatically before the program even runs. There's no need to open, seek or read anything, it will already be there by the time execution begins.

The auxiliary vector contains a pointer to the program's segments table so it can reach the data from there. Then it's just a matter of walking this table looking for the custom descriptor segment. It's all done in a structured way, using the standard magic number locations and ranges. There's no chance of a magic number being recognized by mistake.

The only possible portability issue is the availability of the AT_PHDR, AT_PHENT and AT_PHNUM entries in the auxiliary vector. I'm not sure if they're standard. I know Linux has them and it's all I personally care about but if these entries do turn out to be standard then I can confidently say that my method is portable to any ELF-based operating system.
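
To make the auxiliary vector less abstract, here is a sketch of decoding it. It is just a list of (type, value) word pairs ending with AT_NULL. The numeric constants are the ones in <elf.h>; parse_auxv is an illustrative name, and this is not the code from the article above, which reads the vector from the process stack rather than from a byte buffer.

```python
import struct

AT_NULL, AT_PHDR, AT_PHENT, AT_PHNUM = 0, 3, 4, 5  # values from <elf.h>


def parse_auxv(raw: bytes, word: str = "Q") -> dict:
    """Decode raw auxv bytes into {a_type: a_val}.

    word is the struct code for one machine word: "Q" for 64-bit
    processes, "I" for 32-bit ones. Native byte order is assumed.
    """
    pair = struct.Struct("=" + word * 2)
    usable = len(raw) - len(raw) % pair.size  # ignore any trailing partial entry
    entries = {}
    for a_type, a_val in pair.iter_unpack(raw[:usable]):
        if a_type == AT_NULL:  # terminator
            break
        entries[a_type] = a_val
    return entries
```

On Linux, feeding it the contents of /proc/self/auxv from a 64-bit process should yield entries for AT_PHDR, AT_PHENT and AT_PHNUM, though from Python they describe the interpreter binary rather than your script.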

is ELF unique to x86/x64?

It is not, and it has never been.

ELF is platform agnostic, and has been used in operating systems on nearly every existing CPU platform since the mid-90s (with a few notable exceptions being OS X, AIX, the embedded world and Windows).

Seconded.

Doing it is basically hand assembly. One reads the documentation, selects the bytes needed using a processor data sheet, orders them into the various sections, populates the ELF fields and then it really does boil down to typing them all in.

In pre-ELF times, on say an 8-bit Apple II, the machine code monitor allowed input of the program bytes directly. Those were then executed.

Storing to disk is only a bit more involved, and there is another opportunity! Disk sector editors allow one to create a file...

...and so it goes!

Chris Wellons' "A Magnetized Needle and a Steady Hand," a piece on building an ELF executable from scratch:

https://nullprogram.com/blog/2016/11/17/

When the program starts running, you might think it starts at main. It doesn’t! It actually goes to _start. This does a bunch of Very Important Things that I don’t understand very well, including calling main. So I won’t explain them.

The way I understand it, the symbol main is a C-specific thing. The symbol _start is a language-agnostic entry point for the binary that will in this case call main.

A convention of e.g. calling the entry point _start with main's argc/argv would make the format a lot less flexible.

Things that I don’t understand very well, including calling main. So I won’t explain them.

It depends on the language runtime, but a common task will be initializing global non-0 statics. For languages like Rust/C/C++ you can also inject variables to be initialized via linker flags. If the program is dynamically linked, then I believe the dynamic linker runs before _start to resolve the links and then transfers control to _start.

Basically hacks on hacks on hacks added organically to offer extensibility and the hacks have enough social adoption and are good enough that we stick with them.

The style guide at both my previous and current employers explicitly forbids having global non-0 statics for this exact reason: code that runs before main() is very unusual. Many assumptions do not hold.

A far better way is to use function-local statics. A static variable inside a function is initialized when execution reaches that point when the function is being called. Furthermore, such initialization is thread safe so that one initialization happens despite multiple concurrent calls of the function.

The only exception to that style guide rule is the new constinit in C++20. It is sometimes called linker-initialized to make it even clearer that the program didn't do anything to initialize it, the linker did.

Furthermore, such initialization is thread safe so that one initialization happens despite multiple concurrent calls of the function.

IIRC, there are some popular compilers in which initialization of static variables inside functions is not thread safe (even though AFAIK the C++ standard said they should be).

I’m not aware of this problem in MSVC, clang and gcc and those are the most popular afaik.

I’m not aware of this problem in MSVC

The compiler I was thinking of was indeed MSVC. From a quick web search, it seems that more recent versions of MSVC have changed them to be thread safe by default, so if you can make sure that your code will never be compiled on older MSVC versions (and that nobody will ever use the compiler option which disables the thread-safe initialization), it might be fine to depend on it.

True as of msvc 2015 which is 8.5 years old at this point. I agree it’s weird that you can disable it but libraries retaining correctness in the face of random compiler flags is hard (eg ffast-math is a common one that can break your floating point library)

The issue with Meyers Singletons is that every time they're accessed a flag must be checked first. This is bad in hot loops.

It’s not as bad as you might think because the CPU should speculate through it pretty easily.

initializing global non-0 statics

Does that mean this doesn't work in a freestanding environment? Yet another reason to assiduously avoid global variables. I suppose that's why I never ran into this issue.

Basically hacks on hacks on hacks

And then there's this insanity right here:

https://blogs.oracle.com/solaris/post/init-and-fini-processi...

I like to imagine that I won't have to actually implement this when I write my ELF loader. Someone please tell me no modern software uses this.

Does that mean this doesn't work in a freestanding environment? Yet another reason to assiduously avoid global variables. I suppose that's why I never ran into this issue.

Why do you say that? A freestanding environment will still enter through _start and execute the compiler generated code to initialize globals before main. What I can’t recall is how compile time non-0 values are initialized - I think it could be part of the bss and initialized by the loader instead (but freestanding environments would implement that too as part of being a target for a language) but both them and runtime initialized globals initialized between _start and main would work.

Basically freestanding targets might not give you access to runtime APIs (eg POSIX) but the language is still the language and all features defined as language features should work and it’s the responsibility of the compiler and target environment to provide that guarantee.

Wow, great read. I worked on the Windows DLL loader and we had to implement similar mechanics for similar reasons. The PE image format makes some part of this a little easier, but the complexity is essentially the same.

Basically hacks on hacks on hacks added organically to offer extensibility and the hacks have enough social adoption and are good enough that we stick with them.

The more I learn about the deep depths of modern computing, the more I realize that they're actually full of inelegant legacy cruft.

Technically the name _start is not special either. The binary lists its entry point address in a header and that’s where the OS starts execution from. That symbol is just called _start by convention by C and other languages, which is what the linker uses to set the entry point when writing the ELF headers, but if you’re writing your own linker scripts you could call the entry point whatever you want.

to extend on it, _start is where .text begins and address of that is set by linker

The entry point can be anywhere in the .text section, and often won't be at the beginning of the section.

yes and then you'll have a bad time, but at the same time per convention _start is where .text begins. You can see where it starts with readelf --file-header <executable> and look at Entry point address field. You can change it, yes.
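
The "Entry point address" that readelf prints is just a fixed-offset field in the file header, so it is easy to read yourself. A small sketch follows; entry_point is an illustrative name, and it only looks at the header, not at symbols or sections.

```python
import struct


def entry_point(header: bytes) -> int:
    """Return e_entry from the leading bytes of an ELF file (32- or 64-bit)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = {1: False, 2: True}[header[4]]  # EI_CLASS
    order = {1: "<", 2: ">"}[header[5]]     # EI_DATA: little or big endian
    # e_entry follows e_ident (16 bytes), e_type and e_machine (2 bytes each)
    # and e_version (4 bytes), so it starts at offset 24 in both classes.
    (entry,) = struct.unpack_from(order + ("Q" if is_64 else "I"), header, 24)
    return entry
```

Passing it the first 64 bytes of a binary is enough. Comparing the result with the address of .text from readelf -S shows whether the two coincide for a given executable.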

A common hack to reduce ELF size is actually to start the first section (possibly the .text) right on the elf header, as this circumvents the alignment requirements.

probably not even mandatory.. lots of /usr/bin stuff on my ubuntu machine have __libc_start_main only

No, it's not even a convention, _start is most commonly not where .text begins.

Compiling a static hello world binary on my system (aarch64 fedora 39, gcc -static hello.c -o hello), .text starts at 0x410080, e_entry is at 0x4103c0, and the _start symbol is also at 0x4103c0. This is not unusual at all.

Technically it doesn't even need to be in the .text section, it could be anywhere in the address space. You'll get a segfault if it's not somewhere executable though (assuming you're on a system with an appropriately configured MMU)

the symbol main is a C-specific thing

Absolutely. And only available on hosted C. Freestanding C lets you have any entry point you want.

The symbol _start is a language-agnostic entry point for the binary that will in this case call main.

That's just the linker's default. You can set it to a nicer symbol with -Wl,--entry="${symbol}" and GCC even supports setting it directly with no need for the unsightly -Wl.

Also, the entry point is actually a pointer, not a symbol. The linker just takes the address of the symbol you specify and sets the ELF entry point to that.

calling the entry point _start with main's argc/argv

In addition to argument count and argument vector, the stack also contains the environment vector and the auxiliary vector. The process startup code can be as simple as popping all that stuff off the stack and into the appropriate registers and then calling a C function of your choosing. Note that the entry point is not itself a function: there's nothing to return to. The entry point code finishes with an exit system call to ensure clean process termination when main returns a status code. This is how things work on Linux at least.

Julia's articles are always excellent. I've always had great results teaching people that compiled code doesn't keep secrets by demoing `strings`.

Can you elaborate?

The other replies are pretty good. You can find all sorts of goodies in string data inside a binary: hostnames, URL fragments, error messages or templates, credentials. Pretty much any string constants that a program might use.

You can run the `strings` command on most executables (or PDFs) and get an output of the strings represented in the file. Of course you can obfuscate some of those strings if you do things right, but a lot of people who don't know about `strings` could write a password-protected feature in a compiled bit of code and be embarrassed to see how easy it is to find out what the password is.

man strings

If you put something like

  if mySecretPassword == "Qwerty123" {
     ...

then "Qwerty123" will be easily seen by strings utility. Which is pretty obvious but I'm guessing some junior folks will be surprised.
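
For the curious, the core of what strings does fits in a few lines. This is a rough approximation, not the real tool: it only looks for runs of printable ASCII, uses the same default minimum length of 4, and ignores the encoding and section options the actual utility has.

```python
import re
from typing import List


def strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII characters found in data."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Fed the bytes of a compiled program containing the comparison above, the literal "Qwerty123" would typically show up in the returned list alongside error messages and library names.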

Explain that to the German judges that fined some poor fella for finding passwords in a binary by [doing the equivalent of] running strings on it. They claim he 'circumvented' the software's 'security measures'.

https://www.theregister.com/2024/01/19/germany_fine_security...

Not a criticism, not even a nit-pick, but a reflection

"(binaries are kind of the definition of platform-specific, so this is all platform-specific)" (this is true!)

When "Actually Portable Executable" proved to the (geek) world that the same binary could run on a bunch of platforms, that was a surreal moment I still haven't mentally recovered from.

Here we spent decades trying to solve the cross-platform problem, in so many fractals of ways (Java, cross-platform libraries, etc etc) and the solution was right under our noses all this time.

I may have misunderstood, but I'm pretty sure APE is not a "binary format" per se.

It is a script that can be executed on any system. That script can then LOAD a binary. IIRC the original needed to decode it from base64 before it could be loaded.

So... it's an executable binary loader

It's a script that starts with an EXE header ("MZ"). Having both EXE and ELF headers at the same location is obviously impossible, since they start with different bytes.
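
The "different first bytes" point is exactly how format sniffing works. A tiny sketch, with executable_format as a made-up name; it covers only three formats and only thin little-endian Mach-O files.

```python
def executable_format(head: bytes) -> str:
    """Guess an executable format from the first few bytes of a file."""
    if head.startswith(b"\x7fELF"):
        return "ELF"
    if head.startswith(b"MZ"):
        return "MZ (DOS/PE)"
    # MH_MAGIC (0xfeedface) and MH_MAGIC_64 (0xfeedfacf) as stored little-endian
    if head[:4] in (b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe"):
        return "Mach-O"
    return "unknown"
```

Since each loader checks byte 0 onward for its own magic, a single file can satisfy at most one of these checks, which is why a polyglot has to pick one header and get to everything else some other way.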

A possible, although very limited way to have an actual binary program execute on different platforms would be to create a DOS .COM file (which has no header, just the raw machine code) with a valid ELF header. It would then also work on 32-bit Windows via its built-in DOS emulator, and presumably on 64-bit Windows with WSL2.

The start bytes for a 32-bit ELF header decode as 16-bit x86 into:

    7F 45    JG    +45h
    4C       DEC   SP
    46       INC   SI
    01 01    ADD   [BX+DI],AX
    01 xx    ADD   ...

The first instruction is a jump past the end of the ELF header, unfortunately it's conditional. But we have 9 reserved bytes to continue this code, which is enough to undo the effects of the DEC and ADD instructions and then jump to the same address. I've written a 138 byte "Hello world" that works on Linux, DOS and also CP/M-80 that way.

It's possible to have the code that executes under Linux be a small (less than 2K bytes) loader program that creates 16-bit code and data segments and installs a handler for SIGSEGV. It can then jump into the same code that would run under 16-bit DOS, trapping every INT 21h and translating the most important syscalls into their Linux equivalents, kind of like a minimal version of Wine.

I have a proof of concept for that, it only handles the "read", "write" and "exit" syscalls, which is enough to write something like rot13 or hexdump. With a lot more work, it could be possible to produce really non-trivial software that runs in such a restricted environment...

Same same, but different…

How can the syscalls work the same on Linux, Windows and my macOS?!

It’s bananas

I personally am not convinced that portable binaries are a net positive. I believe in the era of fast computers that source distribution and local compilation is superior to binary distribution. Unfortunately, much of the software we rely on is so large, and compilers so relatively slow, that binary distribution is something of a necessary evil. I'd rather see more effort towards simpler software components (that naturally compile fast) and faster compilers than portable binaries.

You can have both. APE are generally faster and smaller.

Fat APEs (aarch64 + x86_64) are larger, but interesting in their own way.

Not in general.

Executables aren’t magic.

Nothing in a computer is magic. It was all designed by humans, every single one of which was once a clueless noob. No one is born understanding this stuff.

The actual /behavior/ of computers, though, tends to emerge from the confluence of complex processes that humans /can't/ understand...our AGI leverages this emergence to enable problem solving in domains where complexity exceeds human capabilities.

Nothing in a computer is magic.

I think that’s covered by the text, in the sentence right after that one (emphasis mine):

ELF is a file format like any other!

"It is no exaggeration to regard this as the most fundamental idea in programming: The evaluator, which determines the meaning of expressions on a piece of paper, is just another piece of paper." --SICP

This does a bunch of Very Important Things that I don’t understand very well, including calling main. So I won’t explain them.

Honestly, this line was the best in the whole article. It felt like at that moment I knew the person talking to me wasn't trying to prove that they were some sage (personally guilty here) but instead was someone who wanted to show me something cool that we could both enjoy.

Wonderful write up.

(Self promotion) Check out my tool which lets you explore ELF using SQL

https://github.com/fzakaria/sqlelf

The format of executable files fascinated me back in the early 90s, to the point that I spent weeks writing (in Modula 2) a DOS and Windows executable file viewer that I named VEXE, releasing it as shareware in 1991.

It found a niche following among crackers, even deserving a mention in a +ORC tutorial, https://gist.github.com/callowaysutton/48bdf0245e17e72d41a15..., probably because it could detect various encryption and compression methods used to prevent the reverse engineering of those programs.

Amazingly helpful!

I started my blog in 2012, when I shifted my academic career from Mathematics to Computer Science. This topic was literally the first thing that I studied:

https://heinrichhartmann.com/archive/Dissecting-Hello-World....

Never regretted going down this deep rabbit hole. IIRC, Julia also has a math background. Maybe it's the desire for bottom-up reasoning that leads math folks towards experiments like this. Great to see her making this approachable for a large audience.

cat-ing a binary to the terminal is a recipe for sadness. I like | hd, which is hexdump -C, though that's just as impenetrable to the naked eye.
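
If hd isn't installed, something close to it is a few lines of Python. This sketch only imitates the general layout of hexdump -C (offset, hex bytes, printable characters) and does not reproduce its exact spacing or its collapsing of repeated lines.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as offset, hex bytes and an ASCII column, one row per width bytes."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." in the text column.
        text = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_part:<{width * 3 - 1}}  |{text}|")
    return "\n".join(lines)
```

Printing hexdump of the first 64 bytes of any Linux binary shows the 7f 45 4c 46 magic at offset 0, which is a safer first look than cat.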

I think ELF should absolutely be mentioned in the title.

great thread, thank you!

Great thread

For a person with a heavy Python background, can anyone suggest a resource/book that would be a good applied intro to practical low-level programming? I've recently started learning Rust and realized I need to catch up on many things. I haven't taken any compiler course, so maybe that's the reason I am missing so much information. For example, I had no idea that symbols in a binary were a thing or what the difference between ELF and Mach-O was.

For folks interested in this topic who have not seen Cosmopolitan and RedBean, αcτµαlly pδrταblε εxεcµταblε (2020) is a great read too: https://justine.lol/ape.html

https://redbean.dev/

If you are curious about how small a ELF binary file can be, you might like the following amusing article: https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...

A book about this topic which I enjoyed is "Learning Linux Binary Analysis" by Ryan O'Neill.