Breadbox, the author, wants to make smaller binary executables. He covers ELF binaries, a.out binaries, and old MS-DOS .COM binaries, and explains how the latter had no metadata and could be very small. He then explains how you can dynamically load code that deals with new executable binary formats into the Linux kernel, and how this process works. He walks through some sample C for building a "Hello World" kernel module. He then walks you through ~1 page of code for a kernel module that registers a new binary format, sets up some callbacks, and, if conditions are right, will vm_mmap() the code into memory and call start_thread() on it.
Yay, it works! He has a tiny binary. This is where most articles would end, but Breadbox goes deeper. What if you want a stack and a heap? What if you want to access argc, argv, and envp? What if you want to append code at the end that automatically calls the exit syscall? All these details are covered, and I think it's glorious.
While this all may seem like pretty dry stuff, there is humor sprinkled throughout, which makes it more fun to read.
Ah, the pre-internet was glorious.
And it pairs well with another article on the front page. [0]
Which I bring up because they disagree on a particular point. And that is how a script without a shebang gets run as a script.
> This is done by registering a set of callback functions, and these callbacks get invoked when the kernel is asked to execute a binary file. The kernel invokes the callbacks on this list, and the first one that claims to recognize the file takes responsibility for getting it properly loaded into memory. If nobody on the list accepts it, then as a last resort the kernel will attempt to treat it as a shell script without a shebang line. And if that doesn't fly, then you'll get that "Exec format error" message described above.
But the article I linked to says the shell actually handles it. And based off of its research (terribly reproduced below), I'm inclined to believe it.
echo echo Hello world > test.sh
chmod +x test.sh
strace ./test.sh
strace sh -c ./test.sh
You'll see the first one errors with `ENOEXEC`, but the second one does not. Also, in my head, I don't know how the kernel would know what shell to choose, or that it should even expect to have access to a shell.
So I was going to go edit my essay, when I learned that my essay was also posted on Hacker News. And now I discover that someone has already called out my error before I could fix it. Sigh.
Anyway, I just thought I should acknowledge this before I go to fix it.
And don't beat yourself up too much. This was a phenomenal article and it gave me the courage to dig into the kernel code myself.
I might be wrong, so do correct me if so.
> If nobody on the list accepts it, then as a last resort the kernel will attempt to treat it as a shell script without a shebang line.
They said that the kernel is responsible for invoking the shell. I honestly think this was just a brain fart and the author meant to put shell and not kernel. With both words flying around in your head, it's an easy mistake to make.
But then again, the article goes on to talk about how it decides to even try that last step:
> Interesting side note: The kernel decides whether or not to try to parse a file as a shell script by whether or not it contains a line break in the first few hundred bytes — specifically if it contains a line break before the first zero byte. Thus a data file that just happens to have a "\n" near the top can produce some odd-looking error messages if you try to execute it.
So I don't know.
I guess my next step is to look at the kernel source itself. I'll probably end up doing that in a bit.
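For what it's worth, the rule in that quote is easy to state in code, whoever actually applies it. Here is a minimal Python sketch, assuming the rule is exactly as quoted. The function name and the 256-byte window are my own placeholders, since the article only says "the first few hundred bytes".

```python
def looks_like_script(head: bytes, window: int = 256) -> bool:
    """Apply the quoted rule: a line break must appear before the first zero byte.

    `window` is a hypothetical stand-in for "the first few hundred bytes".
    """
    sample = head[:window]
    newline = sample.find(b"\n")
    nul = sample.find(b"\0")
    if newline == -1:
        # No line break in the sample at all, so do not treat it as a script.
        return False
    # Accept when there is no zero byte, or the line break comes first.
    return nul == -1 or newline < nul


if __name__ == "__main__":
    print(looks_like_script(b"echo Hello world\n"))
    print(looks_like_script(b"\x7fELF\x02\x01\x01\x00\n"))
    print(looks_like_script(b"GIF89a\n\x00\x00"))
```

The third call is the "odd-looking error messages" case from the quote: a data file that happens to have a "\n" before its first zero byte passes the check.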
I also put together two versions of the same call to a shebangless script in Python, one with `shell=True` and the other without. It's only the one that calls into the shell that successfully runs the script. The strace outputs corroborate my theory.
Without shell=True (truncated)
[pid 961626] execve("./sh.sh", ["./sh.sh"], 0x7fff7bae94a0 /* 66 vars */) = -1 ENOEXEC (Exec format error)
With shell=True (truncated)
[pid 961623] execve("/bin/sh", ["/bin/sh", "-c", "./sh.sh"], 0x7ffd75009e50 /* 66 vars */) = 0
[pid 961624] execve("./sh.sh", ["./sh.sh"], 0x5980a07c70a8 /* 66 vars */) = -1 ENOEXEC (Exec format error)
[pid 961624] execve("/bin/sh", ["/bin/sh", "./sh.sh"], 0x5980a07c70a8 /* 66 vars */) = 0
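The last two strace lines show the pattern: a direct execve() fails with ENOEXEC, then the same process execs /bin/sh with the script as an argument. Here is a rough Python sketch of that retry, assuming a Linux-like system with /bin/sh and a temp directory that allows execution. The helper names are made up for illustration, and this mimics the observed behaviour rather than any shell's actual source.

```python
import errno
import os
import subprocess
import tempfile


def make_shebangless_script(directory):
    """Write an executable script with no shebang line and return its path."""
    path = os.path.join(directory, "sh.sh")
    with open(path, "w") as handle:
        handle.write("echo Hello world\n")
    os.chmod(path, 0o755)
    return path


def run_like_a_shell(path):
    """Exec `path` directly; on ENOEXEC, retry it as an argument to /bin/sh.

    Returns (fell_back, stdout).
    """
    try:
        # Direct exec with no shell in between (subprocess default, shell=False).
        done = subprocess.run([path], capture_output=True, text=True)
        return False, done.stdout
    except OSError as exc:
        if exc.errno != errno.ENOEXEC:
            raise
    # The kernel refused the file, so hand it to /bin/sh as a script.
    done = subprocess.run(["/bin/sh", path], capture_output=True, text=True)
    return True, done.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_like_a_shell(make_shebangless_script(tmp)))
```

If the fallback flag comes back true, the direct exec was rejected and the script only ran because something in userspace retried it through the shell.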
edit: Actually, just define your binary format so that the first byte is copied to the stack and all subsequent bytes are copied to text with the epilogue appended to it.
edit: You could also define it so that the first byte is copied into the first argument register/RDI if you want to shrink loaded RAM usage to just 4 bytes of code and 1 byte of data.
This is of course assuming it is a "generic" binary format that is not literally just encoding the contents of the tiny program. Otherwise you could do 0 bytes and just have the loader pre-fill RAX with 60 and RDI with 42 and insert a one instruction epilogue consisting of syscall. You could technically still call that a "generic" binary format since any actual binary you attempt to load will just blow away those pre-filled GPR values.
It's also possible to detect which mode the CPU is in:
bits 16
mov ax,start16 ;may load EAX instead,
jmp ax ;skipping this 2-byte instruction
bits 32
dec eax ;REX prefix in long mode,
mov eax,start32 ;may load RAX,
jmp eax ;skipping these 4 bytes
nop
nop
bits 64
jmp start64
You can even be compatible with CP/M-80 by putting this at the start:
add bx,start8 ;8080: ADD C, JMP start8
nop ;immediate may be 16 or 32 bits
nop
Nowadays however, interviewers are rarely impressed with what arcane knowledge you may or may not have, regardless of how hard won the experiences were that taught it to you.
I was reminded of this when reading the "Demystifying the shebang" article today on HN when I saw it in some strace output, which along with the other similarities got me to thinking about this article.
Woah, I have a feeling this does something even more. If the program modifies its own instructions, the kernel will probably save those modifications in the file too.
https://news.ycombinator.com/item?id=32648359
I'd recommend using QEMU for the type of work the author is doing. It makes iteration much faster.
0: https://blogs.oracle.com/linux/post/introduction-to-netfilte...