This sort of asymmetry is why system modules, and platforms in general, should absorb pain in order to benefit their many clients, rather than doing the opposite.
Could be worse though - some platforms (cough, iOS) are happy to break user apps every year and offload a constant maintenance burden onto many thousands of app developers, when a more stable ABI would save developers (and users) billions of dollars in aggregate.
Not sure why the trade-off consideration led to a different outcome for in-kernel APIs, but given the work done to ensure the stability of the userland ABI, I'm sure there was thought behind it.
The system call interface per se is relatively stable. Then there's all that stuff that has been dumped into /proc...
I understand that technically eBPF programs run on a VM in kernel space, but aren't they loaded from userspace? Isn't eBPF an alternative to developing kernel modules and in-tree drivers? To a layperson like me it walks, talks, and quacks like userspace much more than like the kernel. The fact that struct layout can change at the whim of kernel developers seems counterproductive. I guess this is what CO-RE is supposed to solve, but having to deal with a bunch of pointer and sizeof() chicanery seems archaic (coming from a total luser/kernel nublet who hasn't written C in over a decade).
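For what it's worth, the CO-RE version of that chicanery is tamer than raw offsets. A minimal sketch of what it looks like (the kprobe target and fields are picked arbitrarily for illustration, and it assumes a BTF-generated vmlinux.h):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    /* BPF_CORE_READ records a field-offset relocation instead of a
     * hardcoded offset; libbpf patches it at load time against the
     * running kernel's BTF, so the same object survives struct layout
     * changes between kernel versions. */
    SEC("kprobe/do_unlinkat")
    int watch_unlink(struct pt_regs *ctx)
    {
            struct task_struct *task = (void *)bpf_get_current_task();

            /* offsets of real_parent and tgid are resolved on the
             * target kernel, not at compile time */
            pid_t ppid = BPF_CORE_READ(task, real_parent, tgid);

            bpf_printk("unlink, parent tgid %d", ppid);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";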
https://elixir.bootlin.com/linux/v5.3/source/include/uapi/li...
https://elixir.bootlin.com/linux/v5.4/source/include/uapi/li...
Here's a fun bug we recently had: we had to ban subtractions in our program (replacing them with an __asm__ macro, sketched below) because of a bug in Linux kernels 5.7.0 to 5.10.10, whose (indirect) impact was that the verifier stopped properly tracking valid min/max values[0]. The worst part is, it didn't cause the verifier to reject our program outright - instead, it used that information to optimize out some branches it thought were never reachable, making for a really wonky-to-debug situation where the program was running an impossible control flow[1] and returning garbage to user space.
All this to say, CO-RE really only solves half the problem. Supporting every kernel in existence is still a huge effort. Still worth it compared to the alternative of writing a Linux kernel driver, though!
[0]: https://github.com/torvalds/linux/commit/bc895e8b2a64e502fbb...
[1]: https://github.com/torvalds/linux/blob/bc895e8b2a64e502fbba7...
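I don't know their exact macro, of course, but for anyone curious, the general trick is to route the arithmetic through inline asm so clang emits a fixed instruction instead of whatever pattern tickles the broken bounds tracking. A rough sketch (the name and details here are invented for illustration, not their actual code):

    /* Hypothetical subtraction macro: the inline asm pins the operation
     * to a plain 64-bit BPF subtract, rather than letting the compiler
     * choose a form that hits the buggy signed-overflow tracking in the
     * 5.7.0..5.10.10 verifiers. */
    #define SUB_SAFE(a, b) ({                        \
            __u64 __a = (a);                         \
            __u64 __b = (b);                         \
            asm volatile("%[res] -= %[sub]"          \
                         : [res] "+r"(__a)           \
                         : [sub] "r"(__b));          \
            __a;                                     \
    })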
Yes, the available features differ from one kernel version to the next. You have to pick a minimum supported version and write against that.
I really wish customers would update to a newer distro, but I also understand why they don't. So it's up to me to adapt.
> You have to pick a minimum supported version and write against that.
What we end up doing is progressively enabling features based on what's available in the kernel. Every eBPF program we write is compiled multiple times with a few different flags to enable/disable certain features. It works decently well, and lets us use the most capable data structures/helpers the kernel version allows.
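Concretely, it can be as simple as one #ifdef per feature and N compiles of the same source. A sketch (the HAVE_RINGBUF flag name is made up; BPF_MAP_TYPE_RINGBUF needs kernel 5.8+, with a perf event array as the fallback for older kernels):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    /* Built twice, e.g.:
     *   clang -target bpf -O2 -DHAVE_RINGBUF -c prog.c -o prog.ringbuf.o
     *   clang -target bpf -O2 -c prog.c -o prog.legacy.o
     * and the loader picks the object that matches the running kernel. */
    #ifdef HAVE_RINGBUF
    struct {
            __uint(type, BPF_MAP_TYPE_RINGBUF);   /* kernels >= 5.8 */
            __uint(max_entries, 1 << 20);
    } events SEC(".maps");
    #else
    struct {
            __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
            __uint(key_size, sizeof(int));
            __uint(value_size, sizeof(int));
    } events SEC(".maps");
    #endif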
I wonder why no one needs to write this article about DTrace probes. Is it because they are less used? Less capable? More stable? Better engineered?
Probably all of the above, alas.
[1] https://www.illumos.org/books/dtrace/chp-syscall.html#chp-sy... [2] https://www.illumos.org/books/dtrace/chp-sdt.html#chp-sdt
This is essential complexity layered on top of the accidental complexity of letting user space depend on unstable kernel internals. That was probably unavoidable, but it's also a decision that keeps generating complexity (and probably bugs) down the line.