This sort of asymmetry is why system modules, and platforms in general, should absorb pain in order to benefit their many clients, rather than doing the opposite.
Could be worse though - some platforms (cough, iOS) are happy to break user apps every year and offload a constant maintenance burden onto many thousands of app developers, when a more stable ABI would save developers (and users) billions of dollars in aggregate.
Not sure why the trade-off consideration led to a different outcome for in-kernel APIs, but given the work done to ensure the stability of the userland ABI, I'm sure there was thought behind it.
The system call interface per se is relatively stable. Then there's all that stuff that has been dumped into /proc...
I understand that technically eBPF programs run on a VM in kernel space, but aren't they loaded from userspace? Isn't eBPF an alternative to developing kernel modules and in-tree drivers? To a layperson like me it walks, talks, and quacks like userspace much more than like the kernel. The fact that struct layout can change at the whim of kernel developers seems counterproductive. I guess this is what CO-RE is supposed to solve, but having to deal with a bunch of pointer and sizeof() chicanery seems archaic (coming from a total luser/kernel nublet who hasn't written C in over a decade).
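For what it's worth, the CO-RE version of that chicanery is tamer than raw offsets. A minimal sketch of what it looks like (the kprobe target and fields are picked arbitrarily for illustration, and it assumes a BTF-generated vmlinux.h):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    /* BPF_CORE_READ records a field-offset relocation instead of a
     * hardcoded offset; libbpf patches it at load time against the
     * running kernel's BTF, so the same object survives struct layout
     * changes between kernel versions. */
    SEC("kprobe/do_unlinkat")
    int watch_unlink(struct pt_regs *ctx)
    {
            struct task_struct *task = (void *)bpf_get_current_task();

            /* offsets of real_parent and tgid are resolved on the
             * target kernel, not at compile time */
            pid_t ppid = BPF_CORE_READ(task, real_parent, tgid);

            bpf_printk("unlink, parent tgid %d", ppid);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";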
https://elixir.bootlin.com/linux/v5.3/source/include/uapi/li...
https://elixir.bootlin.com/linux/v5.4/source/include/uapi/li...
Here's a fun bug we recently had: we had to ban subtractions in our program (replacing them with an __asm__ macro, sketched below) because of a bug in Linux kernels 5.7.0 to 5.10.10, whose (indirect) impact was that the verifier stopped properly tracking valid min/max values[0]. The worst part is, it didn't cause the verifier to reject our program outright - instead, it used that information to optimize out some branches it thought were never reachable, making for a really wonky-to-debug situation where the program was running an impossible control flow[1] and returning garbage to user space.
All this to say, CO-RE really only solves half the problem. Supporting every kernel in existence is still a huge effort. Still worth it compared to the alternative of writing a Linux kernel driver, though!
[0]: https://github.com/torvalds/linux/commit/bc895e8b2a64e502fbb...
[1]: https://github.com/torvalds/linux/blob/bc895e8b2a64e502fbba7...
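I don't know their exact macro, of course, but for anyone curious, the general trick is to route the arithmetic through inline asm so clang emits a fixed instruction instead of whatever pattern tickles the broken bounds tracking. A rough sketch (the name and details here are invented for illustration, not their actual code):

    /* Hypothetical subtraction macro: the inline asm pins the operation
     * to a plain 64-bit BPF subtract, rather than letting the compiler
     * choose a form that hits the buggy signed-overflow tracking in the
     * 5.7.0..5.10.10 verifiers. */
    #define SUB_SAFE(a, b) ({                        \
            __u64 __a = (a);                         \
            __u64 __b = (b);                         \
            asm volatile("%[res] -= %[sub]"          \
                         : [res] "+r"(__a)           \
                         : [sub] "r"(__b));          \
            __a;                                     \
    })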
Yes, the available features differ from one kernel version to the next. You have to pick a minimum supported version and write against that.
I really wish customers would update to a newer distro, but I also understand why they don't. So it's up to me to adapt.
> You have to pick a minimum supported version and write against that.
What we end up doing is progressively enabling features based on what's available in the kernel. Every eBPF program we write is compiled multiple times with a few different flags to enable/disable certain features. It works decently well, and lets us use the most capable data structures/helpers the kernel version allows.
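Concretely, it can be as simple as one #ifdef per feature and N compiles of the same source. A sketch (the HAVE_RINGBUF flag name is made up; BPF_MAP_TYPE_RINGBUF needs kernel 5.8+, with a perf event array as the fallback for older kernels):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    /* Built twice, e.g.:
     *   clang -target bpf -O2 -DHAVE_RINGBUF -c prog.c -o prog.ringbuf.o
     *   clang -target bpf -O2 -c prog.c -o prog.legacy.o
     * and the loader picks the object that matches the running kernel. */
    #ifdef HAVE_RINGBUF
    struct {
            __uint(type, BPF_MAP_TYPE_RINGBUF);   /* kernels >= 5.8 */
            __uint(max_entries, 1 << 20);
    } events SEC(".maps");
    #else
    struct {
            __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
            __uint(key_size, sizeof(int));
            __uint(value_size, sizeof(int));
    } events SEC(".maps");
    #endif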
I wonder why no one needs to write this article about DTrace probes. Is it because they are less used? Less capable? More stable? Better engineered?
Probably all of the above, alas.
[1] https://www.illumos.org/books/dtrace/chp-syscall.html#chp-sy... [2] https://www.illumos.org/books/dtrace/chp-sdt.html#chp-sdt
This is essential complexity layered on top of the accidental complexity of letting user space depend on unstable kernel internals. That was probably unavoidable, but it's also a decision that keeps generating complexity (and probably bugs) down the line.