I'm not sure if it is HN-crowd type material since it is easy enough information for most of us to dig up, assuming we didn't already know it. Yet it does not simplify things to the point of, "technology is magic."
The Huffman tree, LZ77 and LZMA explanation is truly excellent for how concise it is.
The earlier Veritasium video on Markov chains is linked if you don't know what a Markov chain is.
I expected Veritasium to tank when it got sold to private equity & Derek went to Australia, but I've been surprised by the quality of the long-form stuff churned out by Casper, Petr, Henry & Greg.
This isn't correct at all. The changes were merged into xz and made it into testing branches of major Linux distros.
It was caught at T plus a few minutes only because a neurotic Microsoft employee performing debugging noticed an obscure performance issue.
You can literally say Microsoft saved Linux that day. Imagine thinking this 25 years ago.
It's the difference between something really bad which happened, and something really, really, really, really bad: a malicious actor having RCE credentials to every new Debian and Red Hat box on planet Earth.
It's possible though. The noise around it did at least put Freund on alert, and we should be very glad both that "Jia Tan" made the mistakes they made originally and that Freund followed up on their gut feeling.
One wonders whether the xz backdoor would have been discovered if slightly less obfuscation had been used.
The whole xz incident is a pretty strong argument to:
a) change practice from including binary (opaque) test files themselves to human-readable scripts and tooling that build test files on-demand,
b) raise suspicion of any binaries included in open source projects, and
c) create much more scrutiny around dependencies of 'highly scrutinised' packages like OpenSSH.
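For point (a), here is a minimal sketch of what "build the test file on demand" could look like. It assumes Python's standard `lzma` module as a stand-in for the project's own tooling; the function names, seed and size are hypothetical and not anything xz actually ships.

```python
import lzma
import random


def make_payload(seed: int, size: int) -> bytes:
    """Build deterministic pseudo-random bytes from a small, reviewable seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(size))


def make_fixture(seed: int = 1234, size: int = 4096) -> bytes:
    """Generate an .xz test fixture at test time instead of committing a blob."""
    return lzma.compress(make_payload(seed, size), format=lzma.FORMAT_XZ)


if __name__ == "__main__":
    fixture = make_fixture()
    # The round trip doubles as a sanity check that the fixture is well-formed.
    assert lzma.decompress(fixture) == make_payload(1234, 4096)
    print(f"generated {len(fixture)} byte fixture")
```

The point is that a reviewer reads a dozen lines of script and a seed rather than trusting an opaque file in the repository.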
It's a shame that there isn't a foundation (that I'm aware of) that can donate time and effort of vetted developers to foundational open source projects like xz.
But xz is not a dependency of upstream OpenSSH you see. It was a dependency of a patch created by Linux distros for systemd integration.
Video of Jia Tan fixing the valgrind bugs: https://www.youtube.com/watch?v=A16YuzuKN58&t=138s
If a state actor (it almost has to be a state actor at the time frame they were operating under) could put in this much effort once, they clearly could afford to do it X times. And when you look through the history of communications from the author, it just reads like 'another day at the office'.
Outside of Valgrind bugzilla bug reports these claims almost never stand up to close scrutiny. Not that the people making the claims ever perform any scrutiny. It's usually "my application doesn't crash so it must be a false positive" or "I'm sure that I initialised that variable" or "it's not really a leak, the OS will reclaim the memory".
> A lot of the aliases, like Jia Tan, they sound like Asian names, and the published changes are all timestamped in UTC+8, Beijing time. So the signs point to China. And that's why it's probably not China. I mean, why would they make it that obvious? Every other part of the operation has been so meticulous, so cautious.
> And they also worked on Chinese New Year, but not on Christmas. And over the years, there were nine changes that fall outside of the Beijing time into UTC+2, which is a time zone that includes Israel and parts of Western Russia. That's why some experts have speculated that this could be the work of APT29, a Russian-state-backed hacker group also known as Cozy Bear. But again, do we know? No, of course we don't know who it is, and we likely will never know.
Also, a quick search suggested UTC+3 was seen during the summer, and Russia doesn't do DST either.
Edit: some of the UTC+2/3 times are attributable to differences between git committer and author dates (e.g. email patches)
Except one: commit 3d1fdddf9 has Jia Tan as both author and committer but the author timestamp is in +0300 while the commit timestamp is +0800.
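If anyone wants to repeat this kind of check, here is a rough sketch that compares the author and committer UTC offsets in the raw commit header (the format `git cat-file -p <commit>` prints). The sample commit below is made up for illustration, with placeholder name, email and timestamps; it is not the real commit data.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches raw header lines such as:
#   author Some Name <mail@example.org> 1700000000 +0300
IDENT = re.compile(r"^(author|committer) (.*) <(.*)> (\d+) ([+-])(\d{2})(\d{2})$")


def parse_idents(raw_commit: str) -> dict:
    """Return {'author': (name, email, datetime), 'committer': (...)}."""
    idents = {}
    for line in raw_commit.splitlines():
        if not line:
            break  # a blank line ends the header; the message follows
        m = IDENT.match(line)
        if m:
            role, name, email, epoch, sign, hh, mm = m.groups()
            offset = timedelta(hours=int(hh), minutes=int(mm))
            if sign == "-":
                offset = -offset
            when = datetime.fromtimestamp(int(epoch), tz=timezone(offset))
            idents[role] = (name, email, when)
    return idents


def offset_mismatch(raw_commit: str) -> bool:
    """True when the same identity authored and committed in different offsets."""
    idents = parse_idents(raw_commit)
    author, committer = idents["author"], idents["committer"]
    same_person = author[:2] == committer[:2]
    return same_person and author[2].utcoffset() != committer[2].utcoffset()


# Hypothetical sample, NOT the real commit.
SAMPLE = (
    "tree 0000000000000000000000000000000000000000\n"
    "author Example Dev <dev@example.org> 1700000000 +0300\n"
    "committer Example Dev <dev@example.org> 1700003600 +0800\n"
    "\n"
    "Example subject line\n"
)

if __name__ == "__main__":
    print(offset_mismatch(SAMPLE))
```

Note that a mismatch on its own proves little: rebases, amended commits and applied email patches can all produce one.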
Their "Christmas" family celebrations are on New Year's Eve.
So if you're drawing conclusions from them not working on the 25th (which is a literal normal day in eastern europe) then signs point elsewhere unfortunately.
That's just what they want you to think!
This is the scariest part to me:
> A pull request (https://github.com/jamespfennell/xz/pull/2) to a go library by a 1Password employee is opened asking to upgrade the library to the vulnerable version
https://oxide-and-friends.transistor.fm/episodes/discovering...
Europe should have an equivalent scheme for programmers of important Open Source projects such as this one.
Also, as I understand it, much of OSS today is done in-house by major companies (Red Hat, Ubuntu, IBM, Google, etc.)
But in the video itself, they show that the actual ssh time was about 100 ms and the new time was about 600 ms. That is almost 6 times the original time. I would expect benchmark performance to drop significantly with these times, and it should be obvious that something was wrong.
(I am taking nothing away from Andres here. I think he's a brilliant engineer for actually finding the root cause of this himself. He is a hero. I am just pointing out that 500 ms is not an obscure time interval.)
...and yet, zero mention of systemd's recommendation for programs to link in the libsystemd kitchen sink just to call sd_notify() (which should really be its own library)
...and no mention of why systemd felt the need to preemptively load compression libraries, which it only needs to read/write compressed log files, even if you don't read/write log files at all? Again, it's a whole independent subsystem that could be its own library.
The video showed that xz was a dependency of OpenSSH. It showed on screen, but never said aloud, that this was only because of systemd. Debian/Red Hat's sshd [0] was started with systemd and they added in a call to the sd_notify() helper function (which simply sends a message to the $NOTIFY_SOCKET socket), just to inform systemd of the exact moment sshd is ready. This loads the whole of libsystemd. That loads the whole of liblzma. Since the xz backdoor, OpenSSH no longer uses the sd_notify() function directly; it writes its own code to connect to $NOTIFY_SOCKET. And the sd_notify manpage begrudgingly gives a listing of code you can use to avoid calling it, so if you're an independent program with no connection to systemd and you just want to notify it you've started... you don't need to pull in the libsystemd kitchen sink. As it should've been in the first place.
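To show how little is involved, here is a rough sketch of the readiness notification in Python. It is not the code from the manpage or from OpenSSH, just an illustration of the protocol as I understand it: send the datagram `READY=1` to the Unix socket named in $NOTIFY_SOCKET, where a leading `@` means a Linux abstract-namespace socket. The function name `notify_ready` is my own.

```python
import os
import socket


def notify_ready(message: bytes = b"READY=1") -> bool:
    """Tell the service manager we are up, without linking libsystemd.

    Returns False when NOTIFY_SOCKET is unset (i.e. not run under a
    service manager that asked for notification).
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes an abstract-namespace socket on Linux,
        # which is addressed with a leading NUL byte instead.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendall(message)
    return True


if __name__ == "__main__":
    print("notified" if notify_ready() else "NOTIFY_SOCKET not set")
```

That is the whole job: one environment variable, one datagram, no compression libraries anywhere in sight.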
Is the real master hacker Lennart Poettering, for making sure his architectural choices didn't appear in this video?
[0]: as an aside, the systemd notification code is only in Debian, Redhat et al because OpenSSH is OpenBSD's fork of Tatu Ylönen's SSH, which went on to become proprietary software. systemd is Linux-only and will never support OpenBSD, so likewise OpenBSD don't include any lines of code in OpenSSH to support systemd. Come to think of it, "BSD" is another thing they don't mention in the script, despite mentioning the AT&T lawsuit (https://en.wikipedia.org/wiki/USL_v._BSDi)
However the editors (correctly IMHO) took the decision to simplify the whole story of dependencies. In an early draft they simplified it too much, sort of implying that sshd depended directly on liblzma, but they corrected that (adding the illustration of dependencies) after I pointed out it was inaccurate.
I agree with everything you say, but you have to pick your battles when explaining very complicated topics like shared libraries to a lay audience.
In general I was impressed by their careful fact checking and attention to detail.
Sadly they missed the misspelling (UNRESOVLED) even though I pointed it out last week :-( But that's literally the only thing they didn't fix after my feedback.
They never once utter the word "systemd", anywhere in the script... isn't that strange for such a key dependency?
OpenSSH is maintained by the OpenBSD developers. OpenSSH does not use liblzma (xz) at all.
Linux distros which chose to switch to systemd also chose to patch OpenSSH to call systemd's sd_notify() function, to inform systemd when sshd is fully started.
This sd_notify() function is in the huge, sprawling kitchen sink of a library called libsystemd. sd_notify() is only a few lines of code, but it's convenient (to Linux distro packagers) to make systemd a dependency of OpenSSH, link in the whole library and call that one function. It makes their patches of the upstream software smaller and easier to review for correctness.
In the sprawling libsystemd is an entire subsystem for reading/writing systemd's famous binary log files, and the user can choose compression (xz, zstd or lz4). It depended on and loaded all three of these compression libraries, whether you read/write compressed logs or not. In the video you hear about the imminent request to load these libraries dynamically on demand -- https://github.com/systemd/systemd/pull/31550 -- but this arrives many years after adding these functions to the libsystemd kitchen sink, and generally speaking most programs shouldn't use the libsystemd functions for reading/writing log files; they only need to send log messages to journald via syslog() or sd_journal_print().
So you can see this unwarranted dependency chain was introduced by Linux distros adding systemd to everything, and nation-state level hackers saw and tried to exploit it, seeking out the xz maintainer for social engineering.
systemd is doing what it was designed to do... Cute videos are doing what they were designed to do too - hiding that!
> OpenBSD don't include any lines of code in OpenSSH to support systemd. Come to think of it, "BSD" is another thing they don't mention in the script
And this!
The technical explanations are so complex (even though they're "dumbed down" somewhat with the colour mixing scenario) that anyone who understands them will also already know how dependencies work and how Linux came to be.
It feels almost like it's made for people like my mum, but it will lose them almost immediately at the first mention of complex polynomials.
The actual weight of the situation kinda lands though, and that's important. It's really difficult to overstate how incredibly lucky we were to catch it, and how sophisticated the attack actually was.
I'm really sad that we will genuinely never know who was behind it, and anxious that such things are already in our systems.
Her comment was that she was really impressed that it didn't dumb anything down like normal documentaries do. She was able to follow along with more technical stuff than she anticipated, and that made her enjoy it even more.
I think we need to give people more credit when it comes to complex or technical explanations. If people are enjoying the context but don't understand the technical parts, they can just gloss over those if they prefer. But I felt this was quite telling as to how and why Veritasium is such a popular channel.
They aren't really a technology channel though, at least as it relates to software/computers, so that's probably why the video starts out with a brief history of Linux.
(But also, my conspiratorially-inclined mind is quite entertained by the thought of some sort of parallel construction or tip from a TLA.)
Instead we have come to expect them to cowardly sit on exploits, or actively introduce them, rather than working to secure the general public from adversaries.
What a mess.
For sure you were/are not alone in this thinking. How fast the whole thing was exposed in decent enough detail was... surprising.
Why are build scripts not operating in a clean directory, stripping away all test related files?
Isn't this something we should begin to consider doing, seeing that it's all too easy to put arbitrary things in test files (you can just pretend stuff is "fuzzed" or "random" or "test vectors" and whatnot: there's always going to be room to hide mischief in test files)?
Like literally building, but only after having erased all test directories/files/data.
Or put it this way: how many backdoors are actually live but wouldn't be if every single build was only done after carefully deleting all the irrelevant files related to tests?
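Something like this is what I have in mind, as a rough sketch: copy the source tree into a clean staging directory, leaving out anything that looks like a test directory, and only then run the real build there. The directory names and the `stage_without_tests` helper are my own assumptions; a real project would need its own list.

```python
import shutil
from pathlib import Path

# Hypothetical list; every project names its test directories differently.
TEST_DIR_NAMES = {"tests", "test", "testdata"}


def stage_without_tests(src, dst) -> Path:
    """Copy src to a fresh dst, skipping directories named like test dirs."""

    def ignore(directory, names):
        # copytree calls this per directory; return the entries to skip.
        return [
            name
            for name in names
            if name in TEST_DIR_NAMES and Path(directory, name).is_dir()
        ]

    # copytree refuses to overwrite an existing dst, so the staging
    # directory is always a clean one.
    shutil.copytree(src, dst, ignore=ignore)
    return Path(dst)


if __name__ == "__main__":
    staged = stage_without_tests("project", "build-staging")
    print(f"run the build inside {staged}")
```

It wouldn't stop every trick, but a build step that tries to read from a test fixture would then fail loudly instead of silently succeeding.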