Then I found an HN comment I wrote a few years ago that confirmed this:
“[...] I remember that day pretty clearly because in the same lightning talk session, Solomon Hykes introduced the Python community to docker, while still working on dotCloud. This is what I think might have been the earliest public and recorded tech talk on the subject:”
YouTube link: https://youtu.be/1vui-LupKJI?t=1579
Note: starts at t=1579, which is 26:19.
Just being pedantic though. That’s about 13 years ago. The lightning talk is fun as a bit of computing history.
(Edit: as I was digging through the paper, they do cite this YouTube presentation, or a copy of it anyway, in the footnotes. And they refer to a 2013 release. Perhaps there was a multi-year delay between the paper being submitted to ACM with this title and it being published. Again, just being pedantic!)
We first submitted the article to the CACM a while ago.
The review process takes some time and "Twelve years of
Docker containers" didn't have quite the same vibe.
(The CACM reviewers helped improve our article quite a bit. The time spent there was worth it!)
Here’s the announcement from 2013:
The flip side is that the world still hasn’t settled on a language-neutral build tool that works for all languages. Therefore we resort to running arbitrary commands to invoke language-specific package managers. In an alternate timeline where everyone uses Nix or Bazel or some such, docker build would be laughed out of the window.
> running arbitrary commands to invoke language-specific package managers.
This is exactly what we do in Nix. You see this everywhere in nixpkgs.
What sets apart Nix from docker is not that it works well at a finer granularity, i.e. source-file-level, but that it has real hermeticity and thus reliable caching. That is, we also run arbitrary commands, but they don't get to talk to the internet and thus don't get to e.g. `apt update`.
In a Dockerfile, you can `apt update` all you want, and this makes the build layer cache a very leaky abstraction. This is merely an annoyance when working on an individual container build but would be a complete dealbreaker at linux-distro-scale, which is what Nix operates at.
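To make the "leaky cache" point concrete, here is a minimal sketch in Python. It is a toy model, not Docker's or Nix's actual implementation: `layer_cache_key` and `hermetic_key` are hypothetical helpers, and the assumption is that a RUN step is cached on its parent plus the instruction text, while a hermetic build hashes every declared input.

```python
import hashlib


def layer_cache_key(parent_key: str, instruction: str) -> str:
    """Toy Dockerfile-style key: parent layer plus instruction text only."""
    return hashlib.sha256(f"{parent_key}\n{instruction}".encode()).hexdigest()


def hermetic_key(parent_key: str, instruction: str, input_digests: list[str]) -> str:
    """Toy hermetic key: every declared input is hashed into the key as well."""
    h = hashlib.sha256(f"{parent_key}\n{instruction}".encode())
    for digest in sorted(input_digests):
        h.update(b"\n" + digest.encode())
    return h.hexdigest()


base = "sha256:base-image"  # placeholder parent digest (assumption)
step = "RUN apt update && apt install -y curl"

# The remote package index changes between Monday and Tuesday ...
index_monday = hashlib.sha256(b"package index, Monday").hexdigest()
index_tuesday = hashlib.sha256(b"package index, Tuesday").hexdigest()

# ... but a key built from the instruction text alone cannot see that,
# so the cached (stale) layer is reused.
stale_hit = layer_cache_key(base, step) == layer_cache_key(base, step)

# Once the index is a declared, hashed input, the change invalidates the key.
invalidated = (
    hermetic_key(base, step, [index_monday])
    != hermetic_key(base, step, [index_tuesday])
)

print(stale_hit, invalidated)
```

In the first model the network is an undeclared input, which is why the same instruction can legitimately produce different layers on different days.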
That's not going to work if both parties get different hashes when they build the image, which won't happen as long as file modification timestamps (and other such hazards) are part of what gets hashed.
It's not just the timestamps you need to worry about. Tar needs to be consistent with the uid vs username, gzip compression depends on implementations and settings, and the json encoding can vary by implementation.
And all this assumes the commands being run are reproducible themselves. One issue I encountered there was how alpine tracks their package install state from apk, which is a tar file that includes timestamps. There are also timestamps in logs. Not to mention installing packages needs to pin those package versions.
All of this is hard, and the Dockerfile didn't make it easy, but it is possible. With the right tools installed, reproducing my own images has a documented process [2].
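As a rough illustration of the archive-level hazards listed above (timestamps, uid vs username, gzip settings), here is a minimal Python sketch using only the standard library. `deterministic_targz` is a hypothetical helper, not the documented process referred to in [2], and it does nothing about the reproducibility of the commands that produce the files.

```python
import gzip
import io
import tarfile


def deterministic_targz(files: dict[str, bytes]) -> bytes:
    """Pack files into a .tar.gz whose bytes depend only on names and contents."""
    tar_buf = io.BytesIO()
    # Pin the tar format so the result doesn't depend on the library default.
    with tarfile.open(fileobj=tar_buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # stable entry order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0                 # no wall-clock timestamps
            info.uid = info.gid = 0        # numeric ids only
            info.uname = info.gname = ""   # no host-specific user/group names
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))

    gz_buf = io.BytesIO()
    # mtime=0 keeps the current time out of the gzip header;
    # filename="" keeps a file name out of it.
    with gzip.GzipFile(filename="", fileobj=gz_buf, mode="wb",
                       compresslevel=9, mtime=0) as gz:
        gz.write(tar_buf.getvalue())
    return gz_buf.getvalue()


a = deterministic_targz({"b.txt": b"two", "a.txt": b"one"})
b = deterministic_targz({"a.txt": b"one", "b.txt": b"two"})
print(a == b)
```

This only pins the inputs the archive format exposes. Two machines with different compression libraries or settings can still emit different compressed bytes for the same tar stream, which is the gzip caveat mentioned above.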
Personally I love using mkosi, and while it has all the composability and deployment options I'd care for, it's clear not everyone wants to build starting only with a blank set of OS templates.
In Spack [1] we do one layer per package; it's appealing, but I never checked if besides the layer limit it's actually bad for performance when doing filesystem operations.
Want to throw a requirements.txt in there? No no, why would you even ask that? Meanwhile docker says yeah sure just run pip install, why should I care?
If you care about getting it to work with minimal effort right now more than about it being sustainable later, then sure.
Like all LLM boosters, you've ignored the fact that the largest time sink in many kinds of software is not initial development, but perpetual maintenance.
I wish we had standardized on something other than shell commands, though. Puppet or terraform or something more declarative would have been such a better alternative to “everyone cargo cults ‘RUN apt-get upgrade’ onto the top of their dockerfiles”.
Like, the layer/stage/caching behavior is fine. I just wish the actual execution parts had been standardized using something at a higher level of abstraction than shell.
Until you need to do something that isn't covered with its DSL, and you extend it with an external command execution declaration... At which point people will just write bash scripts anyway and use your declarative language as a glorified exec.
However, Dockerfiles are so popular because they run shell commands and permit 'socially' extending someone else's shell commands; tacking commands onto the end of someone else's shell script is a natural process. /bin/sh is unreasonably effective at doing anything you need to a filesystem, and if the shell exposes a feature, it has probably been used in a Dockerfile somewhere.
Every other solution, especially the declarative ones, tends to come up short when _layering_ images quickly and easily. However, I agree they're good if you control the entire declarative spec.
It's a BuildKit frontend, so you still use "docker build".
And if you want something weird that's not supported by your particular tool of choice, you have the escape hatch of running arbitrary commands in the Dockerfile.
What more do you want?
They sounded nice on paper but the work they replaced was somehow more annoying.
I moved over to Docker when it came out because it used shell.
I'd get much better results if I used something else to do the foreach and gave terraform only static rules.
If your dockerfile says “ensure package X is installed at version Y” that’s a lot clearer (and also easier to make performant/cached and deterministic) than “apt-get update; apt-get install $transitive-at-specific-version; apt-get install $the-thing-you-need-at-specific-version”. I’m not thrilled at how distro-locked the shell version makes you, and how easy it is for accidental transitive changes to occur too.
But neither of those approaches is at a particularly low abstraction level relative to the OS itself; files and system calls are more or less hidden away in both package-manager-via-bash and puppet/terraform/whatever.
But as long as people want to use scripting languages (like PHP, Python, etc.) I guess Docker is the necessary evil.
I'll tell that to my CI runner, how easy is it for Go to download the Android SDK and to run Gradle? Can I also `go sonarqube` and `go run-my-pullrequest-verifications` ? Or are you also going to tell me that I can replace that with a shitty set of github actions ?
I'll also tell Microsoft they should update the C# definition to mark it down as a scripting language. And to actually give up on the whole language, why would they do anything when they could tell every developer to write if err != nil instead
Just because you have an extremely narrow view of the field doesn't mean it's the only thing that matters.
E.g. systemd exposes a lot of resource control as well as sandboxing options, to the point that I would argue that systemd services can be very similar to "traditional" runtime containers, without any image involved.
Interesting. How does go build my python app?
What I want to do when running a Docker container on Mac is to be able to have the container have an IP address separate from the Mac's IP address that applications on the Mac see. No port mapping: if the container has a web server on port 80 I want to access it at container_ip:80, not 127.0.0.1:2000 or something that gets mapped to container port 80.
On Linux I'd just use Docker bridged networking and I believe that would work, but on Mac that just bridges to the Linux VM running under the hypervisor rather than to the Mac.
Is there some officially recommended and supported way to do this?
For a while I did it by running WireGuard on the Linux VM to tunnel between that and the Mac, with forwarding enabled on the Linux VM [1]. That worked great for quite a while, but then stopped and I could not figure out why. Then it worked again. Then it stopped.
I then switched to this [2] which also uses WireGuard but in a much more automated fashion. It worked for quite a while, but also then had some problems with Docker updates sometimes breaking it.
It would be great if Docker on Mac came with something like this built in.
BTW are you trying to avoid port mapping because ports are dynamic and not known in advance? If so you could try running the container with --net=host and in Docker Desktop Settings navigate to Resources / Network and Enable Host Networking. This will automatically set up tunnels when applications listen on a port in the container.
Thanks for the links, I'll dig into those!
Genuinely fascinating and clever solution!
[1] https://github.com/rootless-containers/slirp4netns
[2] https://blog.podman.io/2024/03/podman-5-0-breaking-changes-i...
[3] https://passt.top/passt/about/#pasta-pack-a-subtle-tap-abstr...
There was another component that we didn't have room to cover in the article that has been very stable (for filesystem sharing between the container and the host) that has been endlessly criticised for being slow, but has never corrupted anyone's data! It's interesting that many users preferred potential-dataloss-but-speed using asynchronous IO, but only on desktop environments. I think Docker did the right thing by erring on the side of safety by default.
Sir, this is a hacker news.
SLIRP was useful when you had a dial-up shell, and they wouldn't give you slip or ppp; or it would cost extra. SLIRP is just a userspace program that uses the socket APIs, so as long as you could run your own programs and make connections to arbitrary destinations, you could make a dial script to connect your computer up like you had a real ppp account. No incoming connections though (afaik), so you weren't really a peer on the internet, a foreshadowing of ubiquitous NAT/CGNAT perhaps.
That's a mistake indeed; "popularised by" might have been better. Before my beloved Palmpilot arrived one Christmas, I was only using SLIRP to ninja in Netscape and MUD sessions onto a dialup connection which wasn't a very mainstream use.
Well, before Docker I used to work on Xen and that possible future of massive block devices assembled using Vagrant and Packer has thankfully been avoided...
One thing that's hard to capture in the article -- but that permeated the early Dockercons -- is the (positive) disruption Docker had in how IT shops were run. Before that going to production was a giant effort, and 'shipping your filesystem' quickly was such a change in how people approached their work. We had so many people come up to us grateful that they could suddenly build services more quickly and get them into the hands of users without having to seek permission slips signed in triplicate.
We're seeing another seismic cultural shift now with coding agents, but I think Docker had a similar impact back then, and it was a really fun community spirit. Less so today with the giant hyperscalers all dominating, sadly, but I'll keep my fond memories :-)
Funny comment considering lightweight/micro-VMs built with tools like Packer are what some in the industry are moving towards.
Some of those talks strangely make more sense today (e.g. Rump Kernels or unikernels + coding agents seems like a really good combination, as the agent could search all the way through the kernel layers as well).
"Ship your machine to production" isn't so bad when you have a ten-line script to recreate the machine at the push of a button.
Wonder when some enterprising OSS dev will rebrand dynamic linking in the future...
I don't care about glibc or compatibility with /etc/nsswitch.conf.
look at the hack rust does because it uses libc:
> pub unsafe fn set_var<K: AsRef<OsStr>, V: AsRef<OsStr>>(key: K, value: V)
So what do you do when you need to resolve system users? I sure hope you don't parse /etc/passwd, since plenty of users (me included) use other user databases (e.g. sssd or systemd-userdbd).
I think it’s laziness, not difficulty. That’s not meant to be snide or glib: I think gaining expertise in how to package and deploy non-containerized applications isn’t difficult or unattainable for most engineers; rather, it’s tedious and specialized work to gain that expertise, and Docker allowed much of the field to skip doing it.
That’s not good or bad per se, but I do think it’s different from “pre-container deployment was hard”. Pre-container deployment was neglected and not widely recognized as a specialty that needed to be cultivated, so most shops sucked at it. That’s not the same as “hard”.
I sort of had the problem in mind. Docker is the answer. Not clever enough to have invented it.
If I did I would probably have invented octopus deploy as I was a Microsoft/.NET guy.
Minus the kernel of course. What is one to do for workloads requiring special kernel features or modules?
Good luck convincing people to switch!
Using it, solving problems with it, and building a real community around it tend to make a much greater impact in the long run.
That means unlike Gentoo, I've never dealt with a "slot conflict" where two packages want conflicting dependencies. And unlike Ubuntu, I have new versions of everything.
Pick 2: share dependencies, be on the bleeding edge, or waste your time resolving conflicts.
If you have adopted a bad tool then people are likely to want the bad tool in more places. This is the opposite of a virtuous cycle and is a horrible form of tech debt.
"Docker, Guix and NixOS (stable) all had their first releases
during 2013, making that a bumper year for packaging aficionados."
Now we get coding agent updates every week, but has there been a similar year since 2013 where multiple great projects all came out at the same time?
Have others found this to be the case? Perhaps we're doing something wrong.
When compared to a VM, yes. But shipping a separate userspace for each small app is still bloat. You can reuse software packages and runtime environments across apps. From an I/O, storage, and memory utilization point of view, it feels baffling to me that containers are so popular.
Docker containers also do reuse shared components, layers that are shared between containers are not redownloaded. The stuff that's unique at the bottom is basically just going to be the app you want to run.
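The layer-sharing point can be sketched with a toy content-addressed store in Python. `LayerStore` and the layer contents are made up for illustration; real registries and storage drivers are far more involved, but the idea of keying a layer by the hash of its bytes is the same.

```python
import hashlib


class LayerStore:
    """Toy content-addressed store: each distinct layer is fetched and kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.downloads = 0

    def pull(self, layers: list[bytes]) -> list[str]:
        """Return the digests of an image's layers, fetching only unseen ones."""
        digests = []
        for blob in layers:
            digest = hashlib.sha256(blob).hexdigest()
            if digest not in self.blobs:  # already present means nothing to fetch
                self.blobs[digest] = blob
                self.downloads += 1
            digests.append(digest)
        return digests


base_os = b"debian base layer"       # placeholder layer contents (assumption)
runtime = b"python runtime layer"

store = LayerStore()
image_a = store.pull([base_os, runtime, b"app A code"])
image_b = store.pull([base_os, runtime, b"app B code"])

print(store.downloads)  # 4: two shared layers fetched once, plus one app layer each
```

Sharing only happens when the layers are byte-identical, so two images built from slightly different base images share nothing, which is where the bloat argument still has some force.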
Why? It's not virtualization, it's containerization. It's using the host kernel.
Containers are fast.
You can hardly call this efficient hardware utilization.
For running your own machine, sure. But this would become non maintainable for a sufficiently multi tenant system. Nix is the only thing that really can begin to solve this outside of container orchestration.
I've recently switched from docker compose to process compose and it's super nice not to have to map ports or mount volumes. What I actually needed from docker had to do less with containers and more with images, and nix solves that problem better without getting in the way at runtime.
What process-compose gives me is a single parent with all of that project's processes as children, and a nice TUI/CLI for scrolling through them to see who is happy/unhappy and interrogating their logs, and when I shut it down all of that project's dependencies shut down. Pretty much the same flow as docker-compose.
It's all self-contained so I can run it on MacOS and it'll behave just the same as on Linux (I don't think systemd does this, could be wrong), and without requiring me to solve the docker/podman/rancher/orbstack problem (these are dependencies that are hard to bundle in nix, so while everything else comes for free, they come at the cost of complicating my readme with a bunch of requests that the user set things up beforehand).
As a bonus, since it's a single parent process, if I decide to invoke it through libfaketime, the fake time is inherited by every subprocess, so it's consistently faked in the database and the services and in observability tools...
My feeling for systemd is that it's more for system-level stuff and less for project-level dependencies. Like, if I have separate projects which need different versions of postgres, systemd commands aren't going to give me a natural way to keep track of which project's postgres I'm talking about. process-compose, however, will show me logs for the correct postgres (or whatever service) in these cases:
~/src/projA$ process-compose process logs postgres
~/src/projB$ process-compose process logs postgres
This is especially helpful because AI agents tend to be scoped to the working directory. So if I have one instance of claude code on each monitor and in each directory, whichever one tries to look at postgres logs will end up looking at the correct postgres's logs without having to even know that there are separate ones running.
Basically, I'm allergic to configuring my system at all. All dependencies besides nix, my text editor, and my shell are project-level dependencies. This makes it easy to hop between machines and not really care about how they're set up. Even on production systems, I'd rather just clone the repo and `nix run` in that dir (it then launches process compose, which makes everything just like it was in my dev environment). I am however not in charge of any production systems, so perhaps I'm a bit out of touch there.
Why do you think other tools will make a comeback?
I want it not to just be invisible but to be missing. If you have kubernetes, including locally with k3s or similar, it won't be used to run containers anyway. However it still often is used to build OCI images. Podman can fill that gap. It has a Containerfile format that is the same syntax but simpler than the Docker builds, which now provides build orchestration features similar to earthly.dev which I think are better kept separate.
(article author here)
Apple containers are basically the same as how Docker for Mac works; I wrote about it here: https://anil.recoil.org/notes/apple-containerisation
Unfortunately Apple managed to omit the feature we all want that only they can implement: namespaces for native macOS!
Instead we got yet another embedded-Linux-VM which (imo) didn't really add much to the container ecosystem except a bunch of nice Swift libraries (such as the ext2 parsing library, which is very handy).
Is there any insight into this? I would have thought the opposite: that developers on the platform that made Docker succeed would be given first preview of features.
There’s another one, at least IMHO, that this entire stack from the bottom up is designed wrong and every day we as a society continue marching down this path we’re just accumulating more technical debt. Pretty much every time you find the solution to be, “ok so we’ll wrap the whole thing and then…” something is deeply wrong and you’re borrowing from the future a debt that must come due. Energy is not free. We tend to treat compute like it is.
Maybe I’m in a big club but I have a vision for a radically different architecture that fixes all of this and I wish that got 1/2 the attention these bandaids did. Plan 9 is an example of the theme if not the particular set of solutions I’m referring to.
Linux user space decided to try and share dependencies. Docker obliterates this design goal by shipping dependencies, but stuffing them into the filesystem as-if they were shared.
If you’re going to do this then a far far far simpler solution is to just link statically or ship dependencies adjacent to the binary. (Aka what windows does). Replicating a faux “shared” filesystem is a gross hack.
This is a distinctly Linux problem. Windows software doesn’t typically have this issue. Because programs ship their dependencies and then work.
Docker is one way to ship dependencies. So it’s not the worst solution in the world. But I swear it’s a bad solution. My blood boils with righteous fury anytime anyone on my team mentions they have a 15 minute docker build step. And don’t you damn dare say the fix to Docker being slow is to add more layers of complexity with hierarchical Docker images ohmygodiswear. Running a computer program does not have to be hard I promise!!
The more recent half of my career has been more focused on ML and now robotics. Python ML is an absolute clusterfuck. It is close to getting resolved with UV and Pixi. The trick there is to include your damn dependencies… via symlink to a shared cache.
Any program or pipeline that relies on whatever arbitrary ass version of Python is installed on the system can die in a fire.
That’s mostly about deploying. We can also talk about build systems.
The one true build system path is a monorepo that contains your damn dependencies. Anything else is wrong and evil.
I’m also spicy and think that if your build system can’t crosscompile then it sucks. It’s trivial to crosscompile for Windows from Linux because Windows doesn’t suck (in this regard). It’s almost impossible to crosscompile to Linux from Windows because Linux userspace is a bad, broken, failed design. However Andrew Kelley is a patron saint and Zig makes it feasible.
Use a monorepo, pretend the system environment doesn’t exist, link statically/ship adjacent so/dll.
Docker clearly addresses a real problem (that Linux userspace has failed). But Docker is a bad hack. The concept of trying to share libraries at the system level has objectively failed. The correct thing to do is to not do that, and don’t fake a system to do it.
Windows may suck for a lot of reasons. But boy howdy is it a whole lot more reliable than Linux at running computer programs.