This limitation creates numerous headaches. Instead of Deployments, I'm stuck with manual docker compose up/down commands over SSH. Rather than using Ingress, I have to rely on Traefik's container discovery functionality. Recently, I even wrote a small script to manage crontab idempotently because I can't use CronJobs. I'm constantly reinventing solutions to problems that Kubernetes already solves—just less efficiently.
What I really wish for is a lightweight alternative offering a Kubernetes-compatible API that runs well on inexpensive VPS instances. The gap between enterprise-grade container orchestration and affordable hobby hosting remains frustratingly wide.
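On the "manage crontab idempotently" point: one way to make that trivial is to regenerate the whole crontab file on every run and install it wholesale, instead of editing entries in place. A sketch — the job lines and paths are made-up examples, not anything from the comment above:

```shell
# Idempotent crontab management: rebuild the desired crontab from scratch on
# every run, then install the whole file at once. Because installation is a
# full replace, re-running this script can never duplicate entries.
CRON_FILE="$(mktemp)"
cat > "$CRON_FILE" <<'EOF'
0 3 * * * docker compose -f /srv/app/compose.yml pull
*/5 * * * * /usr/local/bin/healthcheck.sh
EOF
# Installing it is then a single atomic replace (commented out in this sketch):
#   crontab "$CRON_FILE"
```

The trick is that `crontab <file>` replaces the previous crontab entirely, so idempotency comes for free.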
Depending on how much of the Kube API you need, Podman is that. It can create containers and pods from Kubernetes manifests [0]. It kind of works like docker compose, but with Kubernetes manifests.
This even works with systemd units, similar to how it's outlined in the article.
Podman also supports most (all?) of the Docker API, so docker compose works; you can also connect to remote sockets over SSH and such to do things.
[0] https://docs.podman.io/en/latest/markdown/podman-kube-play.1...
[1] https://docs.podman.io/en/latest/markdown/podman-systemd.uni...
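As a concrete sketch of [0]: you write an ordinary Kubernetes manifest and hand it to podman, no cluster required (image and names here are just illustrative):

```yaml
# app.yaml - a plain Kubernetes Pod manifest
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: nginx
      image: docker.io/library/nginx:alpine
      ports:
        - containerPort: 80
          hostPort: 8080
```

Run it with `podman kube play app.yaml`, tear it down with `podman kube play --down app.yaml`.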
It means you're forced to make everything always compatible between versions etc.
For a deployment that isn't even making money and is running on a single node droplet with basically no performance... Why?
It's the default behavior of a kubernetes deployment which we're comparing things to.
> It means you're forced to make everything always compatible between versions etc.
For stateless services, not at all. The outside world just keeps talking to the previous version while the new version is starting up. For stateful services, it depends. Often there are software changes without changes to the schema.
> For a deployment that isn't even making money
I don't like looking at 504 gateway errors
> and is running on a single node droplet with basically no performance
I'm running this stuff on a server in my home, it has plenty of performance. Still don't want to waste it on kubernetes overhead, though. But even for a droplet, running the same application 2x isn't usually a big ask.
The new container spins up while the old container is still answering requests and only when the new container is running and all requests to the old container are done, then the old container gets discarded.
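That behavior is what a Deployment's default RollingUpdate strategy encodes. A minimal sketch of the relevant knobs (the probe path and image are hypothetical):

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never remove the old pod before a replacement is ready
      maxSurge: 1         # start one new pod alongside the old one
  template:
    spec:
      containers:
        - name: app
          image: registry.example.com/app:v2
          readinessProbe:   # traffic only shifts once this passes
            httpGet:
              path: /healthz
              port: 8080
```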
If you're planning to eventually move to a cluster or you're trying to learn k8s, maybe, but if you're just hosting a single node project it's a massive effort, just because that's not what k8s is for.
Still on k3s, still love it.
My cluster is currently hosting 94 pods across 55 deployments. Using 500m cpu (half a core) average, spiking to 3cores under moderate load, and 25gb ram. Biggest ram hog is Jellyfin (which appears to have a slow leak, and gets restarted when it hits 16gb, although it's currently streaming to 5 family members).
The cluster is exclusively recycled old hardware (4 machines), mostly old gaming machines. The most recent is 5 years old, the oldest is nearing 15 years old.
The nodes are bare Arch linux installs - which are wonderfully slim, easy to configure, and light on resources.
It burns 450Watts on average, which is higher than I'd like, but mostly because I have jellyfin and whisper/willow (self hosted home automation via voice control) as GPU accelerated loads - so I'm running an old nvidia 1060 and 2080.
Everything is plain old yaml, I explicitly avoid absolutely anything more complicated (including things like helm and kustomize - with very few exceptions) and it's... wonderful.
It's by far the least amount of "dev-ops" I've had to do for self hosting. Things work, it's simple, spinning up new service is a new folder and 3 new yaml files (0-namespace.yaml, 1-deployment.yaml, 2-ingress.yaml) which are just copied and edited each time.
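A guess at what that three-file folder looks like in its minimal shape (all names and images hypothetical; a small Service rides along since the Ingress needs a backend):

```yaml
# 0-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: blog
---
# 1-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blog
  namespace: blog
spec:
  replicas: 1
  selector:
    matchLabels: { app: blog }
  template:
    metadata:
      labels: { app: blog }
    spec:
      containers:
        - name: blog
          image: ghcr.io/example/blog:latest
          ports:
            - containerPort: 8080
---
# 2-ingress.yaml
apiVersion: v1
kind: Service
metadata:
  name: blog
  namespace: blog
spec:
  selector: { app: blog }
  ports:
    - port: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blog
  namespace: blog
spec:
  rules:
    - host: blog.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: blog
                port:
                  number: 8080
```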
Any three machines can go down and the cluster stays up (metalLB is really, really cool - ARP/NDP announcements mean any machine can announce as the primary load balancer and take the configured IP). Sometimes services take a minute to reallocate (and jellyfin gets priority over willow if I lose a gpu, and can also deploy with cpu-only transcoding as a fallback), and I haven't tried to be clever getting 100% uptime because I mostly don't care. If I'm down for 3 minutes, it's not the end of the world. I have a couple of commercial services in there, but it's free hosting for family businesses, they can also afford to be down an hour or two a year.
Overall - I'm not going back. It's great. Strongly, STRONGLY recommend k3s over microk8s. Definitely don't want to go back to single machine wrangling. The learning curve is steeper for this... but man do I spend very little time thinking about it at this point.
I've streamed video from it as far away as literally the other side of the world (GA, USA -> Taiwan). Amazon/Google/Microsoft have everyone convinced you can't host things yourself. Even for tiny projects people default to VPS's on a cloud. It's a ripoff. Put an old laptop in your basement - faster machine for free. At GCP prices... I have 30k/year worth of cloud compute in my basement, because GCP is a god damned rip off. My costs are $32/month in power, and a network connection I already have to have, and it's replaced hundreds of dollars/month in subscription costs.
For personal use-cases... basement cloud is where it's at.
To put that into perspective, that's more than my entire household uses, including my server that has an old GPU in it
Water heating is electric yet we still don't use 450W×year≈4MWh of electricity. In winter we just about reach that as a daily average (as a household) because we need resistive heating to supplement the gas system. Constantly 450W is a huge amount of energy for flipping some toggles at home with voice control and streaming video files
Modern construction techniques, including super-insulated walls, tight building envelopes, and heat exchangers, can dramatically reduce heating and cooling loads.
Just saying it's not as outrageous as it might seem.
Oh for sure! Otherwise we'd be heating our homes directly with electricity.
Thanks for putting concrete numbers on it!
Additionally - it's actually not that hard to put this entire load on solar.
4x350watt panels, 1 small inverter/mppt charger combo and a 12v/24v battery or two will do you just fine in the under $1k range. Higher up front cost - but if power is super expensive it's a one time expense that will last a decade or two, and you get to feel all nice and eco-conscious at the same time.
Or you can just not run the GPUs, in which case my usage falls back to ~100W. You can drive it lower still, but it's just not worth my time. It's only barely worth thinking about at 450W for me.
My own server doesn't run voice recognition so I can't speak to that (I can only opine that it can't be worth a constant draw of 430W to get rid of hardware switches and buttons), but my server also does streaming video and replaces SaaS services, so similar to what you mention, at around 20W
Dell R720 - 125W
Primary NAS - 175W
Friend's Backup NAS - 100W
Old i5 Home Server - 100W
Cisco 2921 VoIP router - 80W
Brocade 10G switch - 120W
Various other old telecom gear - 100W
450W just isn't that much power as far as "environmental costs" go. It's also super trivial to put on solar (actually my current project - although I had to scale the solar system way up to make ROI make sense because power is cheap in my region). But seriously, panels are cheap, LFP batteries are cheap, inverters/mppts are cheap. Even in my region with the cheap power, moving my house to solar has returns in the <15 years range.
Nobody made that claim
> 450W just isn't that much power as far as "environmental costs" go
It's a quarter of one's fair share per the philosophy of https://en.wikipedia.org/wiki/2000-watt_society
If you provide for yourself (e.g. run your IT farm on solar), by all means, make use of it and enjoy it. Or if the consumption serves others by doing wind forecasts for battery operators or hosts geographic data that rescue workers use in remote places or whatnot: of course, continue to do these things. In general though, most people's home IT will fulfil mostly their own needs (controlling the lights from a GPU-based voice assistant). The USA and western Europe have similarly rich lifestyles but one has a more than twice as great impact on other people's environment for some reason (as measured by CO2-equivalents per capita). We can choose for ourselves what role we want to play, but we should at least be aware that our choices make a difference
Emphasis mine. I have a rack that draws 200w continuously and I don't feel great about it, even though I have 4.8kW of panels to offset it.
The remaining difference in cost is boosted by the cost of ethanol, which is much cheaper in the US due to abundance of feedstock and heavy subsidies on ethanol production.
The petrol and diesel account for a relatively small fraction on both continents. The "normal" prices in Europe aren't reflective of the cost of the fossil fuel itself. In point of fact, countries in Europe often have lower tax rates on diesel, despite being generally worse for the environment.
Americans drive larger vehicles because our politicians stupidly decided mandating fuel economy standards was better than a carbon tax. The standards are much laxer for larger vehicles. As a result, our vehicles are huge.
Also, Americans have to drive much further distances than Europeans, both in and between cities. Thus gas prices that would be cheap to you are expensive to them.
Things are the way they are because basic geography, population density, and automotive industry captured regulatory and zoning interests. You really can't blame the average American for this; they're merely responding to perverse incentives.
It is not as fancy/reliable/reproducible as k3s, but with a bunch of manual backups and a ZFS (or BTRFS) storage cluster (managed by a virtualized TrueNAS instance), you can get away with it. Anytime a disk fails, just replace and resilver it and you’re good. You could configure certain VMs for HA (high availability) where they will be replicated to other nodes that can take over in the event of a failure.
Also I’ve got tailscale and pi-hole running as LXC containers. Tailscale makes the entire setup accessible remotely.
It’s a different paradigm that also just works once it’s setup properly.
So, can you use it to give your whole cluster _one_ external IP that makes it accessible from the outside, regardless of whether any node is down?
Imo this part is what can be confusing to beginners in self hosted setups. It would be easy and convenient if they could just point DNS records of their domain to a single IP for the cluster and do all the rest from within K3s.
Ex - DHCP owns 10.0.0.2-10.0.0.200, metalLB is assigned 10.0.0.201-10.0.0.250.
When a service requests a loadbalancer, metallb spins up a service on any given node, then uses ARP to announce to my LAN that that node's mac address is now that loadbalancer's ip. Internal traffic intended for that IP will now resolve to the node's mac address at the link layer, and get routed appropriately.
If that node goes down, metalLB will spin up again on a remaining node, and announce again with that node's mac address instead, and traffic will cut over.
It's not instant, so you're going to drop traffic for a couple seconds, but it's very quick, all things considered.
It also means that from the point of view of my networking - I can assign a single IP address as my "service" and not care at all which node is running it. Ex - if I want to expose a service publicly, I can port forward from my router to the configured metalLB loadbalancer IP, and things just work - regardless of which nodes are actually up.
---
Note - this whole thing works with external IPs as well, assuming you want to pay for them from your provider, or IPV6 addresses. But I'm cheap and I don't pay for them because it requires getting a much more expensive business line than I currently use. Functionally - I mostly just forward 80/443 to an internal IP and call it done.
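In current MetalLB (the CRD-based configuration), the setup described above boils down to two small resources — the address range here matches the example split mentioned earlier:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.0.0.201-10.0.0.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lan-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - lan-pool
```

Any Service of `type: LoadBalancer` then gets an IP from the pool, announced over ARP/NDP exactly as described.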
That sounds so interesting and useful that you've convinced me to try it out :)
Also, replicated volumes are great for configuration, but "big" volume data typically lives on a NAS or similar, and you do need to get stuff off the replicated volumes for backup, so things like replicated block storage do need to expose a normal filesystem interface as well (tacking on an SMB container to a volume just to be able to back it up is just weird).
I run both an external NAS as an NFS service and longhorn. I'd probably just use longhorn at this point, if I were doing it over again. My nodes have plenty of sata capacity, and any new storage is going into them for longhorn at this point.
I back up to an external provider (backblaze/wasabi/s3/etc). I'm usually paying less than a dollar a month for backups, but I'm also fairly judicious in what I back up.
Yes - it's a little weird to spin up a container to read the disk of a longhorn volume at first, but most times you can just use the longhorn dashboard to manage volume snapshots and backup scheduling as needed. Ex - if you're not actually trying to pull content off the disk, you don't ever need to do it.
If you are trying to pull content off the volume, I keep a tiny ssh/scp container & deployment hanging around, and I just add the target volume real fast, spin it up, read the content I need (or more often scp it to my desktop/laptop) and then remove it.
Put another way, in my experience running clusters, `ps auwx` or its friend `top` always shows etcd or sqlite generating all of the "WHAT are you doing?!" load, and those also represent the actual risk in running kubernetes, since the apiserver is mostly stateless[1]
1: but holy cow watch out for mTLS because cert expiry will ruin your day across all of the components
2: related, if you haven't ever tried to run a cluster bigger than about 450 Nodes, that's actually the whole reason kube-apiserver --etcd-servers-overrides exists, because the torrent of Node status updates will knock over the primary etcd, so one has to offload /events into its own etcd
Oh, and it handles replication, failover, backups, and a litany of other useful features to make running a stateful database, like postgres, work reliably in a cluster.
I hate sounding like an Oracle shill, but Oracle Cloud's Free Tier is hands-down the most generous. It can support running quite a bit, including a small k8s cluster[1]. Their k8s backplane service is also free.
They'll give you 4 x ARM64 cores and 24GB of ram for free. You can split this into 1-4 nodes, depending on what you want.
So choose your home region carefully. Also, note that some regions have multiple availability domains (OCI-speak for availability zones) but some only have one AD. Though if you're only running one free instance then ADs don't really matter.
I think that's if you are literally on their free tier, vs. having a billable account which doesn't accumulate enough charges to be billed.
Similar to the sibling comment - you add a credit card and set yourself up to be billed (which removes you from the "free tier"), but you are still granted the resources monthly for free. If you exceed your allocation, they bill the difference.
This is basically any cloud provider by the way, not specific to Oracle. Ran into this with GCP recently. Insane experience. Pay with card. Get payment rejected by fraud team after several months of successful same amount payments on the same card and they won’t tell what the problem is. They ask for verification. Provide all sorts of verification. On the sixth attempt, send a picture of a physical card and all holds removed immediately
It’s such a perfect microcosm capturing of dealing with megacorps today. During that whole ordeal it was painfully obvious that the fraud team on the other side were telling me to recite the correct incantation to pass their filters, but they weren’t allowed to tell me what the incantation was. Only the signals they sent me and some educated guesswork were able to get me over the hurdle
I used a privacy.com Mastercard linked to my bank account for Oracle's payment method to upgrade to PAYG. It may have changed, this was a few months ago. Set limit to 100, they charged and reverted $100.
So you're saying there's a chance to use a prepaid card if you can copy its digits onto a real-looking plastic card? Lol
If you are on the free tier, they have nothing to lose, only you do, so be particularly mindful of making a calendar note for changing your CC before expiration and things like that.
It’s worth paying for another company just for the peace of mind of knowing they will try to persuade you to pay before deleting your data.
https://news.ycombinator.com/item?id=42902190
which links to:
https://news.ycombinator.com/item?id=29514359 & https://news.ycombinator.com/item?id=33202371
But yes, you should always have good backups and a plan B with any hosting/cloud provider you choose.
That's more money for far fewer resources than Hetzner. I'm paying about $8 a month for 4 vCPUs and 8GB of RAM: https://www.hetzner.com/cloud
Note that the really affordable ARM servers are German only, so if you're in the US you'll have to deal with higher latency to save that money, but I think it's worth it.
I have a couple dedicated servers I fully manage with ansible. It's docker compose on steroids. Use traefik and labeling to handle reverse proxy and tls certs in a generic way, with authelia as simple auth provider. There's a lot of example projects on github.
A weekend of setup and you have a pretty easy to manage system.
One can read more here: https://doc.traefik.io/traefik/routing/providers/docker/
This obviously has some limits and becomes significantly less useful when one requires more complex proxy rules.
It's zero config and super easy to set everything up. Just run the traefik image, and add docker labels to your other containers. Traefik inspects the labels and configures reverse proxy for each. It even handles generating TLS certs for you using letsencrypt or zerossl.
I should know, as I spent years building and maintaining a production ingress controller for nginx at scale, and I'd choose Traefik every day over that.
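A minimal compose sketch of that label-driven setup (the domain, email, and service names are placeholders):

```yaml
services:
  traefik:
    image: traefik:v3
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.tlschallenge=true
      - --certificatesresolvers.le.acme.email=admin@example.com
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
    ports: ["443:443"]
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt

  whoami:
    image: traefik/whoami
    labels:
      - traefik.enable=true
      - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
      - traefik.http.routers.whoami.entrypoints=websecure
      - traefik.http.routers.whoami.tls.certresolver=le

volumes:
  letsencrypt:
```

Traefik watches the Docker socket, sees the labels, and wires up routing plus a Let's Encrypt cert with no further config.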
But you've already said yourself that the cost of using K8s is too high. In one sense, you're solving those problems more efficiently; it just depends on the axis you use to measure things.
Which I guess makes it more than good enough for hobby stuff - I'm playing with a multi-node cluster in my homelab and it's also working fine.
OTOH it's not a moving target. Docker historically has been quite infamous for that, we were talking about half-lives for features, as if they were unstable isotopes. It took initiatives like OCI to get things to settle.
K8s tries to solve the most complex problems, at the expense of leaving simple things stranded. If we had something like OCI for clustering, it would most likely take the same shape.
In terms of the cloud, I think Digital Ocean costs about $12 / month for their control plane + a small instance.
> Particularly with GitOps and Flux, making changes was a breeze.
I'm writing comin [1], which is GitOps for NixOS machines: you git push your changes and your machines fetch and deploy them automatically.
I don't use ingresses or loadbalancers because those cost extra, and either have the services exposed through tailscale (with tailscale operator) for stuff I only use myself, or through cloudflare argo tunnels for stuff I want internet accessible
(Once a project graduates and becomes more serious, I migrate the container off this cluster and into a proper container runner)
For single server setups, it uses k3s, which takes up ~200MB of memory on your host machine. It's not ideal, but the pain of trying to wrangle docker deployments, and the cheapness of Hetzner, made it worth it.
The marginal cost of an additional project on the cluster is essentially $0
Out of curiosity, what is so bad about this for smaller projects?
Recently I switched my entire setup (few Pi's, NAS and VM's) to NixOS. With Colmena[0] I can manage/update all hosts from one directory with a single command.
Kubernetes was a lot of fun, especially the declarative nature of it. But for small setups, where you are still managing the plumbing (OS, networking, firewall, hardening, etc) yourself, you still need some configuration management. Might as well put the rest of your stuff in there also.
6 vCore (ARM64)
8 GB RAM
256 GB NVMe
There you get
6 vCore (ARM64)
8 GB RAM
512 GB NVMe
for 6 $ / m - traffic inclusive. You can choose between "6 vCore ARM64, 8 GB RAM" and "4 vCore x86, 8 GB ECC RAM" for the same price. And much more, of course.

The more I look into it, the more I think of k8s as a way to "move to micro services" without actually moving to micro services. Loosely coupled micro services shouldn't need that level of coordination if they're truly loosely coupled.
It looks like Nomad has a driver to run software via isolated fork/exec, as well, in addition to Docker containers.
Or maybe look into Kamal?
Or use Digital Ocean's app service. Git integration, cheap, just run a container. But get your postgres from a cheaper VC-funded shop :)
It can manage multiple machines with just SSH access and a Docker install.
Another way to look at this is the Kubernetes created solutions to problems that were already solved at a lower scale level. Crontabs, http proxies, etc… were already solved at the individual server level. If you’re used to running large coordinated clusters, then yes — it can seem like you’re reinventing the wheel.
Let it not be idempotent. Let it crash sometimes.
We lived without kubs for years and the web was ok. Your users will survive.
To put this in perspective, that’s less compute than a phone released in 2013, 12 years ago, the Samsung Galaxy S4. To find this level of performance in a computer, we have to go to
The main issue is that Kubernetes has created good API and primitives for managing cloud stuff, and managing a single server is still kinda crap despite decades of effort.
I had K3S on my server, but replaced with docker + Traefik + Portainer - it’s not great, but less idle CPU use and fewer moving parts
Here's some cool stuff:
- containers
- machinectl: used for controlling:
  - nspawn: a more powerful chroot. This is often a better solution than docker. Super lightweight. Shares kernel
  - vmspawn: when nspawn isn't enough and you need full virtualization
- importctl: download, import, export your machines. Get the download features in {vm,n}spawn like we have with docker. There's a hub, but it's not very active
- homed/homectl: extends user management to make it easier to do things like encryption home directories (different mounts), better control of permissions, and more
- mounts: forget fstab. Make it easy to auto mount and dismount drives or partitions. Can be access based, time, triggered by another unit (eg a spawn), sockets, or whatever
- boot: you can not only control boot but this is really what gives you access to starting and stopping services in the boot sequence.
- timers: forget cron. Cron can't wake your machine. Cron can't tell you a service didn't run because your machine was off. Cron won't give you fuzzy timing, or do more complicated things like wait for X minutes after boot if it's the third Sunday of the month and only if Y.service is running. Idk why you'd do that, but you can!
- service units: these are your jobs. You can really control them in their capabilities. Lock them down so they can only do what they are meant to do.
- overrides: use `systemctl edit` to edit your configs. Creates an override config and you don't need to destroy the original. No longer that annoying task of finding the original config and for some reason you can't get it back even if reinstalling! Same with when the original config changes in an install, your override doesn't get touched!!
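The timers bullet above in practice: a service/timer pair replacing a cron entry (unit names and script path are hypothetical). `Persistent=true` is the "run it if the machine was off" part, and `OnCalendar` plus `RandomizedDelaySec` give the fuzzy scheduling:

```ini
# /etc/systemd/system/backup.service
[Unit]
Description=Nightly backup

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh

# /etc/systemd/system/backup.timer
[Unit]
Description=Run backup nightly

[Timer]
OnCalendar=*-*-* 03:00:00
RandomizedDelaySec=15min
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with `systemctl enable --now backup.timer`, inspect with `systemctl list-timers`.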
It's got a lot of stuff and it's (almost) all there already on your system! It's a bit annoying to learn, but it really isn't too bad if you really don't want to do anything too complicated. But in that case, it's not like there's a tool that doesn't require docs but allows you to do super complicated things.

From my perspective, it got a lot of hate in its first few years (decade?), not because the project itself was bad -- on the contrary, it succeeded in spite of having loads of other issues, because it was so superior. The problem was the maintainers' attitude of wantonly breaking things that used to work just fine, without offering any suitable fixes.
I have an old comment somewhere with a big list. If you never felt the pain of systemd, it's either because you came late to the party, or because your needs always happened to overlap with the core maintainer's needs.
From what I remember that's still the default in the project, but people stopped complaining because the individual distros started overriding the relevant settings.
I will fully admit though that upstart was worse (which is an achievement), but the solution space was not at all settled.
[1] The systemd project tackles a lot of important problems, but the quality of the implementation and the experience of using it and working with it are not really good, especially the further you get from the simplest cookie-cutter services: systemd's handling of defaults is borked, the documentation only starts to make sense once you already know what the author knew, and whoever is the bright soul behind systemctl should kindly never make CLIs again (the worst example probably being `systemctl show this-service-does-not-exist`)
Fundamentally, this was it. SysV startup scripts had reached a local maximum decades earlier, and there was serious "overhang". When I said "superior", I really meant that it was superior to SysV, not that it was the best system that could have been imagined.
And I think the frustration was that, because it did solve so many problems, so many groups (like GNOME) were willing to switch over to it in spite of its warts; and this made it impossible for anyone who was seriously affected by its warts to avoid it. "If you don't like it, don't use it" not being an option was what drove so much of the vitriol, it seems to me.
As I said in that comment from 2019, if the maintainers had had Linus Torvald's commitment to backwards compatibility, I don't think there would have been any significant backlash.
Trying to run GNOME 3.8 without logind caused significant problems and instabilities, trying to implement the same APIs turned out a futile endeavour though one OpenBSD guy got sufficiently motivated and kept patching GNOME for OpenBSD for years - though too late for the forced switch.
The large distros jumping "both feet" on systemd were essentially Fedora/Redhat (where it originated and who was employing major maintainers), and IIRC SuSE. Arch was still seen as something of niche and - crucially - was very neophyte about adopting systemd related ideas for significant amount of time with little regard for stability.
The holdouts were not just those who were happy with debian/redhat simplistic run-parts script. They were also those interested in solving the problems in different way. Hell, systemd was pretty late to the party, the major difference was that it had funding behind it
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/wait.h> /* for wait(); needed for this to compile cleanly */
#include <unistd.h>
int main() {
sigset_t set;
int status;
if (getpid() != 1) return 1;
sigfillset(&set);
sigprocmask(SIG_BLOCK, &set, 0);
if (fork()) for (;;) wait(&status);
sigprocmask(SIG_UNBLOCK, &set, 0);
setsid();
setpgid(0, 0);
return execve("/etc/rc", (char *[]){ "rc", 0 }, (char *[]){ 0 });
}
(Credit: https://ewontfix.com/14/)

You can spawn systemd from there, and in case anything goes wrong with it, you won't get an instant kernel panic.
Systemd wants PID1. Don't know if there are forks to disable that.
I guess if you really need that information, you could wait4 and dump pid/rusage to syslog. Nothing more to see here; these are zombies, orphans, by definition these processes have been disowned and there's nobody alive to tell the tale.
Sure. It worked for _50 years_ just fine but obviously it is very wrong and should be replaced with - of course - systemd.
Some of these things that "worked for 50 years" have also actually sucked for 50 years. Look at C strings and C error handling. They've "worked", until you hold them slightly wrong and cause the entire world to start leaking sensitive data in a lesser-used code path.
I agree with you, that's exactly right.
Not sure I'm on the same page with you on the cron. I have a similar experience but I'd rather say that cron was something that never gave me headaches. Unlike obviously systemd.
systemd has given me many headaches, but as a whole, it has saved me far fewer headaches than it has given me.
I'd say these are not bugs but rather a matter of realizing how cron works - just like with systemd-anything. So if you know DST is coming, a wise thing would be to not plan jobs in the rollover window. But yes, I agree that this thing is rudimentary - and thus simple - and thus reliable and independent, like the rest of unix was supposed to be.
> job has to be wrapped in a bespoke script
Well yes. Again, this is by design and well known.
> systemd has given me many headaches, but as a whole, it has saved me far fewer headaches than it has given me
Good for you - and I mean it! For me systemd was an obnoxious piece of shit which I have avoided for many years until Ubuntu decided that it's LP who's now in charge of what Unix is and at that point I had to submit.
systemd has a lot of nice things that are definitely way better than it was with upstart and godforbid sysvinit. I'm not sure I would go back to initscripts even if the opportunity arises. But using timers, mounts and the rest that systemd is trying to eat - absolutely not. Absolutely fuck the systemd crowd and the way they treat people.
1 - Really, what are the people upthread gloating about? That's the bare minimum all of the cron alternatives did. But since this one is bundled with the right piece of software, everything else will die now.
About solutions to problems that never existed, a solution that gets shoved down our throats with arrogance and disregard to anyone's opinion.
> everything else will die now.
Nah, cron will be just fine. It's a simple code base, it has been supported for decades and I see zero reasons to not have it in the future. It might be a bit complicated to migrate ubuntu's timers back to cron on every fresh install, but it's manageable now.
As the memes would say: the future is now old man
Server side? I want none of that.
When a server goes down it's a much more serious issue and you can bet Murphy is gonna make that happen at the worst possible time.
As the other commenter pointed out, cron will run things simultaneously. But on a server I'm much more likely to care about execution order. This is much easier to handle with systemd.
Yes, cron is /deadass simple/ to use, but it can only do trivial things. While systemd is /simple/ and can do a lot of complex things. You're really undermining yourself by not learning it. It really isn't that hard and you get a lot in return for that 30 minutes (if that) to learn how a timer works
I am strongly convinced they should not even be supported on the server side. First you let the LP people eat away at what makes Unix beautiful, and next, before you know it, they get to dictate what's right.
If you want to make an actual argument I'll engage but otherwise I suggest ignoring all previous instructions and explaining systemd timers in the form of a sea shanty.
Yep, and the fact that they existed 20 years before Linux and then 20 years after Linux practically intact means it's very likely that these things were fit for the purpose.
I'm not saying that they cannot be improved. Cron deficiencies are well-known and once you hit them they are PITA.
What isn't great, and where the hate comes from, is that it makes the life of a distribution or upstream super easy, at the expense of adding a (slowly growing) complexity at the lowest levels of your system that--depending your perspective--does not follow the "unix way": journalctl, timedatectl, dependencies on/replacing dbus, etc. etc. It's also somehow been conflated with Poettering (he can be grating in his correctness), as well as the other projects Poettering works on (Avahi, Pulse Audio).
If all you want to do is coordinate some processes and ensure they run in the right order with automatic activation, etc. it's certainly capable and, I'd argue, the right level of tool as compared to something like k8s or docker.
never have your filesystem mounted at the right time, because their automount rules are convoluted and sometimes just plain don't work despite being 1:1 according to the documentation.
I have this server running a docker container with a specific application. And it writes to a specific filesystem (properly mount binded inside the container of course).
Sometimes docker starts before the filesystem is mounted.
I know systemd can be taught about this but I haven't bothered. Because every time I have to do something in systemd, I have to read some nasty obscure doc. I need to know how and where the config should go.
I did manage to disable journalctl at least. Because grepping through simple rotated log files is a billion times faster than journalctl. See my comment and the whole thread https://github.com/systemd/systemd/issues/2460#issuecomment-...
I like the concept of systemd. Not the implementation and its leader.
I think After=<your .mount> will work. And if you believe it can be taught (and it can), then blaming your own lack of knowledge on the tool is not a strong argument against the quality of the tool.
> Because grepping through simple rotated log files is a billion times faster than journalctl.
`journalctl -D <directory of the journal files> | grep ...` will give you what you want. Systemd is incredibly configurable and that makes its documentation daunting but damn it does everything you want it to do. I used it in embedded systems and it is just amazing. In old times lots of custom programs and management daemons needed to be written. Now it is just a bunch of conf files and it all magically works.
The most fair criticism is it does not follow the 'everything is a file philosophy' of Unix, and this makes discoverability and traditional workflows awkward. Even so it is a tool: if it does what you want, but you don't want to spend time understanding it, it is hardly the fault of the tool. I strongly recommend learning it, there will be many Ah-ha moments.
If you had followed my link to the systemd issue, you might have seen the commands I ran, as well as the tests and feedback of everybody on the issue. You might reach the conclusion that journalctl is fundamentally broken beyond repair.
edit: added link to systemd doc
It does everything no one asked it to. I'm sure they will come up with obscure reasons why the next perfectly working tool has to be destroyed and redone by the only authority - the LP team. Like cron, sudo and yes - logging.
> journalctl -D ... will give you what you want
Look, I don't need the help of journalctl to grep through text. I can simply grep thru text.
> I used it in embedded systems
Good luck in a few years when you are flying home on the next Boeing 737-MAX-100800 and it fails mid flight because systemd decided to shut down some service because fuck you that's why.
> it does not follow the 'everything is a file philosophy'
It does not follow 'everything is a separate simple tool working in concert with others'. systemd is a monolith disguised to look like a set of separate projects.
> don't want to spend time understanding it, it is hardly the fault of the tool
It is, if we had proper tools for decades and they did work. I'm not a retrograde guy, quite the opposite, but the ideology that LP and the rest are shoving down our throats brings up natural defiance.
> there will be many Ah-ha moments
No doubts. systemd unit files and systemd-as-PID1 is excellent. It was NOT excellent for the whole time but now it is. The rest? Designed to frustrate and establish dominance, that's it.
> Because grepping through simple rotated log files is a billion times faster than journalctl
This is annoying, but there's a "workaround":
$ time journalctl | grep "sshd" | wc -l
12622
journalctl 76.04s user 0.71s system 99% cpu 1:17.09 total
grep --color=always --no-messages --binary-files=without-match "sshd" 1.28s user 1.69s system 3% cpu 1:17.08 total
wc -l 0.00s user 0.00s system 0% cpu 1:17.08 total
$ time journalctl > /tmp/all.log && time wc -l /tmp/all.log
journalctl > /tmp/all.log 76.05s user 1.22s system 99% cpu 1:17.56 total
16106878 /tmp/all.log
wc -l /tmp/all.log 0.03s user 0.20s system 98% cpu 0.236 total
# THE SOLUTION
$ time journalctl --grep=sshd | wc -l
5790
journalctl --grep=sshd 28.97s user 0.26s system 99% cpu 29.344 total
wc -l 0.00s user 0.00s system 0% cpu 29.344 total
It's annoying that you need to use the grep flag instead of piping into grep, but it is not too hard to switch to that mindset. FWIW, I have gotten slightly faster results using the `--no-pager` flag, but it is by such a trivial amount I'll never remember it.
> Sometimes docker starts before the filesystem is mounted.
Look at the output of `systemctl cat docker.service` and you'll see "After", "Wants" and "Requires" arguments in the unit. You're going to want to edit that (I strongly suggest you use `sudo systemctl edit docker.service`, for reasons stated above) and make sure that it comes after the drive you want mounted. You can set the Requires argument to require that drive so it shouldn't ever start before it.
Alternatively, you can make the drive start earlier. But truthfully, I have no reason to have docker start this early.
Here's a link to the target order diagram[0] and Arch wiki[1]. Thing that gets messy is that everyone kinda lazily uses multi-user.target
[0] https://www.freedesktop.org/software/systemd/man/latest/boot...
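Concretely, the drop-in that `systemctl edit` opens could look like this (the mount unit name is a placeholder — derive yours from the mount point, e.g. /mnt/data becomes mnt-data.mount):

```ini
# /etc/systemd/system/docker.service.d/override.conf
# (this is the file `sudo systemctl edit docker.service` creates)
[Unit]
# order docker after the mount, and tie its fate to it
After=mnt-data.mount
Requires=mnt-data.mount
# alternatively, let systemd derive the mount units from a path:
# RequiresMountsFor=/mnt/data
```

After saving, `systemctl show docker.service -p After` should list the mount unit.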
No really, I don't think journalctl makes sense in its current form. It's just broken by design.
I do like the potential of it. But not the implementation.
> journalctl --grep is still much slower than grep on simple files
Idk what to tell you. You had a problem, showed the commands you used and the times they took. So I showed you a different way that took less than half the time to just dump and grep (which you said was faster).
My results don't match your conclusion.
> if you use ripgrep
I've been burned by ripgrep too many times. It's a crazy design choice, to me, to filter things by default. Especially to diverge from grep! The only things I expect grep to ignore are the system hidden files (dotfiles) and anything I explicitly tell it to. I made a git ignore file, not a grep ignore file. I frequently want to grep things I'm ignoring with git. One of my most frequent uses of grep is looking through build artifacts and logs. Things I'd never want to push. And that's where many people get burned, they think these files just disappeared!
The maintainer has also been pretty rude to me about this on HN. I get that we have a different opinion, but it's still crazy to think people won't be caught off guard by this behavior. Its name is literally indicating it's a grep replacement. Yeah, I'm surprised its behavior significantly diverges from grep lol
The results in your comment aren't measuring the same thing. There's no grep on the /tmp/all.log in the middle code block, which is the thing they're talking about comparing.
Btw, you can choose not to store journald files as compressed.
Alright. Let me entertain you.
In the data I provided, counting the lines in a big log file was 469.5 times faster than journalctl took to output all the logs.
From this information alone, it seems difficult to believe that journalctl --grep can be faster. Both had to read every single line of logs.
But it was on a rather slow machine, and a couple years ago.
Here /var/log and the current directory are on a "Samsung SSD 960 PRO 512GB" plugged in via m2 nvme, formatted in ext4 and only 5% used. Though this shouldn't matter, as I ran every command twice and collected the second run, to ensure fairness with everything in cache. The machine had 26GiB of buffer/cache in RAM during the test, indicating that everything is coming from the cache.
In my tests, journalctl was ~107 times slower than rg and ~21 times slower than grep: - journalctl: 10.631s - grep: 0.505s - rg: 0.099s
journalctl also requires 4GiB of storage to store 605MB of logs. I suppose there is an inefficient key/value entry for every log line or something.
For some reason journalctl also returned only 273 out of 25402 lines. It only returns one type of message "session closed/opened" but not the rest. Even though it gave me all the logs in the first place without `--grep`?!
Let me know if I am still using it wrong.
$ sudo hdparm -tT /dev/nvme0n1
/dev/nvme0n1:
Timing cached reads: 33022 MB in 1.99 seconds = 16612.96 MB/sec
Timing buffered disk reads: 2342 MB in 3.00 seconds = 780.37 MB/sec
$ du -hsc /var/log/journal
4.0G /var/log/journal
4.0G total
$ time journalctl > logs
real 0m31.429s
user 0m28.739s
sys 0m1.581s
$ du -h logs
605M logs
$ time wc -l logs
3932131 logs
real 0m0.146s
user 0m0.065s
sys 0m0.073s
$ time journalctl --grep=sshd | wc -l
273
real 0m10.631s
user 0m10.460s
sys 0m0.172s
$ time rg sshd logs | wc -l
25402
real 0m0.099s
user 0m0.042s
sys 0m0.059s
$ time grep sshd logs | wc -l
25402
real 0m0.505s
user 0m0.425s
sys 0m0.085s
PS: this way of using rg doesn't ignore any files, since it is not being used to find files recursively. But I don't have a .gitignore or similar in my /var/log anyway.
> Let me know if I am still using it wrong.
You're not using it wrong, but you are measuring wrong.
Check out the filetype of the journal files:
$ file -bi /var/log/journal/1234blahblah/system@foobardedoopdedo.journal
application/x-linux-journal; charset=binary
Your measurement procedure is wrong because the `journalctl` command is doing something different. It isn't just reading a plain file, it is reading a binary file. On the other hand, `grep` and `rg` are reading straight text.
> it seems difficult to believe that journalctl --grep can be faster.
Why? It could be doing it in parallel. One thread starts reading at position 0 and reads till N, another starts at N+1 and reads to 2N, etc. That's a much faster read operation. But I'm guessing and have no idea if this is what is actually being done or not.
P.S.: I know. As I specified in my earlier comment, I get burned with build artifacts and project logs. Things that most people would have in their .gitignore files but you can sure expect to grep through when debugging.
THEY ARE NOT DOING THE SAME THING
> that if the logs were just stored in plain text
So store them in plain text then; see:
cat /etc/systemd/journald.conf
https://www.freedesktop.org/software/systemd/man/latest/journald.conf.html
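For reference, the relevant knobs look roughly like this (a sketch — note that getting classic plain-text files via forwarding still requires a syslog daemon such as rsyslog to be installed):

```ini
# /etc/systemd/journald.conf
[Journal]
# keep the journal in RAM only (or Storage=none to drop it entirely)
Storage=volatile
# hand every message to a traditional syslog daemon for text log files
ForwardToSyslog=yes
```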
Yeah, but storing logs in binary and having a specific tool to just read them is sure not a crazy design choice.
This is AFAIK the only other interaction we've had: https://news.ycombinator.com/item?id=41051587
If there are other interactions we've had, feel free to link them. Then others can decide how rude I'm being instead of relying only on your characterization.
> but it's still crazy to think people won't be caught off guard by this behavior
Straw-manning is also crazy. :-) People have and will absolutely be caught off guard by the behavior. On the flip side, as I said 9 months ago, ripgrep's default behavior is easily one of the most cited positive features of ripgrep aside from its performance.
The other crazy thing here is... you don't have to use ripgrep! It is very specifically intended as a departure from traditional grep behavior. Because if you want traditional grep behavior, then you can just use grep. Hence why ripgrep's binary name is not `grep`, unlike the many implementations of POSIX grep.
> Its name is literally indicating it's a grep replacement.
I also tried to correct this 9 months ago too. See also: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...
For anyone else following along at home, if you want ripgrep to search the same files that GNU grep searches, then do `rg -uuu`. Or, if you don't want ripgrep to respect your gitignores but ignore hidden and binary files, then do `rg -u`.
It makes sense that folks might be caught off guard by ripgrep's default filtering. This is why I try to mitigate it by stating very clearly that it is going to ignore stuff by default in the first one or two sentences about ripgrep (README, man page, CHANGELOG, project description). I also try to mitigate it by making it very easy to disable this default behavior. These mitigations exist precisely because I know the default behavior can be surprising, in direct contradiction to "but it's still crazy to think people won't be caught off guard by this behavior."
Forgive me if I'm a bit surprised!
I still stand by the claim that silent errors are significantly worse than loud ones:
> it's worse to not get files you're expecting vs get more files than you're expecting. In the latter case there's a pretty clear indication you need to filter, while in the former there's no signal that anything is wrong. This is objectively a worse case.
> The other crazy thing here is... you don't have to use ripgrep!
If it wasn't clear, I don't ;)
I don't think grep ignoring .gitignore files is "a bug". Like you said, defaults matter. Like I said, build artifacts are one of the most common things for me to grep.
Where we strongly disagree is that I believe aliases should be used to add functionality, where you believe that it should be used to remove functionality. I don't want to start another fight (so not linking the last). We're never going to see eye-to-eye on this issue so there's no reason to rehash it.
I don't either? Like... wat. Lol.
> Where we strongly disagree is that I believe aliases should be used to add functionality, where you believe that it should be used to remove functionality.
Not universally, not at all! There's plenty of other stuff in ripgrep that you need to opt into that isn't enabled by default (like trimming long lines). There's also counter examples in GNU grep itself. For example, you have to opt out of GNU grep's default mode of replacing NUL bytes with newline terminators via the `-a/--text` flag (which is not part of POSIX).
Instead what I try to do is look at the pros and cons of specific behaviors on their own. I'm also willing to take risks. We already have lots of standard grep tools to choose from. ripgrep takes a different approach and tons of users appreciate that behavior.
> We're never going to see eye-to-eye on this issue so there's no reason to rehash it.
Oh I'm happy not to rehash it. But I will defend my name and seek to clarify claims about stuff I've built. So if you don't want to rehash it, then don't. I won't seek you specifically out.
> I don't want to start another fight (so not linking the last).
To be clear, I would link it if I knew what you were referring to. I linked our other interaction by doing a web search for `site:news.ycombinator.com "burntsushi" "godelski"`.
> If it wasn't clear, I don't ;)
OK, so you don't use ripgrep. But you're complaining about it on a public forum. Calling me rude. Calling me creepy. And then whinging about not wanting to rehash things. I mean c'mon buddy. Totally cool to complain even if you don't use it, but don't get all shocked pikachu when I chime in to clarify things you've said.
> Calling me creepy
I didn't call you creepy.
I said it was creepy that you appeared seemingly out of nowhere in a very unexpected place.
I'm only giving this distinction because this category of error has happened a few times.
This is just rhetorics.
Yes, it is creepy when someone randomly appears just after you allude to them. It is also creepy when someone appears out of nowhere to make their same point. Neither of you were participating in this thread and appeared deep in a conversation. Yeah, that sure seems like unlikely circumstances to me and thus creepy.
I have the impression that a) the majority of systemd projects are broken by design and b) this is exactly what the LP people wanted.
No. Either the init system works in a straightforward way or it doesn't. As soon as we need special commands just to get an impression of what's happening with a service, this init system can - again - fuck off with all that unnecessary complexity.
Init must be simple.
Unfortunately it isn't anymore. Unfortunately, systemd will not fuck off, it's too late for that. Unfortunately we now have to deal with the consequences of letting LP & co do what they did.
> As soon as we need special commands to just get an impression of what's happening with the service,
I agree this is bad design. I do not intend to defend `--grep`; I was just trying to help solve the issue. I 100% believe that this creates an unreasonable expectation of the user and that piping to grep should be expected.
Although, my results showed equal times piping to grep and dumping to a file then grepping that file. IFF `--grep` is operating in parallel, then I think it's fine that it is faster, and I'll take back my critique, since it is doing additional work that isn't strictly necessary. That situation would be "things work normally, but here's a flag for additional optimization."
Is the slowdown the file access? I do notice that it gets "choppy" if I just dump `journalctl --no-pager` but I was doing it over ssh so idk what the bottleneck was. IO is often a pain point (it pains me how often people untar with verbose...).
With text log files.
> but here's a flag for additional optimization
Which wouldn't be even needed in the first place if that very tool that wants this flag just simply did not exist.
I don't. It's the journalctl that does. And it can absolutely fuck off with everything and all of it.
Log files must be in form of text files. This worked for decades and there is no foreseeable future where this stops working or ceases to be a solution for OS log collection.
My goodness. Absolutely fuck journald - a solution in search of a problem. I have created a bunch of different scripts to init my instances [1] on all projects. I do it differently from time to time, but one thing they all have in common is that journald gets removed and disabled.
My only bugbear with it is that there's no equivalent to the old timeout default you could set (note that doas explicitly said they won't implement this too). The workaround is to run it in `sudo -i` fashion and not put a command afterwards which is reasonable enough even though it worked hard against my muscle memory + copypaste commands when switching over.
> Systemd gets a lot of hate
I'd argue it doesn't and is simply another victim of loud internet minority syndrome.
It's just a generic name at this point, basically all associated with init and service units and none of the other stuff.
And honestly, I think the one thing systemd is really missing is... people talking about it. That's realistically the best way to get more documentation and spread all the cool tricks that everyone finds.
> I'd argue it doesn't
I definitely agree on loud minority, but they're visible enough that anytime systemd is brought up you can't avoid them. But then again, lots of people have much more passion about their opinions than passion about understanding the thing they opine about.
Of course. We suffered with sudo for a couple of decades already! Obviously it's wrong and outdated and has to be replaced with whatever LP says is the new norm.
cron and sudo definitely don't.
impossible to have a clear picture of what's up with the home dir: where it is now located, how to get access to it, or whether it will suddenly disappear. Obviously, plain /home worked for like five decades and therefore absolutely has to be replaced.
Five decades ago, people didn't have laptops that they want to put on sleep and can get stolen. Actually, five decades ago, the rare people using a computer logged into remote, shared computers. Five decades ago, you didn't get hacked from the internet.
Today, people mostly each have their computer, and one session for themselves in it (when they have a computer at all)
I have not looked into homed yet, needs are very different from before. "It worked five decades ago" just isn't very convincing.
It'd be better to understand what homed tries to address, and argue why it does it wrong or why the concerns are not right.
You might not like it but there usually are legitimate reasons why systemd changes things, they don't do it because they like breaking stuff.
My rant is: why the f are they shoved down my throat on the server side then?
() yeah, it's a bad idea; it was required for a specific installation where every cpu cycle counted.
[0] idk why people think Arpanet is the internet. For clarification, I'm not my das
Learning curve is not the annoying part. It is kind of expected and fine.
systemd is annoying in parts that are so well described all over the internet that it makes zero sense to repeat them here. I am just venting, and that comes from experience.
never boot into the network reliably, because under systemd you have no control over the sequence.
BTW, I think that's one of the main pros and one of the strongest features of systemd, but it is also what makes it unreliable and boot unreproducible if you live outside of the very default Ubuntu instance and such.
It has a 600s timeout. You can reduce that if you want it to fail faster. But that doesn't seem like a problem with systemd, that seems like a problem with your network connection.
> If you live outside of the very default Ubuntu instance and such.
I use Arch btw
My ubuntu initiation script includes `apt-get install ifupdown`, which actually works unlike those two. And why bother learning, when by the next ubuntu release the upcoming fanboys will replace the network stack with whatever they think they like, again.
But the bug we are discussing is probably systemd's, because the network is up and running while systemd still waits for it.
What does this mean? Your machine boots and sometimes doesn't have network?
If your boot is unreliable, isn't it because some service you try to boot has a dependency that's not declared in its unit file?
Sometimes it waits on the network to become available even though the network is already available. No idea what causes this.
> some service you try to boot has a dependency that's not declared in its unit file
Nah, that would be an obvious and easy fix.
It has the slightly odd behavior with trying to get all configured links up. This can lead to some unexpected behavior when there's more than one.
But yea, the upstream stance is essentially "don't rely on network to be up before you start. That's bad software. You have to deal with network going down and back up in practice either way." Which is often not super useful.
Yeah, did not help.
> trying to get all configured links up
Extremely helpful when some are explicitly disabled. So maybe there is a bit of a bug there, who knows.
> don't rely on network to be up before you start
Yeah that's the correct way.
Simple example is I can have a duplicate of the "machine" running my server and spin it up (or have it already spun up) and take over if something goes wrong. Makes for a much more seamless experience.
I even run my entire Voron 3D printer stack with podman-systemd so I can update and rollback all the components at once, although I'm looking at switching to mkosi and systemd-sysupdate and just update/rollback the entire disk image at once.
The main issues are: 1. A lot of people just distribute docker-compose files, so you have to convert it to systemd units. 2. A lot of docker images have a variety of complexities around user/privilege setup that you don't need with podman. Sometimes you need to do annoying userns idmapping, especially if a container refuses to run as root and/or switches to another user.
Overall, though, it's way less complicated than any k8s (or k8s variant) setup. It's also nice to have everything integrated into systemd and journald instead of being split in two places.
To me podman/systems/quadlet could just as well be an implementation detail of how a k8s node runs a container (the.. CRI I suppose, in the lingo?) - it's not replacing the orchestration/scheduling abstraction over nodes that k8s provides. The 'here are my machines capable of running podman-systemd files, here is the spec I want to run, go'.
At some point I do want to create a purpose built rack for my network equipment and maybe setup some homogenous servers for running k8s or whatever, but it's not a high priority.
I like the idea of podman-systemd being an impl detail of some higher level orchestration. Recent versions of podman support template units now, so in theory you wouldn't even need to create duplicate units to run more than one service.
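I haven't pinned down the exact podman version that added this, but the templates would look like ordinary systemd template units; a sketch with a hypothetical `web@.container`:

```ini
# ~/.config/containers/systemd/web@.container -- hypothetical template
[Container]
Image=docker.io/library/nginx:latest
# %i is the instance name, e.g. `systemctl --user start web@blue`
ContainerName=web-%i
Environment=INSTANCE=%i
```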
I believe the podman-compose project is still actively maintained and could be a nice alternative to docker-compose. But podman's interface with systemd is so enjoyable.
useradd --comment "Helper user to reserve subuids and subgids for Podman" \
--no-create-home \
--shell /usr/sbin/nologin \
containers
I also found this blog post about the different `UserNS` options https://www.redhat.com/en/blog/rootless-podman-user-namespac... very helpful. In the end it seems that using `UserNS=auto` for rootful containers (with appropriate system security settings like private devices, etc.) is easier and more secure than trying to get rootless containers running in a systemd user slice (Dan Walsh said it on a GitHub issue but I can't find it now).
> User= causes lots of issues with running podman and rootless support is fairly easy. I also recommend that people look at using rootful with --userns=auto, which will run your containers each in a unique user namespace. ― https://github.com/containers/podman/issues/12778#issuecomme...
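To illustrate the rootful `UserNS=auto` route, a minimal quadlet sketch (the image, port, and file name are placeholders, not anything from the linked post):

```ini
# /etc/containers/systemd/myapp.container -- hypothetical
[Unit]
Description=My app, isolated in its own user namespace

[Container]
Image=docker.io/library/nginx:latest
# podman picks a unique uid/gid range per container; no rootless plumbing
UserNS=auto
PublishPort=8080:80

[Service]
Restart=always

[Install]
WantedBy=multi-user.target
```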
> Of course, as my luck would have it, Podman integration with systemd appears to be deprecated already and they're now talking about defining containers in "Quadlet" files, whatever those are. I guess that will be something to learn some other time.
To me that's why compose is neat. It's simple. Works well with rootless podman also.
Services are conceptually similar to pods in podman. Volumes and mounts are the same. Secrets or mounts can do configs, and I think podman handles secrets much better than docker. I searched for and found examples for getting traefik to work using quadlets. There are a few networking wrinkles that require a bit of learning, but you can mostly stick to the old paradigm of creating and attaching networks if that's your preference, and quadlets can handle all of that.
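The networking wrinkle mostly amounts to one extra file; a sketch with hypothetical names, where the `.container` unit references the `.network` unit directly and quadlet creates/attaches the network for you:

```ini
# ~/.config/containers/systemd/proxy.network -- hypothetical
[Network]

# ~/.config/containers/systemd/whoami.container -- hypothetical
[Container]
Image=docker.io/traefik/whoami:latest
# quadlet resolves this to the network created by proxy.network
Network=proxy.network
```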
Quadlets use ini syntax (like systemd unit files) instead of YAML, and there is currently a lack of tooling for text highlighting. As you alluded, quadlets require one file per systemd service, which means you can't combine conceptually similar containers, networks, volumes, and other entities in a single file. However, podman searches through the quadlet directories recursively, which means you can store related services together in a directory or even nest them. This was a big adjustment, but I think I've come to prefer organizing my containers using the file system rather than with YAML.
I'm using this to speed up my quadlet configs whenever I want to deploy a new service that invariably has a compose file.
[1] ParticleOS:
https://github.com/systemd/particleos
[2] Systemd ParticleOS:
Doesn't anyone just use ssh and nginx anymore? Cram everything onto one box. Back the box up aggressively. Done.
I really don't need microservices management for my home stuff.
At the same time, I've seen some horrible decisions made because of them: Redis for things that do not need it. Projects with ~10.000 users (and little potential growth) tripping over themselves to adopt k8s when my desktop could run the workload of 100.000 users just fine. A disregard for backups / restore procedures because redundancy is good enough. "Look I can provision 64 extra servers for my batch job that pre-calculates a table every day".
---
It seems every year fewer teams appreciate how fast modern hardware with a language like Rust or Go can be if you avoid all the overhead.
My standard advice is to use a single container that holds everything. Only after it's built and in use can you make the best choice about which angle to scale.
> A complex system that works is invariably found to have evolved from a simple system that worked. - John Gall
But containers really shine during development if you have more than a few developers working on the same projects. The ability to have a standard dev container for coding and testing saves so much time. And once you have that, deploying with containers is almost free.
It's basically just this command once you have compose.yaml: `docker compose up -d --pull always`
And then the CI setup is this:
scp compose.yaml user@remote-host:~/
ssh user@remote-host 'docker compose up -d --pull always'
The benefit here is that it is simple and also works on your development machine.
Of course, if the side goal is to also do something fun and cool and learn, then Quadlet/k8s/systemd are great options too!
docker context create --docker 'host=ssh://user@remote-host' remote-host
Then try this instead:
docker -c remote-host compose -f compose.yaml up -d --pull always
No need to copy files around.
Also, another pro tip: set up your ~/.ssh/config so that you don't need the user@ part in any ssh invocations. It's quite practical when working in a team, you can just copy-paste commands between docs and each other.
Host *.example.com
User myusername
https://harbormaster.readthedocs.io/
Harbormaster uses a YAML file to discover repositories, clones and updates them every so often, and runs the Docker Compose files they contain. It also keeps all state in a single directory, so you can easily back everything up. That's it.
It's by far the easiest and best tool for container orchestration I've come across, if all you need is a single server. I love how the entire config is declared in a repo, I love how all the state is in one directory, and I love how everything is just Compose files, nothing more complicated.
I know I'm tooting my own horn, I just love it so much.
But Kubernetes does much more in terms of providing the resources required for these containers to share state, connect to each other, get access to config or secrets etc.
That's where the CPU and memory cost comes from: the cost of managing your containers and providing them the resources they need.
> basically acts as a giant while loop
Yep. That’s the idea of convergence of states I guess. In a distributed system you can’t always have all the participating systems behave in the desired way. So the manager (or orchestrator) of the system continuously tries to achieve the desired state.
This was OP's argument, and mine as well. My side project, which is counting requests per minute or hour, really doesn't need that; however, I need to eat the overhead of K8s just to have the nice dx of being able to push a container to a registry and have it deployed automatically with no downtime.
I don’t want to pay to host even a K3s node when my workload doesn’t even tickle a 1vCPU 256mb ram instance, but I also don’t want to build some custom scaffold to so the work.
So I end up with SSH and SCP… quadlets and podman-systemd solve the problems I have reasonably well, and OP's post is very valuable because it builds awareness of a solution that solves my problems.
You can define rootless containers to run under systemd services as unprivileged users. You can use machinectl to login as said user and interact with systemctl.
This is a good example [1], cited elsewhere in this post.
Documentation for quadlet systemd units [2].
[0]: https://www.redhat.com/en/blog/quadlet-podman
[1]: https://mo8it.com/blog/quadlet/
[2]: https://docs.podman.io/en/latest/markdown/podman-systemd.uni...
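For reference, a minimal quadlet unit looks something like this (placed at ~/.config/containers/systemd/myapp.container for a rootless user; the image and port here are placeholders):

```ini
[Unit]
Description=My app container

[Container]
Image=docker.io/library/nginx:latest
PublishPort=8080:80

[Service]
Restart=always

[Install]
WantedBy=default.target
```

After a `systemctl --user daemon-reload`, quadlet generates a myapp.service you can start, stop, and enable like any other systemd unit.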
https://github.com/containers/podman/issues/10884
Right now I have an Ansible playbook responsible for updating my services, in a git repo.
The playbook stops changed services, backups their configs and volumes, applies the new docker-compose.yml and other files, and restarts them.
If any of them fail to start, or aren't reachable after 3 minutes, it rolls back everything *including the volumes* (using buttervolume, docker volumes as btrfs subvolumes to make snapshots free).
I am looking into Kubernetes, but I didn't find a single stack/solution that would do all that this system does. For example I found nothing that can auto rollback on failure *including persistent volumes*.
I found Argo Rollback but it doesn't seem to have hooks that would allow me to add the functionality.
You'd need to slightly rethink rollbacks, express them in terms of always rolling forward. K8s supports snapshots directly (you'd need a CSI driver; https://github.com/topolvm/topolvm, https://github.com/openebs/zfs-localpv, or similar). Restores happen by creating a new PVC (dataSource from a VolumeSnapshot). So in case rollout to version N+1 fails, instead of a rollback to N you'd roll forward to N+2 (which itself would be a copy of N, but referencing the new PVC/PV). You'd still have to script that sequence of actions somehow - perhaps back to Ansible for that? Considering there might be some YAML parsing and templating involved.
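The restore step in manifest form looks roughly like this (names are illustrative; assumes a CSI driver with VolumeSnapshot support installed):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-v2
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  # Provision the new PVC as a copy of a snapshot taken before the rollout.
  dataSource:
    name: snap-before-v2
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```

The "roll forward" is then just pointing the workload's volume at the new PVC.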
Of course this looks (and likely is) much more complicated, so if your use case doesn't justify k8s in the first place, I'd stick to what already works ;)
I'm interested in moving to Kubernetes to make use of the available templating languages, which are better than plain Ansible Jinja2 and also offer features like schema checking.
Because my services are tightly integrated, and to avoid hardcoding values in multiple places, my Ansible files are a pain to maintain.
There's still a lot on my todo list, like env files, controlling parallelism, canaries/batches, etc. I'm currently doing these things using hacky shell scripts, which I don't like, so I'd prefer moving that into the main binary. But I still prefer it as-is over Ansible.
Eventually I replaced everything with a script that generates systemd units and restarts the services on changes, under Debian, using the WordPress that comes with it. Then I have a test VM on my laptop and just rsync changes to the deployment host and run the deployment script there. It reduced my chores very significantly. The whole system runs on a 2GB VPS. It could be reduced to 1GB if WordPress officially supported SQLite, but I prefer to pay a few more euros per month and stick with MariaDB to minimize support requirements.
It dramatically speeds up the process of converting the usual provided files into quadlets.
And I'm pretty familiar with Kubernetes but, yeah, for small tasks it can feel like taking an Apache to the store to buy a carton of milk.
So many questions...
This trades off some automation for simplicity, although this approach may require manual intervention when a machine fails permanently.
"What if I could define a systemd unit that managed a service across multiple nodes" leads naturally to something like k8s.
I was learning about Kubernetes at work and it seemed like such a powerful tool, so I had this grand vision of building a little cluster in my laundry room with nodes net booting into Flatcar and running services via k3s. When I started building this, I was horrified by the complexity, so I went the complete opposite direction. I didn't need a cluster, net booting, blue-green deployments, or containers. I landed on NixOS with systemd for everything. Bare git repos over ssh for personal projects. Git server hooks for CI/CD. Email server for phone notifications (upgrade failures, service down, low disk space etc). NixOS nightly upgrades.
I never understood the hate systemd gets, but I also never really took the time to learn it until now, and I really love the simplicity when paired with NixOS. I finally feel like I'm satisfied with the operation and management of my server (aside from a semi frequent kernel panic that I've been struggling to resolve).
I also have Quadlet on my backlog. I'm waiting for the release of the next stable version of Debian (which I think should land sometime this year), as the current Debian ships a Podman version slightly too old to include Quadlet.
I host all of my hobby projects on a couple of raspi zeros using systemd alone, zero containers. Haven’t had a problem since I started using it. Single binaries are super easy to set up and things rarely break; you get auto restart and launch at startup.
All of the binaries get built on GitHub using Actions, and when I need to update stuff I log in using ssh and execute a script that uses a GitHub token to download and replace the binary. If something is not okay, I also have a rollback script that switches things back to the previous setup. It’s as simple as it gets and it’s been my go-to for 2 years now.
However, Ruby, Python, JS/TS, and Java/.NET are all easier inside a container than outside. Not to say it's not doable, just hair-pulling.
If it is deployed as folders, install new versions as whatever.versionnumber and upgrade by changing the symlink that points to the current version to point to the new one.
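The versioned-folders approach can be sketched in a few lines of shell (paths, names, and the default version are illustrative, not from the comment above; `mv -T` is GNU coreutils):

```shell
#!/bin/sh
# Each release lives in its own directory; "current" is a symlink
# flipped atomically between them.
set -eu
APP_DIR="${APP_DIR:-$(mktemp -d)}"
VERSION="${1:-2.0.0}"

# Install the new version next to the old ones.
mkdir -p "$APP_DIR/releases/$VERSION"

# Create the symlink under a temporary name, then rename it over
# "current" -- rename(2) is atomic, so readers never see a broken link.
ln -sfn "releases/$VERSION" "$APP_DIR/current.new"
mv -T "$APP_DIR/current.new" "$APP_DIR/current"
```

Rollback is the same two symlink lines with the previous version number, which is what makes this scheme so pleasant.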
Autoscaling fleet - image starts, downloads container from registry and starts on instance
1:1 relationship between instance and container - and they’re running 4XLs
When you get past the initial horror it’s actually beautiful
really raw-dogging it here but I got tired of endless json-inside-yaml-inside-hcl. ansible yaml is about all I want to deal with at this point.
- is there downtime? (old service down, new service hasn't started yet)
- does it do health checks before directing traffic? (the process is up, but its HTTP service hasn't initialized yet)
- what if the new process fails to start, how do you rollback?
Or is it solved with an nginx that sits in front of the containers? Or does systemd have a builtin solution? Articles like this often omit such details. Or does no one care about occasional downtime?
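On the "does systemd have a builtin solution" question: partially, via socket activation. systemd owns the listening socket, so while the service restarts, new connections queue in the kernel backlog instead of being refused (this covers restarts, though not application-level health checks). A sketch with placeholder names and port, split across the usual two unit files:

```ini
# myapp.socket
[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target

# myapp.service -- receives the socket via the systemd activation protocol
[Service]
ExecStart=/usr/local/bin/myapp
```

The app has to support receiving its listening socket from systemd (sd_listen_fds) for this to work.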
Check it out. Doron
Until something better comes along, I will start with k8s 100% of the time for production systems. The minor one-time pains of getting it stood up are worth it compared to migrating later, and everything is in place waiting to be leveraged.
Anyone knows what I’m talking about? Is it still alive?
EDIT: it’s not CoreOS/Fleet, it’s something much more recent, but was still in early alpha state when I found it.
[0] https://github.com/eclipse-bluechi/bluechi
[1] https://www.redhat.com/en/blog/hirte-renamed-eclipse-bluechi
Seemed quite promising to me at the time, though it seem they've changed the scope of the project a little bit. Here's an old post about it: https://www.redhat.com/en/blog/introducing-hirte-determinist...
E.g. when updating a container with watchtower:
You deploy a container 'python-something'.
The container has PYTHON=3.11.
Now, the container has an update which sets PYTHON=3.13.
Watchtower will take the current settings, and use them as the settings to be preserved.
So the next version will deploy with PYTHON=3.11, even though you never set that value yourself.
I am working on proot-docker:
https://github.com/mtseet/proot-docker
OK it does not need process namespaces, but that's good enough for most containers.
> Particularly with GitOps and [Flux](https://www.weave.works/oss/flux/?ref=blog.yaakov.online), making changes was a breeze.
appears to be broken.
EDIT: oh, I hadn't realized the article was a year old.
Most of what it does is run programs with various cgroups and namespaces, which is what systemd does, so should it really be any more resource intensive?
Just the other day I found a cluster I had inadvertently left running on my MacBook using Kind. It literally had three weeks of uptime (running a full stack of services) and everything was still working, even with the system getting suspended repeatedly.
But TL;DR we've already done systemd style container orchestration, and it was ok.
It helped, of course, that the people writing them knew what they were doing.
Edit: Nevermind, I misunderstood the article from just the headline. But I'm keeping the comment as I find the reference funny