The rest of the lab is a few ephemeral instances on Google, with dual A100s that spin up when I need to train things.
I put Ubuntu on the old beast, and never touch it. If the power goes out, it automatically comes on and Docker launches all the services when it comes up.
About the only thing that needs watching is the tiny SDR radio plugged into it, which I use for pure random numbers and for talking to the box with a handheld radio from the other house. Sometimes I have to unplug it and then plug it back in to get it back into service. No amount of finagling seems to fix it from software.
You are an interesting person! We would be friends IRL :-)
May I ask what you use the pure random numbers for? And what you use the radio link for?
I've built an SDR radio stack to integrate with a single pane navigation and chart plotting app I'm building for my new company. You use the radio to talk to the local agent (I just finished a submission to the Gemma Challenge on HF to learn how to train models). Wake words show up on the glass, for security. Been working on training a small model to do agentic controls, including changing autopilot and switching displays when you want to see other content on the screen. I've been working on isochrone routing and have it working well now. Waiting for Fabel to come back to continue the work...
Everything is here: https://deepbluedynamics.com. It's just me as an LLC. No VC. No backers. No users, yet. Few stars, but 100% written by me, and most of it is open source. FWIW, I've been around the block and I'm getting old now. So the radio helps with arthritic fingers! :) The radio stack stuff lives here: https://nuts.services/sdrrand
I'm also integrating the voice stuff into Hyperia, my terminal emulator forked from Hyper Terminal. It's on GitHub. Hyperia is also agentic controlled, so I can talk to it by pane name to inject text into the prompts. This lets me get up and roam around. I'm remodeling my front house, and for a while I had one of those smart lights turning red or blue when things happened in the sessions, but I want something I can talk to without having to click and type. I use the Windows transcription feature a lot, but talking on the radio to it is much easier.
Oh and to answer your question about random numbers, I use them for various Monte Carlo based approaches. For example, I inject those numbers into training runs, rolls for the iChing (helpful for feeding agents for decision busting), and seeding other things like simulated wind speeds and current (in the sim because I'm not sitting on a boat). I even use it for sampling documents I've been indexing (cut up a bunch of business books and have a local model use them for reference).
I have the SDRrand sources wired into almost everything now that needs randomness. Is it necessary? Maybe. We only get pseudo random numbers from computers, so I'm just attached to the purity of it, if anything.
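For anyone curious what "seeding a sim from an external source" can look like, here is a minimal sketch. It is not the SDRrand API: `read_entropy` is a hypothetical stand-in that just calls `os.urandom`, and the wind model (Gaussian around a mean, clamped at zero) and its default numbers are made up for illustration.

```python
import os
import random
from typing import List


def read_entropy(num_bytes: int) -> bytes:
    """Hypothetical stand-in for a hardware source.

    Swap the body for a read from whatever service exposes the radio noise.
    """
    return os.urandom(num_bytes)


def seeded_rng(entropy: bytes) -> random.Random:
    """Build a PRNG whose seed comes from externally supplied bytes."""
    seed = int.from_bytes(entropy, "big")
    return random.Random(seed)


def simulate_wind(
    rng: random.Random,
    steps: int,
    mean_knots: float = 12.0,  # assumed value, illustration only
    gust_sd: float = 3.0,      # assumed value, illustration only
) -> List[float]:
    """Draw a simple wind-speed series in knots, never below zero."""
    return [max(0.0, rng.gauss(mean_knots, gust_sd)) for _ in range(steps)]


if __name__ == "__main__":
    rng = seeded_rng(read_entropy(32))
    print(simulate_wind(rng, steps=5))
```

Keeping the seed bytes around means a run can be replayed later, which is handy when a sim result looks odd and you want to see it again.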
On your synology holdout: I run two of them plus a custom truenas, and I treat the storage layer as the deliberate boundary of the declarative model rather than fighting to pull it in. Topology is truenas on fast ssd backing the running vms, synology one as primary, synology two as the backup target for both truenas and the primary. The synologys are imperative islands — dsm doesn't want to be config-managed — so I codify the consumption side (the exports, and the vms that mount them) and treat the boxes themselves as data, not infra. Truenas is the exception, since the api gets you closer to declarable.
Are you leaving the synology fully hands-off, or driving any of it through the api?
1. backup, an RS422+ (w/ 2GB): 4x8TB RAID6
2. storage, RS818+ (16GB): 4x12TB RAID0 (everything not easily replaced is backed up)
3. san, an RS2418+ (64GB): 8x1TB SSD RAID10, and 4x5TB RAID10
I can't picture too much of a reason to drive it through the API, or at least not yet - still getting it all finalized. But I also managed to figure out IPv6 with Xfinity across my VLANs and SLAAC.
What this skips though is the complexity of services like NextCloud (stuck in maintenance mode again?), Immich (needs a compose file edit?), MineCraft worlds (Dad! my client is on another version again!), (dmn) AlbyHub (needs re-login and closed its channel).
But to be fair this is really getting quite minimal these days indeed. I didn't really realize it but I too have a mostly hands-off home-lab... Ok, then it's not really a lab anymore, it's more "stable home-infra" ;)
The frustrating thing is that basically everyone who self hosts Nextcloud will know exactly what you're talking about, and the solution is generally quite simple, and yet the docs for this problem are terrible, scattered, out of date, and mostly just link to old GitHub issues that don't directly address the problem until you get to some comment half way down.
It's a terrible user experience - seriously, point an LLM at it for a few hours to write the first draft of a "So you're stuck in maintenance mode" page, get it reviewed by the community, publish. Immediate quality of life improvement for everyone struggling to get started self hosting it.
Right now I use Docker compose, with the major version pinned so it doesn’t surprise me. I verify app compatibility before changing that major version.
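A rough sketch of the "don't let a major version sneak in" check, in case it helps someone. It only compares the leading number of two image tags; the tag strings in the example are hypothetical, and it assumes tags start with the major version, which is not true for every image.

```python
import re


def major_version(tag: str) -> int:
    """Return the leading integer of a tag such as '29', '29.0.4' or '29-apache'."""
    match = re.match(r"(\d+)", tag)
    if match is None:
        raise ValueError(f"tag {tag!r} does not start with a version number")
    return int(match.group(1))


def is_major_upgrade(current_tag: str, candidate_tag: str) -> bool:
    """True when the candidate tag crosses into a newer major version."""
    return major_version(candidate_tag) > major_version(current_tag)


if __name__ == "__main__":
    # Hypothetical tags, for illustration only.
    print(is_major_upgrade("29.0.4", "29.0.7"))
    print(is_major_upgrade("29-apache", "30-apache"))
```

A floating tag like `latest` raises `ValueError` here on purpose, since it cannot be pinned to a major version at all.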
I use PostgreSQL as the DB.
I've never had real problems with it, and it wasn't for lack of trying.
I've been using Nextcloud since it forked from OwnCloud, and I was running OwnCloud before that. It was on a NAS with 64MB RAM and a 500MHz CPU at one point.
I've run it bare metal, with different Linux distros, on FreeBSD, in VMs, in LXC containers, in Docker containers, with different PHP versions, with different web servers, with different DBs, on terrible hardware, on powerful hardware, more. So many combinations.
Of course, just before I was ready to wipe it all and start with the AIO container my partner started intensively using the calendar (switching from paper, finally!) So I need to spend some time with it and migrate properly without her having any downsides. When I do I will also set up EuroOffice and experiment with that. I'm rather looking forward to that. In that sense it's a real "lab" :)
However, I still build new things, launch new services, etc. I personally don't categorize that as maintenance.
In regards to your complexity comment - I have slowly built up experience over the years and now launch services where I very rarely have to edit compose files or such. However, it does happen and I would lump it into that "~15 minutes per month" bucket. I primarily use Nginx + Docker for any service I launch and can quickly diagnose issues and resolve them.
Perhaps another piece I could've mentioned is sticking to a core set of tools like this, which allow me not only to automate the basics, but also to have become adept at fixing things manually if they break.
I ended up moving everyone to Lunar Client, mostly because the Minecraft Launcher from Microsoft still requires Rosetta, and sometimes would just break for unrelated reasons. As a side effect, you can pick the version that launches, upgrade when you want, run old versions as needed. So the server upgrades can be planned instead of an emergency.
1. I don't like surprise breakages. I am not prepared to fix a service my family uses midday on a Tuesday when I am working since it auto updated. I'd like to specifically make sure I have dedicated time and plan if something is going to go wrong.
2. My family HATES when things change. I try to run LTS versions of things, but annoyingly, some software like Nextcloud doesn't have an LTS version. One of the things my family likes the most is that the stuff I host isn't constantly changing like commercial products. Having Google Photos change or Netflix get a new interface randomly is very, very frustrating for them.
Since my homelab is completely internal, I avoid quickly doing updates (unless it is a critical security issue), and definitely avoid doing major version upgrades unless there is good value in it.
Plex is the main app I run that gets used day to day, and they haven’t pushed a meaningful update in years. It’s always some nonsense to try and be a streaming service or social network, which I don’t want.
I should probably RTFA before assuming that, but that's the way my Linux box works. When something breaks or needs upgrading, I just tell the agent to deal with it. Normally that's Claude Code, but the role will be assumed by a local model soon enough.
It's been normal for me for the past 3 years thanks to using NixOS for all server infrastructure.
Helps that things are really easy to test too, spin up a new test VM with your new config and copy of real data, check if it works, then apply the change to the real hardware and you're good to go. Alternatively, do it live with a copy of real data, then rollback in case it doesn't work.
Switched to NixOS a few years ago, and I can't overstate the amount of peace it has brought to my life. It just takes so much stress away, compared to everything else I've used before.
My only criticism is that the Nix language is not super ergonomic or easy to learn. But with LLMs nowadays, even that is barely ever a problem.
It doesn’t change.
Many people keep swapping gear in so they can learn BGP on Cisco edge gear or run clusters on salvaged IB.
OP is not that person.
Indeed. And if you never test your recovery then you don't actually have a workable backup.
* Docker Compose files and various folders for containers live on an NFS share
* SQLite and other databases run off a local SATA SSD for speed and reliability
* Cronjob tarballs the critical stuff nightly and throws it on another NFS share to get ingested into Backblaze B2.
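The nightly tarball step could be sketched roughly like this. The paths and the `backup-YYYY-MM-DD.tar.gz` naming are assumptions, not the poster's actual script, and the Backblaze B2 ingestion is left out entirely. The `list_archive` helper is there because reading the archive back is the cheapest possible restore test.

```python
import tarfile
from datetime import date
from pathlib import Path
from typing import List, Optional


def make_backup(sources: List[Path], dest_dir: Path, today: Optional[date] = None) -> Path:
    """Write a dated .tar.gz of the given files/folders into dest_dir."""
    stamp = (today or date.today()).isoformat()
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"backup-{stamp}.tar.gz"  # assumed naming scheme
    with tarfile.open(archive, "w:gz") as tar:
        for src in sources:
            # Store each source under its own name rather than its full path.
            tar.add(str(src), arcname=src.name)
    return archive


def list_archive(archive: Path) -> List[str]:
    """Read the archive back and return its member names, sorted."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Hypothetical locations: a compose folder and a second NFS mount.
    out = make_backup([Path("/srv/compose")], Path("/mnt/backup-share"))
    print(out, len(list_archive(out)), "entries")
```

Run from cron once a night, this covers the "tarball and throw it on another share" part; listing the members afterwards catches an empty or truncated archive before it gets uploaded.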
Now I just get to kick back and actually experiment with new things instead of babysitting a convoluted Proxmox upgrade or shunt onto a new container standard.
Does it run rootless? Not atm (blame FreshRSS, my sole holdout). Is it super secure? Probably not, but I’m not doing anything goofy like mounting the Unix socket into a container at the very least, and the server credentials don’t work anywhere else should it get popped. The blast radius is contained, and that’s more important to me than Enterprise-grade security for my homelab (a la Wazuh, another backlog project TBD).
I have been running Proxmox for 3 years and it has been rock solid
- Docker VM : Lots of containers with docker compose, a few examples are Plex, AdguardHome, *Arr stack...
- K3s VM: Mostly to learn and keep up with Kubernetes; my own apps running in there
- Postgresql VM: database for anything that needs one
Currently trying to simplify, moving the database to a docker container and testing if docker and k3s can coexist on the same system, at that point I might ditch Proxmox and move to NixOS. The only things I might miss are the option to create VMs to test random things, and VM snapshots, which make backups really simple.
I still upgrade mine manually every Friday with an ansible playbook; most of the time nothing breaks, but if it does I know I have time to fix it.
Building/tinkering/playing around is fun, but once you are actually self-hosting services you rely on, it needs to "just work" or you will eventually burn out or lose interest. Especially when you take on more users than just yourself. The day my wife cancelled her Audible subscription because Audiobookshelf was just as good (IMHO better) was a good day, but that only happens because it is stable/reliable.
Recently one had their first baby, so they migrated from Fedora to RHEL, just to spend less time on upgrades. :D I thought that was cute. Like RHEL is so stable, even a first time parent can use it.
Don’t super care about updates. If it isn’t too ancient and not internet facing then it’s probably ok
I use a nearly identical alias for docker pull to keep my containers updated. To ensure everything stays running smoothly, I've built a lightweight watchdog (a mix of bash scripting and Uptime Kuma/Beszel) that monitors my services and containers and restarts them if they crash. This way, I rarely need to intervene manually.
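A watchdog along those lines might look something like the sketch below. This is not the poster's bash script: it is a Python approximation that assumes a Docker version whose `docker ps --format '{{json .}}'` output includes `Names` and `State` fields, and the container names are hypothetical. It only restarts containers that exist and have stopped; unknown names are ignored rather than guessed at.

```python
import json
import subprocess
from typing import Dict, Iterable, List

# States treated as "crashed"; paused or restarting containers are left alone.
STOPPED_STATES = {"exited", "dead"}


def parse_states(output: str) -> Dict[str, str]:
    """Turn one-JSON-object-per-line `docker ps` output into {name: state}."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        if line.strip():
            row = json.loads(line)
            states[row["Names"]] = row["State"]
    return states


def container_states() -> Dict[str, str]:
    """Ask the local Docker CLI for the state of every container."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    )
    return parse_states(result.stdout)


def needs_restart(states: Dict[str, str], watched: Iterable[str]) -> List[str]:
    """Return the watched containers that exist but have stopped."""
    return [name for name in watched if states.get(name) in STOPPED_STATES]


def restart(names: Iterable[str]) -> None:
    """Restart each named container via the Docker CLI."""
    for name in names:
        subprocess.run(["docker", "restart", name], check=True)


if __name__ == "__main__":
    watched = ["web", "db"]  # hypothetical container names
    restart(needs_restart(container_states(), watched))
```

For many setups a `restart: unless-stopped` policy in the compose file covers the same ground; a script like this mostly earns its keep when you also want to log or alert on the restart.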
For critical services (DNS, VPN, git, web search, crawler and mail, etc.), I add an extra layer of redundancy by running them on multiple servers across different locations. If one server fails, the others seamlessly take over. I also use DNS round-robin as a simple but effective way to handle load balancing and failover; no HAProxy, K8s, expensive IP Takeover (ARP Spoofing) or BGP Anycast and VRRP/CARP, Proxmox or fancy orchestration tools required. If a node goes down, another watchdog script temporarily removes it from DNS, and traffic shifts to the remaining servers. Most often the services are self-healing. The best part? My deployment and monitoring are fully self-scripted (no Terraform, Ansible or BundleWrap). Moving services to a new server is as easy as running some scripts over SSH. Everything sets itself up automatically. Currently I run my services on 2 Pis, 2 stratum 1 servers (from centerclick), and 8 VPSs that cost me around $40/month. It's a great example of how a little automation and redundancy can go a long way in keeping things cheap and reliable without unnecessary complexity.
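The "remove a dead node from DNS" part boils down to something like this sketch. It only decides which addresses should stay in the round-robin set; actually pushing the records depends on the DNS provider and is not shown. The TCP connect check, the port, and the example addresses (from the 192.0.2.0/24 documentation range) are assumptions. One design choice worth calling out: if every node looks dead, it keeps the full set, on the theory that the checker itself is more likely broken than all servers at once.

```python
import socket
from typing import Callable, List, Sequence


def tcp_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Consider a node healthy if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def records_to_publish(
    ips: Sequence[str],
    port: int,
    check: Callable[[str, int], bool] = tcp_alive,
) -> List[str]:
    """Return the addresses that should remain in the round-robin record set."""
    alive = [ip for ip in ips if check(ip, port)]
    # Never publish an empty set: fall back to all nodes if none respond.
    return alive if alive else list(ips)


if __name__ == "__main__":
    nodes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # documentation addresses
    print(records_to_publish(nodes, port=443))
```

Resolvers cache answers, so how fast traffic actually shifts depends on the record TTL; a short TTL on the round-robin records keeps the failover window small.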
I invest around 1-2h/month to maintain and (mainly) adjust my setup. Before, when I had multiple Proxmox instances and a backup server that cost me around $250/month, I was spending 1-2h/week just to keep everything running. The difference is night and day.
However, I've personally had bad experiences with consumer hardware like the Raspberry Pi and hardware failures. Most of the time, I didn't feel motivated to replace the hardware and set up all the services again (even if I had a backup). As a UniFi alternative I can recommend GL.iNet; they build modern hardware for OpenWrt with some additions, and the hardware has enough power to run Wi-Fi 7, AdGuard and Tailscale or ZeroTier. (Before, I ran Protectli Vaults with a virtual pfSense, Tailscale and AdGuard on Proxmox, plus extra OpenWrt access points.) I can recommend the Protectli hardware over a Raspberry Pi, especially if you want to run a single server/hardware homelab.
Thanks for the inspiration; it's always refreshing to see others embracing simplicity!
I'm not sure what's here to talk about. Things break. We don't have to overthink this. But if you want more predictability, stable distros exist.
After I set it up and stopped fiddling with it it's just run flawlessly for the last 6 months.
The agent has a single mission to maintain the system it is dropped into. It has its own in-process heartbeat and is launched as a service.
As I said above, it is kind of amusing to think that it exists at all. It checks whether things are running fine and can possibly correct them if it finds something wrong.
I can see this becoming a more serious kind of system agent.
I try not to advertise in this forum but I can drop a link if interested.
Yeah, right until the moment it bricks after an update.
Unifi stronk. No one needs working IPv6 or 2+ gigabit PPPoE throughput and many other things, like the ability to assign a name to an entry in the embedded RADIUS server.
Edit: zero minutes old already downvoted.
So using AI is not the point of the article but neither was it mine.
My point was I also attempt to implement homelab automation rather than manual maintenance, and I listed a few things that are onerous to do regularly by hand just like the article.
But I totally expected people to just skim my message, see “AI” and dismiss it, so I’m not terribly upset.
Practically Luddite
Sometimes having people ask the obvious questions can cause some useful reflection
So keeping the trigger manual at least gives me the ability to skip it while I work.
I don’t use Docker, I’d rather create my own packages. And if a project is too trigger happy about requiring new dependency versions, I drop it.
1. How often do I have to touch it during the next ten years?
2. How many of the times that I have to touch it are because I decided to do so?
3. How much pain is it to fix and understand if I had my mind erased?
This often works out in favour of dead simple solutions.
Longer term goal is a sleek plug-and-play box anyone can connect to their ISP modem with minimal technical knowledge.
I'm currently running it on a Aoostar WTR Max NAS with my AT&T connection. Got another NUC connected to a Spectrum modem. My goal is to be able to flip back and forth between the two with a backup bundle within minutes.
Considering breaking up the router and app server functionality so they can be run separately. Another idea is to use a custom 3D printed case with a Framework laptop motherboard and battery, switch, and wifi AP to make a true all-in-one box. I currently need an external switch, backup battery, and wifi access point.
Once the system feels mature, next steps would be things like federated tailnets with friends and family for things like distributed backups, compute/GPU, CDN, social networking, etc. Hoping that decentralized model training is cracked by someone at some point.
From a coding perspective I'm hoping to modularize everything (since it's NixOS) and add thorough testing and hardening. It's already relatively modularized considering it's built on Nix flakes.
Technology has come a long way. But I think that in tech we should be careful not to fall prey to monkey see, monkey do.
We should not be deploying technology in our homes to "mimic our employers"
Remember they are miserable for a reason.
Frankenstein couldn’t build a monster without influence. Same thing here.
“CCNA? I’ll show you CCNA…”