I'm not a developer. My background is data science and finance. Six months ago I couldn't tell you what a Nix derivation was.
I started because I wanted Claude Code to manage my system, not just my code. Tried it on Ubuntu first and it was a disaster — Claude would edit .bashrc and break my shell, install packages that conflicted with each other, no way to undo any of it cleanly. A friend mentioned NixOS, I installed it on my Framework laptop, and within a week I realized this is what AI-assisted system management should look like. Claude edits a .nix file, I rebuild, if it breaks I roll back in one command. No more "what did the AI just do to my system?"
It snowballed from there. ~470 commits later I'm running 7 machines off this config — my Framework 16 dev workstation, a ThinkPad, and a bunch of business laptops for people who've never opened a terminal. Two profiles: a "tech" profile with 350+ packages and a full AI toolchain (Claude Code, Cursor, local speech-to-text), and a "business" profile with ~40 curated packages for office use. Adding a new machine is three lines in flake.nix.
Some things I built along the way:
A script that spins up Claude Code in a sandboxed git worktree with bubblewrap + seccomp so it can work autonomously in the background. It runs in a tmux session and loops with fresh context up to 5 times. I use it for overnight refactoring.
Custom NixOS installer ISOs — I ship a USB stick to someone, they plug it in, and they get a working system with Claude Code pre-configured as their "sysadmin." They ask Claude to install software, Claude edits the config, they rebuild. I manage their machines remotely via git push.
CI/CD with BATS tests, ShellCheck, security scanning. A two-branch model where personal (my dev branch) auto-syncs to master via CI with path sanitization so nothing personal leaks to the public repo.
The core insight: NixOS is the only OS I've used where AI can't permanently break anything. Declarative config means Claude always knows the exact system state. Atomic upgrades mean every rebuild either succeeds completely or doesn't happen. If something goes wrong, I pick the previous generation from the boot menu. I've bricked my system maybe 15 times and recovered in under a minute every time.
What still sucks: the Nix learning curve is real even with AI. Claude writes non-idiomatic Nix all the time and I can't always tell. Flake lock updates break things in ways that take hours to debug. Error messages are famously terrible. And NixOS is not for everyone — it's a tradeoff between upfront complexity and long-term reliability.
Is anyone else doing something like this? Not just using AI to write code, but to manage and evolve their actual operating system?
I have two reactions to this.
First: respectfully, this is hilarious. LLMs are good at many things, but judgement is not one of them. At the outset, this was firmly in the "terrible ideas" category. (EDIT: ugh; I misread you. I thought you said Claude Code wanted to manage your system. I need more coffee...)
Second: sometimes from terrible ideas come great creativity. (I'm actually not sure what epistemic basis creativity flows from, if not simply the habit of adding entropy to a search path, and you sure chose a high entropy path!) I don't know anything about the stack you chose, but I've spent many hours almost being enticed by https://guix.gnu.org/ and this sounds similar. And the part where you have recovered from a borked system about 15 times is genuine evidence that you're doing something right. I applaud your grit and hope you're having as much fun as it sounds like you are.
I will definitely look into Guix, thanks!
Let's say you have 15 programs installed. You use programs 1-5 multiple times a day, 6-10 every few days, 11-15 every couple of weeks.
The AI makes a change. It rebuilds the image. Things are working.
A few days later, you go to use one of programs 11-15, which you haven't touched for a week or two, and it doesn't work. The breaking change could have landed that day or a couple of weeks ago. How do you remedy that? Thanks in advance for your insights.
You could build up a test suite of example workflows that run on every build.
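For what it's worth, here is a rough sketch of what such a suite could look like, written in Python. None of this comes from the original poster's repo: the function name `run_smoke_tests`, the `EXAMPLE_CHECKS` table, and the idea of using `--version` as a cheap "does it still launch" probe are all assumptions for illustration. The point is only that rarely used programs get exercised on every rebuild, so a breakage shows up against the generation that caused it rather than two weeks later.

```python
import shutil
import subprocess
import sys


def run_smoke_tests(checks, timeout=30):
    """Run each command in `checks` and return the names of the ones that failed.

    `checks` maps a human-readable name to an argv list. A check fails if the
    program is not on PATH, cannot be started, times out, or exits non-zero.
    """
    failures = []
    for name, argv in checks.items():
        # A program that vanished from PATH after a rebuild is a failure too.
        if shutil.which(argv[0]) is None:
            failures.append(name)
            continue
        try:
            result = subprocess.run(
                argv,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except (subprocess.TimeoutExpired, OSError):
            failures.append(name)
            continue
        if result.returncode != 0:
            failures.append(name)
    return failures


# Hypothetical table of checks: swap in whatever programs 11-15 are for you.
# `--version` only proves the binary starts; a real workflow check would go
# further (open a sample file, convert a document, and so on).
EXAMPLE_CHECKS = {
    "python starts": [sys.executable, "--version"],
    "git starts": ["git", "--version"],
}
```

You would call `run_smoke_tests(EXAMPLE_CHECKS)` right after each rebuild and treat a non-empty result as a reason to roll back before the change gets buried under later ones.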