1. Developer environment sandboxes. This is a cheap and convenient way to run Claude Code / Codex CLI / etc in YOLO mode in a persistent sandboxed VM with a restricted blast radius if something goes wrong.
2. Sandbox API. Fly now have a product that lets me make a simple JSON API call to run untrusted code in a new sandbox. There's even snapshotting support so I can roll back to a known state after running that code.
I wrote more a bunch more about this here: https://simonwillison.net/2026/Jan/9/sprites-dev/
https://container-use.com/quickstart
BTW Simon, I was super happy when I heard on Theo's podcast that he will be encouraging you to monetise your work more. I'm super appreciative of your work and I'm pretty convinced that the more you profit from it, the better the universe will be!!!
Some things that are unclear:
- How should I auth to github? sprite console doesn't use ssh (afaik) so I guess not agent forwarding?
- What on machine api's are available? Can I use the fly oidc provider[1]? There's a /.sprite/api.sock but curl'ing /v1/tokens/oidc gets a 404.
- How much is it going to cost me? I know there is pricing but its hard to figure out what actual usage would be like. Also I don't see any usage info in the webui right now.
I was previously thinking about doing the same thing on my homeserver with tailscale to expose the web interface publicly and tailscale oidc auth to an s3 bucket for object storage.
Given their principled take on only trusting full-VM boundaries, I doubt they moved any of the storage stack into the untrusted VM.
So maybe a virtio-block device passing through discard to some underlying CoW storage stack, or maybe virtio-fs if it's running on ch instead of fc? Would be interesting to hear more about the underlying design choices and trade-offs.
Edit: from their website, "Since it's just ext4, you won't run into weird edge cases like you might with NFS or FUSE mounts. You can happily use shared memory files, for example, so you can run SQLite in all its modes." So it's a virtio block device supporting discard that's exposed to the VM. Interesting; fc doesn't support virtio discard passthrough, and support for ch is still in progress...
I've been thinking a lot about how to run agents (and skills) securely while giving them a lot of powerful capabilities.
I recently used their macaroons library to turn arbitrary API keys (e.g. for stripe's API) into macaroons. I route requests for an upstream host (like stripe) through Envoy as a mitm proxy which injects the real creds after verifying the macaroon.
It is such a powerful pattern. I'm always worried about leaking sensitive keys through prompt injection attacks (or just sending them to anthropic), but in this model you can attenuate the keys (both capabilities & validity window) client side. The Envoy proxy lives inside my flycast network so it can't be accessed externally.
It would be so cool if fly built something like this into sprites.dev (though I can see how it would be spooky to have fly install their own certs for stripe, etc...)
Tokenizer is an explicit proxy though right?
My use case is very similar, but I wanted a transparent proxy so I could run unmodified scripts. It is a tricky design decision though.
I also mount a little fuse filesystem that mints macaroon on read (with a shorter lifetime, probably inspired by y'all but i forget from where).
I work on realtime collaboration of markdown files (currently in Obsidian), which has become a shared-context substrate for agents, skills, etc.. Our own company workspace has skills that have scoped access to fly, stripe, gmail, etc. We're definitely drinking the file-over-app personal-software-for-teams Kool-Aid, so the problem space for us includes access control and auditing.
Love your work :)
We can also attach Macaroons to Fly Machines and Sprites for configurable ambient privileges, something I've wanted us to expose as a feature for a very long time.
What is the contract with sprites? Is it just built-with-linux but not promising Linux? Or is it more like a machine but y'all control the container image?
I don't want to get too far into the rest of the details only because I'm writing this up for next week. They're not that interesting technically, but they're a really big deal for us in other ways.
This alone was worth the upvote!
>...I’m not entirely sure what that looks like yet, but things like this are a step in that direction.
This made me stop and think for a moment as to what this would look like as well. I'm having trouble finding it, but I think there was a post by Joe Armstrong (of Erlang) that talked about globally (as in across system boundaries, not global as in global variable) addressable functions?
It requires your api token by default.
I'd love to adopt this for all my development (which I currently do using rented cloud instances, so I'm pretty comfortable with the remote development paradigm). I'm especially excited about the snapshot/clone pattern, and have (this past week) been researching solutions for exactly this problem.
Hope you launch multiple regions for this ASAP. Will be watching.
Would I think of this as an EC2 instance which automatically and quickly scales to zero, with pricing only for resources consumed? (CPU and RAM when up, and disk all the time?)
It's a fast starting and fast pausing persistent VM, with a ton of built in developer tools (including a preconfigured Claude Code) and an extra JSON API for executing commands within it so you can treat it as a sandbox.
You may find my writeup here useful: https://simonwillison.net/2026/Jan/9/sprites-dev/
This is needed for sandboxes if you don't want to throw them away and start over when something goes wrong.
With sprites.dev you can create an additional checkpoint and then turn Claude Code (or your preferred agent) loose to do anything. Even if it burns down the sandbox you can just restore a checkpoint in about a second.
I really hate that modern development means not having persistent disk. I’m glad there are new options coming out which let you do this in and easier way than managing my own EC2 instances!
So let's say sprite is my building/dev ground floor. I get my thing/app to where I want it, but at the end of the day I think my thing/app is so awesome that it should be a production app for the whole world, and, I want to actually deploy it on fly, say.
Have you guys thought about that workflow, and what it might take to push button/migrate a sprite app over to fly?
Also, any plans for GPU sprites?
I don't think we're going to do anything new with GPUs any time soon.