Nobody likes lag: How to make low-latency dev sandboxes (www.compyle.ai)

105 points by mnazzaro 15 days ago | 17 comments

tuhgdetzhh 15 days ago
I’m experiencing a similar issue hosting an MCP server on Cloud Run with scale-to-zero for cost optimization. As far as I know, Cloud Functions v2 and Cloud Run are both container-based, and they tend to have noticeable startup times.
In contrast, AWS Lambdas, which run on Firecracker, have sub-second startup latency, often just a few hundred milliseconds.
Is there anything comparable on GCP that achieves similar low latency cold starts?
- mnazzaro 15 days ago
  I'm a huge GCP fan, but Cloud Run wouldn't fit our use case because of the routing and ephemeral nature. I think you would have to try to build something yourself using GKE + gVisor.
nicolaslecomte 15 days ago
Thanks for sharing. Makes a lot of sense that removing that routing layer would improve e2e latency.
We had a similar bottleneck building out our sandbox routing layer, where we were doing a lookup to a centralized db to route the query. We found that even with a fast KV store, that lookup still added too much overhead. We moved to encoding the routing logic (like region, cluster ID, etc.) directly into the subdomain/hostname. This allowed us to drop the db read entirely on the hot path and rely on Anycast + latency-based DNS to route the user to the exact right regional gateway instantly. Also, if you ever find yourselves outgrowing standard HTTP proxies for those long-lived agent sessions, I highly recommend looking at Pingora. It gave us way more control over connection lifecycles than NGINX.
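As a rough illustration of the hostname-encoding idea, here is a minimal sketch in Python. The label layout (`<sandbox>--<region>--<cluster>`), the base domain `sandbox.example.com`, and the names `Route` and `parse_sandbox_host` are all assumptions made up for this example, not the scheme described in the comment or in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    sandbox_id: str
    region: str
    cluster: str


def parse_sandbox_host(host: str, base_domain: str = "sandbox.example.com") -> Route:
    """Decode '<sandbox>--<region>--<cluster>.<base_domain>' with no database lookup.

    The label layout and base domain are hypothetical, chosen for illustration.
    """
    # Normalize: lowercase, drop an optional port and a trailing dot.
    host = host.lower().split(":", 1)[0].rstrip(".")
    suffix = "." + base_domain
    if not host.endswith(suffix):
        raise ValueError(f"unexpected domain: {host}")
    label = host[: -len(suffix)]
    parts = label.split("--")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed routing label: {label}")
    sandbox_id, region, cluster = parts
    return Route(sandbox_id, region, cluster)


route = parse_sandbox_host("ab12cd--fra--c7.sandbox.example.com:443")
print(route)
```

A gateway using something like this can pick the upstream from the `Host` header alone, so the hot path needs no KV read. Anything security-sensitive would still need to be verified separately, since a hostname is client-controlled.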
On the compute side, sandbox pooling is cool but might kill your unit economics, especially if at some point each tenant has different images. Have you looked into memory snapshots (that way you only have storage costs, not full VMs)?
- metadat 14 days ago
  Pingora looks good https://github.com/cloudflare/pingora
iterateoften 15 days ago
Why is there all of a sudden an explosion of sandbox-related posts and tools? LLMs and agents always needed sandboxes… did the collective consciousness just decide all at once that it mattered and that this is the area to focus on building tools?
- simonw 15 days ago
  I think sandboxes are having their moment because it's become undeniable that coding agents are useful, and that they're more useful if you run them in YOLO mode rather than having to approve everything they want to do.
  Coding agents are still a relatively new category to most people. Claude Code dates back to February last year, and it took a while for the general engineering public to understand why that format - coding LLMs that can execute and iterate on the code they are writing - was such a big deal.
  As a result the demand for good sandboxing options is skyrocketing.
  It also takes a while for new solutions to spin up - if someone realized sandboxes were a good commercial idea back in September last year the products they built may only just be ready for people to start trying out today.
  - ambicapter 15 days ago
    Why/how are they more useful in YOLO mode than in careful mode?
    simonw 15 days ago
    You can literally give them a task that will take a couple of hours to finish (like "port this library to language X, start by porting the tests, don't stop until all of the tests pass against the new implementation"), go out for lunch, come back and they'll have finished and probably got it ~90% right.
    phainopepla2 15 days ago
    It's just a lot easier to let them run loose and finish a task before reviewing it, rather than have to babysit and approve every command they want to run. It frees you up to do other things in that time. For some people, that's running more agents in a different terminal; for others, that's doing something else entirely.
    esperent 15 days ago
    The flow I'm using is plan -> technical plan -> execute using TDD.
    My level of involvement decreases from step to step. I'm totally in control of the initial plan. I'm giving strong oversight of the technical plan. But by the time it comes to executing, I'm happy to let it completely take over and I'll review either at the end, or break it down into 2-4 phases for long plans and review after each phase.
    For this final step, which might be 30 minutes, I'll step out and do something else. I want to be sure nothing bad will happen on my machine if I do that, so sandboxing is important.
    skinner927 14 days ago
    Look up Ralph
    theblazehen 14 days ago
    To expand - This refers to the Ralph Wiggum loop, which keeps repeating a prompt to the agent until it responds with a completion promise
    https://awesomeclaude.ai/ralph-wiggum has some tips and examples of it
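The loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow: `run_agent` stands in for whatever call invokes your agent, and the completion marker string and iteration cap are hypothetical values chosen for the example, not part of any specific tool.

```python
from typing import Callable, Optional


def ralph_loop(
    run_agent: Callable[[str], str],
    prompt: str,
    completion_marker: str = "<promise>DONE</promise>",  # hypothetical marker
    max_iterations: int = 20,
) -> Optional[int]:
    """Re-send the same prompt until the reply contains the completion marker.

    Returns the iteration on which the agent signalled completion,
    or None if the cap was reached first.
    """
    for iteration in range(1, max_iterations + 1):
        reply = run_agent(prompt)
        if completion_marker in reply:
            return iteration
    return None


# Stand-in agent for demonstration: claims completion on its third call.
calls = {"n": 0}


def fake_agent(prompt: str) -> str:
    calls["n"] += 1
    return "<promise>DONE</promise>" if calls["n"] >= 3 else "still working"


finished_on = ralph_loop(fake_agent, "port the library; stop when all tests pass")
print(finished_on)
```

The iteration cap matters in practice: without it, an agent that never emits the marker would loop (and bill) indefinitely, which is one more reason to run this inside a sandbox.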
- cedws 15 days ago
  Particularly an explosion of SaaS sandboxes... why should I pay a subscription for some remote sandbox with paltry compute power, which I need a constant internet connection to access? I have this brilliant processor in my own laptop I want to use that I have already paid for, I don't want to use someone else's!
  - reactordev 15 days ago
    Some companies only allow access through a VDI like Windows Remote Desktop or some VMware setup. It’s crazy.
    subscribed 14 days ago
    For very good reasons. Not everyone needs a full-blown, powerful laptop/desktop to run their server-side tools in the browser.
jasonjmcghee 14 days ago
If you don’t have control over all these pieces and just need a terminal (ssh, remote vim, etc.), I highly recommend https://mosh.org/
jpalepu33 15 days ago
Great write-up on the evolution of your architecture. The progression from 200ms → 14ms is impressive.
The lesson about "delete code to improve performance" resonates. I've been down similar paths where adding middleware/routing layers seemed like good abstractions, but they ended up being the performance bottleneck.
A few thoughts on this approach:
1. Warm pools are brilliant but expensive - how are you handling the economics? With multi-region pools, you're essentially paying for idle capacity across multiple data centers. I'm curious how you balance pool size vs. cold start probability.
2. Fly's replay mechanism is clever, but that initial bounce still adds latency. Have you considered using GeoDNS to route users to the correct regional endpoint from the start? Though I imagine the caching makes this a non-issue after the first request.
3. For the JWT approach - are you rotating these tokens per-session? Just thinking about the security implications if someone intercepts the token.
The 79ms → 14ms improvement is night and day for developer experience. Latency under 20ms feels instant to humans, so you've hit that sweet spot.
- mnazzaro 15 days ago
  1. The pools are very shallow: two machines per pool. While it's certainly possible for 3 tasks to get requested in the same region within 30 seconds, we handle that by falling back to the next closest region if a pool is empty. This is uncommon, though.
  2. I haven't considered it, but yeah, the caching seems to work great for us.
  3. The tokens are generated per-task, so if you are worried about your token getting leaked, you can just delete the task!
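The shallow-pool-with-fallback behaviour from point 1 can be sketched roughly as follows. This is a guess at the shape of the logic, not the actual implementation: the class name `WarmPools`, the region codes, the machine IDs, and the neighbor ordering are all invented for illustration, and pool refilling is left out.

```python
from collections import deque


class WarmPools:
    """Per-region pools of pre-booted machines with nearest-region fallback."""

    def __init__(self, machines_by_region, neighbors):
        # region -> queue of idle, already-booted machine IDs
        self.pools = {region: deque(ids) for region, ids in machines_by_region.items()}
        # region -> fallback regions, ordered closest first (assumed to be precomputed)
        self.neighbors = neighbors

    def claim(self, region):
        """Return (region, machine_id) from the closest non-empty pool, else None."""
        for candidate in [region, *self.neighbors.get(region, [])]:
            pool = self.pools.get(candidate)
            if pool:  # skips both unknown regions and empty pools
                return candidate, pool.popleft()
        return None  # every pool is drained; the caller would have to cold-start


pools = WarmPools(
    {"iad": ["iad-1", "iad-2"], "ord": ["ord-1", "ord-2"]},
    {"iad": ["ord"], "ord": ["iad"]},
)
claims = [pools.claim("iad") for _ in range(3)]
print(claims)
```

With two machines per pool, the third request in the same window spills into the neighboring region, trading a little extra latency for not having to pay for deeper pools.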
  - hinkley 15 days ago
    One of the perennial problems with on call situations I encountered was that at some point everyone knew that a production incident was going on and people were either trying to help or learn by following along running the same diagnostics the on point people were running, and exhausting the available resources that were needed to diagnose the problem.
    Splunk was a particular problem that way, but I also started seeing it with Grafana, at least in extremis, once we migrated to self hosted on AWS from a vendor. Most times it was fine, but if we had a bug that none of the teams could quickly disavow as being theirs, we had a lot of chefs in the kitchen and things would start to hiccup.
    There can be thundering herds in dev. And a bunch of people trying a repro case in a thirty second window can be one of them. The question is if anyone has the spare bandwidth to notice that it’s happening or if everyone trudges along making the same mistakes every time.
globular-toast 14 days ago
This is a problem that doesn't need to exist. Just run stuff locally on your dev machine with 12 cores and 32Gi of memory. What the hell has happened to need an entire computing cluster and all the network infrastructure between just to write software?
barishnamazov 15 days ago
Not directly related but can't read the text on my phone. It's too thin, maybe you could increase the font weight a bit?
- mnazzaro 15 days ago
  Thanks for letting me know. I'll take a look.
rbbydotdev 14 days ago
With so many apps in need of these sandboxes, I wonder if a browser plugin could be built which provisions a sandbox on the user's computer: a type of infra which could be utilized by different providers. The security implications are a little tough, but the attack surface could likely be reduced with the right practices.
imiric 15 days ago
So they used edge servers? How is this novel or insightful?
This article reads like a thinly veiled ad. Certainly not the best way to start a technical blog. If you didn't have the technical insight to know that physics is a factor in latency, why should I trust you with the problems your product actually solves?
mlhpdx 15 days ago
Interesting. It seems to me that client side prediction and lag compensation (aka the basics for games in similar situations) would have been a viable alternative.
- mnazzaro 15 days ago
  While I can see that working well for echoing keystrokes in a terminal, I'm not sure how it would work when you actually enter commands into the terminal. Same for opening files in the IDE.
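For the keystroke-echo case mentioned above, client-side prediction might look something like the sketch below. It assumes the server simply echoes characters back in order; the class and method names are invented for illustration, and it deliberately ignores the harder cases raised in this thread (commands with side effects, file opens).

```python
class PredictiveEcho:
    """Show keystrokes immediately, then reconcile with what the server echoes."""

    def __init__(self):
        self.confirmed = ""  # text the server has actually echoed
        self.pending = ""    # keystrokes drawn locally but not yet confirmed

    def key(self, chars):
        # Draw the keystroke right away instead of waiting a round trip.
        self.pending += chars
        return self.display()

    def on_server_echo(self, chars):
        self.confirmed += chars
        if self.pending.startswith(chars):
            # Prediction was right: the echoed characters are no longer pending.
            self.pending = self.pending[len(chars):]
        else:
            # Misprediction (e.g. a password prompt echoing '*'): drop the guesses.
            self.pending = ""
        return self.display()

    def display(self):
        return self.confirmed + self.pending


echo = PredictiveEcho()
shown_immediately = echo.key("ls")
after_first_echo = echo.on_server_echo("l")
after_second_echo = echo.on_server_echo("s")
print(shown_immediately, after_first_echo, after_second_echo)
```

The screen never changes while the predictions are correct, which is what hides the round trip. Rolling back is cheap here only because echoed text has no side effects.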
  - mlhpdx 15 days ago
    I didn’t get that the IDE is running on both sides, if that’s true. Wow.
    formerly_proven 15 days ago
    This is why most IDEs nowadays ask you something about "trusting files" when opening a project. They tend to lick and run on everything in there (at least for dynamic-ish languages, and maybe not "run" intentionally but do stuff which is arbitrary code execution more or less by definition) to analyze the code.
    mnazzaro 15 days ago
    Yup! There's a language server and file server running in the sandbox that the editor on the frontend interacts with.
- jgtrosh 15 days ago
  These rely on undoing within a game's constrained environment. There isn't a way to magically undo any possible procedure with side effects.
  - mlhpdx 15 days ago
    How so? Perhaps I don’t understand the context. Undoing text display is trivial, undoing code changes is already there, what’s missing? We’re not talking eons, less than a second.
nickandbro 15 days ago
Interesting, I use Cloudflare Containers and it takes roughly 6-7 seconds to boot up using a very lightweight image.
hinkley 15 days ago
When Covid hit I wasn’t the only one working remotely at my company, but I was the only one working remotely in North America, and apparently the only one trying to Work Smarter. By then there were a handful of feature toggles I had implemented that I quickly set to always on in development, but chief among them was that gzip service calls were a net loss in AWS but very very handy while working from home.
I also had switched a head-of-line service call that was, for reasons I never sorted out, costing us 30ms TTFB per request for basically fifty bytes of data, to use a long poll in Consul because the data was only meant to be changed at most once every half hour and in practice twice a week. So that latency was hidden in the dev sandbox except for startup time, where we had several Consul keys being fetched in parallel and applied in order, so one more was hardly noticeable.
The nasty one though was that Artifactory didn’t compress its REST responses, and when you have a CI/CD pipeline that’s been running for six years with half a hundred devs that response is huge because npm is teh dumb. So our poor UI lead kept having npm install time out, and the UI team’s answer for “my environment isn’t working” started with clearing your downloaded deps and starting over.
They finally fixed it after we (and presumably half of the rest of their customers) complained, but I was on the back 9 of migrating our entire deployment pipeline to Docker and so I had nginx config fairly fresh in my brain, and I set them up a forward proxy to do compression termination. It still blew up once a week but that was better than him spending half his day praying to the gods of chaos.
- PaulHoule 15 days ago
  One of the most dangerous ideologies is "all good things come to those who wait" or that waiting is a virtue. Applied by people working at all the levels of a system for years and years it leads to steps that could be 30ms taking 30s.
  - hinkley 15 days ago
    Another variant is that suffering is a virtue.
    They say doctors make the worst patients. I wonder if programmers are the least useful of users.
    gsf_emergency_6 15 days ago
    My current ideology:
    Wait on tasks that are urgent, Act on ideas that are important.
    Is that even more dangerous?
    hinkley 14 days ago
    Somehow I have a definition of The Last Responsible Moment that is both less and more urgent than most people's.
    gsf_emergency_6 14 days ago
    Most people, including Jeff Atwood? That would make it important!
sam_lowry_ 15 days ago
Shouldn't we stop sending 100 IP packets on every keystroke to start with?
- mnazzaro 15 days ago
  I could be misunderstanding, but typing "l" into the terminal should send one byte, which should fit into one TCP packet
  - williamstein 15 days ago
    It's a reference to https://news.ycombinator.com/item?id=46723990
alooPotato 15 days ago
@mnazzaro have you seen fly.io's new sprites.dev offering?
- mnazzaro 15 days ago
  I have! It's pretty interesting and handles a lot of the problems discussed here, but is a little young for us. For one thing, it doesn't have fly replay, so we'd have to build a separate proxy again.
  If we were starting from 0, I would definitely try it. My favorite thing about it is the progressive checkpointing: you can snapshot file system deltas and store them at S3 prices. Cool stuff!
yellow_lead 14 days ago
> TL;DR: If you want low latency sandboxes, cut out the middlemen and put your servers next to your users.
Valuable insight /s