The agent harness belongs outside the sandbox(www.mendral.com)

68 pointsby shad425 hours ago23 comments

tptacek2 hours ago
There are other models. Eschew the sandbox. Give the agent a computer, with all the trimmings, but keep that computer segregated from sensitive resources. Tokens are a solved problem: tokenize them[1] or do something equivalent with a proxy. The same thing goes for secrets.
A lot of this post presents false dichotomies. It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why? There are reasons to do that and reasons not to do that. You can have a durable computer with a network identity and full connectivity, and you can have that computer spin down and stop billing when not in use.
There are a zillion different shapes for addressing these problems, and I'm twitchy because I think people are super path-dependent right now, and it's causing them to miss a lot of valuable options.
[1]: https://fly.io/blog/tokenized-tokens/ (I work at Fly.io but the thing this post talks about is open source).
- nvader2 hours ago
  I'm also very excited by the different shapes for solving problems in this space. A little worried that the path dependence is ACTUALLY a bit warranted since "popular harness engineering is just claude-wrapping" is a bit of a self-fulfilling prophecy today.
  I've heard many claims that because LLMs are tuned to specific harnesses, we should expect worse performance with novel architectures. That seems to make people reluctant to try to put effort into inventing them.
  - aluzzardian hour ago
    Author here.
    I’m worried about the same (models tuned for specific harnesses).
    We actually work around that by respecting the “contract”. For instance, our harness’ Bash signature is exactly the same as Claude’s. We do our sandboxing stuff and respond using the same format.
    In the “eyes” of the model there’s no difference between what Claude does and what we do (even though the implementation is completely different).
    We basically use Claude’s tools as API contract
- aluzzardi2 hours ago
  Author here.
  This is an interesting and novel field, so I’m not pretending I know the answers, but this is what worked for us :)
  At the end of the day, and oversimplifying things: why would I want to spawn a for loop that calls an API (LLM) into its own dedicated sandbox/computer?
  When the model wants to run a command, it’ll tell you so. Doesn’t need to be a local exec, you can run it anywhere, the model won’t know the difference.
  The agent loop itself doesn’t need sandboxing. In many cases, most tool calls don’t require sandboxing either. For the tools that do require a computer, you can route those requests there when needed, rather than running the whole software in that sandbox.
  To me running the agent loop in the sandbox itself feels like “you should run your API in your DB container because it’ll talk to it at some point”.
jdw643 hours ago
Personally, I find it fascinating to watch how, whenever a new technology appears, people start competing to define and own its standards.
Manus rebuilt its harness five times in six months. The model stayed the same, but the architecture changed five times.
LangChain re-architected Deep Research four times in one year.
Anthropic also ripped out Claude Code’s agent harness whenever the model improved.
Ever since Mitchell Hashimoto mentioned the harness in February, people have been trying to claim that concept. Eventually, someone will probably sell a book called Harness Engineering. I will buy it, of course. Then I will write a blog post about it that nobody reads, with a link that will be buried under ShowDead as soon as I submit it to HN.
And by that point, IT companies will start asking:
“You’re a new grad, right? You know harness engineering, don’t you?”
- aluzzardi3 hours ago
  Author here.
  In my opinion, the main driver here is how fast models have evolved in the past 12 months. It makes the architecture of everything around them obsolete, very fast.
  We went from using models as a building block, wrapping them in heavy workflow code, to now models being smart enough to drive their own workflows and planning.
  - jdw643 hours ago
    [dead]
- tokioyoyo2 hours ago
  Just wait 6 months for something new to come up and everyone will forget about harnesses.
- TeMPOraL3 hours ago
  > Ever since Mitchell Hashimoto mentioned the harness in February
  What. The idea is as old as anyone can remember, and wrt. LLMs, it was known to be important since at least as early as ChatGPT being first released.
  - jdw643 hours ago
    Yes, the concept itself is not new. Around 2022, people would usually have called it the orchestration layer.
    But I think the term started being used closer to its current meaning around this point:
    https://www.softwareimprovementgroup.com/blog/what-is-harnes...
    In a way, the sequence was something like:
    prompt engineering(23~4) -> context engineering(25) ->harness engineering(26)
    At first, it was mostly understood as a correction or extension of prompt engineering. But the idea of “harness” as the layer that corrects, constrains, and operationalizes agents seems to have emerged much more clearly around 2026.
    So yes, there is definitely some terminological confusion in the early phase. That is normal. New technical fields often begin with several competing names for almost the same layer, and only later does one term become stable.
    redanddead25 minutes ago
    My 2c:
    The word harness brings the truth of LLMs back down to Earth.
    it really felt like between 2018 and 2022ish like LLMs had this magical aura, like the orchestration layer was intelligent, maybe even recursive, beyond what simple functions could do. It was assumed that this was a solved problem. The word "orchestration" denoted it, the words we used were full of optimism. When you lift the veil, it really is just regex, and cool tricks sure, but it's a harness it's a utility, there's no magic here, there's realism.
    Maybe the labs even had a part to play in this as well; attempting to make themselves look magical. I mean just look at the choice of name for "Mythos", it's about bringing back that feeling of myth and magic after we saw under the veil.
    The reality is that the labs have produced magical models yes, but are locking them into ecosystems that leave a lot to be desired, and are easily reproducible, and essentially are cron jobs, regex.. things we've seen in traditional cloud for decades. It feels like an attempt to create a moat where there is none.
    Maybe I'm wrong but this has been my impression
    magicalistan hour ago
    Harness itself was a widely used term by at least the "[LLM] plays pokemon" trend, which was a year ago[1]. That was basically the term of art to use when arguing about just how much special treatment LLMs should get.
    "harness engineering" is the term claimed by that article to have originated in February. It does seem obvious in retrospect and I don't remember an origination point, but there's at least one hn comment predating that in December[2] and it doesn't treat it as novel.
    I will admit that my bias is against any self congratulatory buzzword fads (I'm still not over "MCP is the USB of LLMs" or whatever and that's been a year now too). "Who coined the term harness engineering?" -> who cares? It was already widely being done.
    [1] https://www.lesswrong.com/posts/7mqp8uRnnPdbBzJZE/is-gemini-...
    [2] https://news.ycombinator.com/item?id=46331242
    jdw64an hour ago
    I read your comment. I think we may be talking about slightly different contexts.
    The Pokémon article you linked is basically about benchmarking. In that context, the harness functions as part of the benchmark setup: the controlled environment around the model, the available inputs, tools, and assistance.
    The current usage of “harness,” at least in the agent engineering discussion, seems closer to a lower-level runtime layer, almost like an OS around the agent.
    So I see this as a transition: from “harness” as a narrower benchmark/control-variable layer to “harness” as the broader operating environment of the agent.
    That does not mean I think your point is wrong. With topics like this, the interpretation depends on which part of the lineage one emphasizes. The first appearance of the idea may go back to 2022 or earlier, while the usage that looks closer to the current meaning may have emerged at a different point.
    I am probably giving more weight to the SIG article, while you are giving more weight to a different point in the lineage. Both seem reasonable to me.
zmmmmm3 hours ago
I think it omits the real reason I want to run the harness in the sandbox: I barely trust the harness more than the LLM, at least at this point in time. They are so rapidly evolving along with the underlying models, that I don't think they are a reasonable component to rely on to provide safety constraints. Put more precisely: if your harness has an ability to do something the LLM can't, and it has a set of conditions under which the LLM can cause those to be invoked, you have to assume the LLM will work out those conditions and execute them. Effectively you have an arm of the lethal trifecta and pretending otherwise is more dangerous than helpful.
Having said that, some components need to live outside the sandbox (otherwise, who creates the sandbox?). Longer term, I see it as a dedicated security layer, not part of the harness. This probably has yet to emerge fully but it's more like a hypervisor type layer that sits outside of everything and authorises access based on context, human user, etc and can apply policy including mediate the human intervention for decision points when needed.
- angry_octetan hour ago
  I don't trust the harness, and I especially don't trust that the LLM won't be able to subvert the harness, or trick me via the harness. I assume that the LLM will be able to leak any secret in the harness context to arbitrary internet destinations, or somehow encode the secret in a work product. Eg space characters at the end of lines encoding access tokens.
  Having the harness in one VM, and tool use applied to user data in another, is about as safe as you can be at present. You can mount filesystem fragments from the data VM into the harness VM, but tool execution remains painful.
  Having all authorisation and access control exist outside of the harness layer is essential. It should only have narrowly scoped and time limited credentials that are bound to its IP, and even then that is problematic.
- aluzzardian hour ago
  Author here.
  I should have made it more clear that the article is about agent / harness building (not about running third party agents).
  > I barely trust the harness more than the LLM
  Since we built it, I trust it just as much as I trust our API server :)
  The latter gets untrusted inputs from the internet, while the former gets untrusted inputs from the LLM
MrDarcy3 hours ago
> A lot of what an agent does doesn't need a sandbox at all: thinking, calling APIs, summarizing, waiting for CI.
I don’t get it. Calling an API requires a sandbox in most cases. The others could be abused in service of an un-sandboxed agent with API access.
If the harness is outside the sandbox then it’s just an ambiguous and confusing security model and boundary.
- nvader2 hours ago
  > Calling an API requires a sandbox in most cases.
  I'm not following why this would this be the case? The purpose of calling the API is to get data or effect a state transition on some remote service, but I don't follow why the originating machine matters.
  Or is your objection about auth?
- aluzzardi2 hours ago
  Author here.
  I think the confusion is that “agent” is used for two very different things:
  - building an agent
  - an “agent” product/runtime (Claude Code, etc)
  In the first case, the model never executes anything. It just outputs something like “call this API”. Your code is the one doing it, with whatever validation you want. There’s no need for a sandbox there because there’s no arbitrary execution.
- shad422 hours ago
  No, for example a tool call calling an API. So the llm does not have access to the API keys, the tool does. For example an API call that fetches some data remotely and return it to the llm. You don’t need a sandbox for it. It’s faster and more efficient to keep this out of the sandbox.
skybrian3 hours ago
They didn't make a clear argument in favor of that architecture and I'm not really convinced.
On exe.dev the agent (Shelley) runs in a Linux VM, which is the security boundary. All the conversations are saved to a sqlite database, and it knows how to read it, so you can refer to a previous conversation in the database. It's also handy for asking the AI to do random sysadmin stuff, since it can use sudo.
A downside is that there's nowhere in the VM where secrets are safe from possibly getting exfiltrated via an injection attack. But they have "integrations" where you can put secrets into an http proxy server instead of having them locally.
Also, you don't need to use AI at all. You can use the VM as a VM.
NJL30003 hours ago
Two points:
-What remains unsolved is what should an Agent reasonably have access to in what context and for how long (etc).
Probabilistic code that can run far faster than human driven code, we don’t have a great model yet. We all should spend our energy there…
- Separating / putting controls on the FS resource is no different than putting the agent behind a firewall / allow-deny list.
It doesn’t invalidate running a sandbox in a sandbox to have better security.
qudat38 minutes ago
Interesting idea. Tangentially related I’ve been using my local agent to interact with remote shells via zmx, described here: https://bower.sh/zmx-ai-portal
The use case is different but this article strikes some vague similarities around an agent API to remotely execute commands.
saltcured4 hours ago
Sure, the experimental, agentically-developed code should be tested in a sandbox. This sandbox should contain the damage of the code execution when it goes wrong.
But shouldn't there really be another sandbox where the agentic tool calls execute? This is to contain the damage of the tool execution when it goes wrong.
And, the agent harness itself should either implement or be contained in a third sandbox, which should contain the damage of the agent. There should be a firewall layer to limit what tool requests the agent can even make. This is to contain the damage of the agent when it formulates inappropriate requests.
The agent also should not possess credentials, so it cannot leak them to the LLM and allow them to be transformed into other content that might leak out via covert channels.
- shad424 hours ago
  Yes, it's also because the agent described in the post is doing some operations on the user code (fix CI pipelines, rerun tests, fix them, etc...). So another big reason to use the sandbox is to run things like bash on a user code. you don't want credentials or anything trusted inside that sandbox, including the LLM api key.
- aluzzardi4 hours ago
  Author here. Depending on how it’s designed, the harness itself doesn’t need any sandboxing.
  At the end of the day, it’s a “simple” loop that calls an external API (LLM) and receives requests to execute stuff on its behalf.
  It’s not the agent running bash commands: you (the harness author) are, and you’re in full control of where and how those commands get executed.
  In the article’s case, bash commands are forwarded to a sandbox, nothing ever runs on the harness itself (it physically can’t, local execution is not even implemented in the harness).
trjordan4 hours ago
Nah. Worse is better.
The reason agents work is because they have access to stuff by default. The whole world is context engineering at this point, and this proposal is to intermediate the context with a bespoke access layer. I put the bare minimum into getting my dev instance into a state where I can develop, because doing stuff (and these days: getting my agent to do stuff) is the goal.
This makes slightly more sense if you're building a SaaS and trying to get others to give you access to their code, their documents, and the rest so you can run agents against it. But the easiest, most powerful way is to just hook the agents up to the place that's already set up.
- ossa-ma4 hours ago
  They are building exactly what you described and this is their architectural solution to ensuring their YOLO agents do not nuke their customers code/documents/databases by sandboxing everything in the workspace — the git checkout the agent is working on, plus whatever's needed to run commands against it (compilers, package managers, etc.).
blcknight4 hours ago
I am not sure anyone knows what a harness is at this point. I've heard 17 different definitions of it at this point. It's almost like a buzzword in search of a problem.
- aluzzardi4 hours ago
  Author here. My definition is: you take an agent, remove the model and you’re left with the harness.
  Tools, memories, sandboxing, steering, etc
  - ossa-ma4 hours ago
    Clean definition, stealing it. Way better than mine: "Now imagine Claude as Shinji and Claude Code as Eva..."
    TeMPOraL2 hours ago
    Huh. My definition - or rather, explanation - has always been, "The model is just a big bag of floats you multiply with some numbers to get some numbers out, plus a regular program that runs a loop which, at minimum, turns inputs (text, images) into a stream of numbers, pushes it through those multiplications against the bag of floats, and turns results back into text/images/whatnot. That regular program is called a harness[0]. Now, the trick to make LLMs into agents, is to add another loop in the harness that reads the output and decides whether to send it out to user, or do something else, like executing more code (that's what tools are), or feeding it back to input with some commentary (that's how you get "thinking"), or both (that's how you get the "agentic loop")".
    Because there isn't really much more to it. And ever since we, i.e. those of us who played with ChatGPT API early on, bolted tools to it, some half a year before OpenAI woke up and officially named it "function calling" - ever since then, we knew that harness was the key. What kept changing was which logic (and how much of it) to put in explicitly, vs. pushing it back to the model on the "main thread", vs. pushing it to a model on a separate conversation track. But the basic insight remains the same.
    --
    [0] - Well, today - until recently you'd call it a "runner" or "runtime".
  - Dotnaught2 hours ago
    So, client?
  - beepbooptheory4 hours ago
    But what is an agent without tools?
    tomrod3 hours ago
    Code.
    beepbooptheory3 hours ago
    Like as in what its made out of, or what it makes? Neither really makes sense here? Lots of things are made out of code and not necessarily agents, but also (from my decidedly outside observer perspective) "agents" are not limited to being code producers either.
- nextaccountic3 hours ago
  If you use cloud models.. the harness is what runs in your computer
  AI companies would love if everything ran in their cloud, but arguably there are latency reasons or other reasons to run at least some stuff in your own computer
- 4 hours ago
  undefined
- brazukadev3 hours ago
  the agent harness is the REPL. The evaluation + loop.
- irishcoffee4 hours ago
  I don’t even know what an agent means, let alone harness.
  - TeMPOraL2 hours ago
    I'd say the core is that the harness/runtime/${whatever you call it} doesn't just unconditionally sends model output to the user, and user input to the model, +/- some post-processing, but instead runs a loop that feeds the output back to the model if some conditions are met. That gives you basic "thinking" and single "function calling" a-la early ChatGPT. However, if you allow it to loop arbitrary number of times and allow the output to decide whether to loop or to stop, you get a basic agent.
  - IgorPartola4 hours ago
    There is an LLM API. You send it a system prompt and the conversation history. If the last message is a user message the agent will send back a response. It can also send back a “thinking” message before it sends a response and it can also send back a structured message with one or more function calls for functions you defined in your API request (things like “ls(): list files”).
    The harness is the part that makes the API calls, interacts with the user, makes the function calls, and keeps track of the conversation memory.
    You can also use the LLM to summarize the conversation into a single shorter message so you get compaction. And instead of statically defining which functions are available to the LLM you can create an MCP server which allows the LLM to auto-discover functions it can call and what they do.
    That’s the whole magic of something like Claude Code. The rest is details.
  - zmmmmm3 hours ago
    Agent is currently defined as "what I want it to mean given whatever I am talking about".
    Personally, for me it embodies a level of autonomy. I define that as, an AI model with potential to interact with something external to itself based on its output, where that includes its own future behavior.
nvader2 hours ago
Hey aluzzardi, thanks for sharing this article!
I'm really intrigued by your point on read-memory vs a dedicated read interface, because it is a real insight about success rates in harness design.
How did you come to the conclusion you did? Could you speak a little to the evaluations you ran, or the data or anecdotes you collected to validate that decision?
I'm also curious about the overall framing of the question, which I'll challenge with, does the agent have to have a where?
An agent could be modeled by a set of states and transitions. I don't think that there's anything inherently necessary about the current "one process claude" approach for harnesses, other than convenience. Why hasn't a fully distributed harness, built on functions and tables, gained more mindshare?
lwansbrough2 hours ago
I had an idea that devs could build wasm modules that would define tools and instructions, and a harness could load them. Kind of like MCP but with certain assurances about the sandboxing. You could build a package manager around these behaviours.
I still kind of think it’s a decent idea but it’s too close to MCP with drawbacks that make it a harder sell than MCP. It’s hard to compete on functionality from a secure sandbox if users decide they don’t care about security.
vursekar2 hours ago
> Three engineers trigger the agent on the same incident, and they all see stale state until their sessions end. Conflict resolution, eventual consistency, cache invalidation.
Arguably this is a feature not a bug. Conflict resolution forces the need for a process to come to agreement on a common source of truth - one of the reasons why most Git repos don’t allow users to push to main directly. Writing directly to a shared memory database seems like it would result in chaos and a host of side effects once the number of users scales.
spankalee3 hours ago
This is angling in the right direction, but I think it has two problems:
1) It's still assuming agents have CLIs. This is a very developer-centric concept of agents, and doesn't map well to either consumer or enterprise agents that aren't primarily working with files. Skills, plans, TODO lists, and memory are good, but don't have to be modeled as raw file access. Many harnesses have tools for them.
2) It's talking about a singular sandbox. That's not good enough for prompt injection prevention, secure credential management, and limiting the blast radius of attacks.
Koffiepoeder4 hours ago
Slightly related: I am looking for:
- Easy single command CLI agent spawning with templates
- Automatic context transfer (i. e. a bit like git worktrees)
- Fully containerised, but remote (a bit like pods)
- Central, mitm-proxy zero trust authn/authz management (no keys or credentials inside the agents), rather enrichment in the hypervisor/encapsulation
- Multi agent follow-up functionalities
- Fully self hosted/FOSS
Basically a very dev-friendly, secure, "kubernetes"-like solution for running remote agents.
Anyone has an idea of how to achieve this or potential technologies?
- nvader2 hours ago
  Yeah, have you tried `mngr` by Imbue? It seems to have a bunch of the features you're looking for.
  https://github.com/imbue-ai/mngr
sudb3 hours ago
Is secretly rerouting reads/writes/edits of skills and memory any easier than just dumping the actual skills and memory files on disk at sandbox startup?
Another benefit of moving the harness outside the sandbox is you get to avoid accidentally creating a massive distributed system and you therefore don't have to think so much about events/communication between your main API and your sandboxes.
solidasparagus4 hours ago
Why are two concurrent sessions updating the same memory key with different values? IMO it probably points to a fundamental flaw in how memory is being thought about and built.
- aluzzardi4 hours ago
  Author here. Because of parallelism and non determinism.
  This problem is quite common and not limited to memories. For instance, Claude Code will block write attempts and steer the agent to perform a read first (because the file might have been modified in the meantime by the user or another agent).
  Same principle here: rather than trying to deterministically “merge” concurrent writes, you fail the last write and let the agent read again and try another write
Retr0id4 hours ago
It took me a while to grok why this made any sense, I think the context is that this is for hosting many agents as a service.
- qezz4 hours ago
  Exactly, my understanding is also that they host agents as a service. The actual use case is mentioned in the end of the article, which makes it hard to reason about.
  Anyway. General advice: treat harnesses as any other (third-party) software that you run on your server. Modern harnesses (the ones from big companies, you need to subscribe to) are black boxes. Would you run a random binary you fetched from the internet on your server? Claude code, codex etc. are exactly this.
  - shad424 hours ago
    We don't host 3rd party agents (I don't know if this what you implied). We built an agent that monitors CI pipelines, tests failures, performance and auto opens PR to address issues we find. We host our agent loop on a backend (it's in go), and we call to the sandbox when we run operations involving the user code.
4 hours ago
undefined
8thcross4 hours ago
we are running a harness outside the sandbox, inside a sandobx.
thinkneo_ai4 hours ago
[flagged]
eddyaipt2 hours ago
[dead]
kweiza4 hours ago
[dead]