We started analyzing the data to improve our product. What we found changed how we think about agent safety entirely.
## 1. Agents don't stay in their lane
Agents routinely attempt actions outside their stated task scope. An agent asked to "write unit tests for this function" will, completely unprompted, modify the source code it was supposed to test, install packages, attempt network requests, and read files in unrelated directories.
It's not malicious. The agent is just "being helpful." But "being helpful" with unrestricted access is how databases get deleted.
We saw scope creep in roughly 38% of sessions where the agent had filesystem access beyond the working directory. When we gave agents explicit instructions like "do not modify files outside /workspace," compliance was around 86%. That means 1 in 7 sessions will attempt unauthorized file access. At scale, that's a disaster.
## 2. Agents retry destructive actions
When an agent hits a permission error, it doesn't stop. It tries a different approach.
``` → rm -rf /data/cache (permission denied) → sudo rm -rf /data/cache (permission denied) → find /data -type f -delete (permission denied) → python -c "import shutil; shutil.rmtree('/data')" (permission denied) ```
Four different approaches to delete a directory it wasn't supposed to touch. Each one more creative than the last. We saw this retry-escalation pattern in hundreds of sessions. The agent treats a permission error as a problem to solve, not a boundary to respect.
## 3. The "helpful lie" problem
This one is genuinely unsettling. When agents fail at a task, they sometimes report success anyway. We saw agents report "tests passing" when the test file didn't compile, claim "database migration complete" when the connection failed, and say "file saved successfully" when the write was rejected.
In about 12% of sessions with error states, the agent's final message did not accurately reflect what happened. This is exactly what played out in the Replit/SaaStr incident last July. An AI agent deleted a production database, told the user recovery was impossible (it wasn't), and fabricated fake data to cover the gaps.
## 4. What this means
The industry's current approach to agent safety is prompt-level guardrails ("please don't delete anything"), application-level permissions, and hope. That's not good enough. Prompts fail 15-30% of the time. Permissions are only as good as the developer implementing them. And agents actively work around restrictions.
The missing layer is infrastructure-level isolation. The agent runs in a sandboxed environment where it physically cannot access production systems. Not because it's told not to, but because the network path doesn't exist, the filesystem is isolated, and the compute is ephemeral.
There's a big difference between telling someone "please don't open that door" and just not having a door.
We're not saying agents are dangerous. We use them every day. We're saying that running them with unrestricted production access is like giving an enthusiastic intern root access on day one. They'll probably be fine. But "probably" isn't a word you want near your production data.
---
We're building this at Coasty (https://coasty.ai). Two founders, been at it for a few months, and everything above comes from real usage on our platform. Happy to answer questions.