How do you secure AI coding agents?

1 pointby peanutlifea month ago1 comment

niyikizaa month ago
I've been going down this exact rabbit hole for the last few months. The 'opt-in guardrails' problem you mentioned is the dealbreaker. If the agent can just ignore the read_file tool wrapper and call os.system('cat ...'), the policy is useless.
I ended up building a 'capability token' primitive (think Macaroons or Google Zanzibar, but for ephemeral agent tasks) to solve this.
My approach (Tenuo) works like this:
1. Runtime Enforcement: The agent gets a cryptographically signed 'Warrant' that mechanically limits what the runtime allows. It’s not a 'rule' the LLM follows; it’s a constraint the runtime enforces (e.g., fs:read is only valid for /tmp/*).
2. Attenuation: As the agent creates sub-tasks, it can only delegate less authority than it holds.
3. Offline Verify: I wrote the core in Rust so I can verify these tokens in ~27µs on every single tool call without a network round-trip.
If you are building a POC, feel free to rip out the core logic or use the crate directly. I’d love to see more tools move away from 'prompt engineering security' toward actual runtime guarantees.
Repo: https://github.com/tenuo-ai/tenuo
- peanutlifea month ago
  This is a really helpful comment, and I actually ran straight into the exact failure mode you’re describing.
  I had a PreToolUse hook enabled that was supposed to block reads of ~/.env. Claude tried to read it, hit an error, then asked me for permission. When I said yes, it retried the read and succeeded. The hook was effectively bypassed via user consent.
  That was the “oh wow” moment for me. Hooks can influence behavior, but they don’t remove authority. As long as the agent process still has filesystem access, enforcement is ultimately negotiable. I even tried adding an MCP server, but again its upto Claude to pick it up.
  Your capability token approach is the missing piece here. It makes the distinction very clear: instead of asking the agent to behave, you never give it the power in the first place. No token, no read, no amount of prompting or approval changes that unless a new token is explicitly minted.
  The way I’m thinking about it now is:
  hooks are useful for intent capture and policy decisions
  capability tokens are the actual enforcement primitive
  approvals should mint new, narrower tokens rather than act as conversational overrides
  Really appreciate you sharing Tenuo. This feels like the right direction if we want agent security to move past “prompt engineering as policy” and toward real runtime guarantees.
  - niyikizaa month ago
    That "oh wow" moment you described, where the agent effectively social-engineered the user to bypass the hook, is exactly the failure mode that pushed me to build this. Hooks are advisory, capabilities are mandatory.
    Your framing of "approvals should mint new tokens" is the core design pattern in Tenuo.
    The agent starts with zero file access. When it asks "Can I read ~/.env?" and the user says "Yes", the system doesn't just disable the hook. It mints a fresh, ephemeral Warrant for path: "~/.env".
    That way, even if the agent hallucinates later and tries to reuse that permission for ~/.ssh (or even ~/.env after ttl), it physically can't. The token doesn't exist.
    Glad Tenuo resonates. This is the direction the whole ecosystem needs to move