1 point by nielsprovos 4 hours ago | 1 comment
  • nielsprovos 4 hours ago
    Hi! I have spent decades building security primitives such as bcrypt and Safe Browsing. The current rise in AI agents scares me because they leverage our full ambient authority: when an agent proxies your full permissions, a single prompt injection can exfiltrate sensitive data or delete all your email.

    I built IronCurtain as a research prototype to “secure” agents under the assumption that they will eventually go fully rogue. Instead of granting broad access, the system requires the agent to write TypeScript inside a V8 isolate. That code can only issue function calls that translate into Model Context Protocol (MCP) tool calls, and every call passes through a trusted policy engine driven by a plain-English constitution compiled into deterministic rules. Credentials live exclusively in the MCP servers and remain invisible to the agent.

    The prototype currently focuses on securing developer workflows such as running Claude Code, with personal AI assistants as a longer-term goal. I am looking for technical critiques of this trust model; if you see potential bypasses in the architecture, let me know.
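    To make the trust boundary concrete, here is a minimal TypeScript sketch of the gate the comment describes: agent code can only reach tools through a policy check backed by deterministic rules, with a default-deny fallback. All names (`PolicyEngine`-style types, `gate`, `dispatchToMcp`, the example rules) are hypothetical illustrations, not IronCurtain's actual API.

    ```typescript
    // Hypothetical sketch: agent-issued tool calls must pass a deterministic
    // policy gate before being forwarded to an MCP server.

    type ToolCall = { tool: string; args: Record<string, unknown> };
    type Verdict = "allow" | "deny" | "pass";
    type Rule = (call: ToolCall) => Verdict;

    // Deterministic rules, conceptually compiled from a plain-English
    // constitution (both example rules here are invented for illustration).
    const rules: Rule[] = [
      // "Never delete all email."
      (c) => (c.tool === "email.deleteAll" ? "deny" : "pass"),
      // "The agent may read files, but only inside the workspace."
      (c) =>
        c.tool === "fs.read" && String(c.args.path).startsWith("/workspace/")
          ? "allow"
          : "pass",
    ];

    // Default-deny: a call goes through only if a rule explicitly allows it
    // before any rule denies it.
    function evaluate(call: ToolCall): boolean {
      for (const rule of rules) {
        const verdict = rule(call);
        if (verdict !== "pass") return verdict === "allow";
      }
      return false;
    }

    // The sandboxed agent code can only reach tools through this gate; the
    // credentials live on the MCP-server side of it, out of the agent's view.
    function gate(call: ToolCall): void {
      if (!evaluate(call)) {
        throw new Error(`policy denied: ${call.tool}`);
      }
      // A real implementation would forward the call to the MCP server here,
      // e.g. dispatchToMcp(call).
    }
    ```

    The key property this shape gives you is that an injected prompt can make the agent *ask* for anything, but the set of calls that actually execute is fixed by the compiled rules, not by the model's output.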