Content guardrails (NeMo, LlamaGuard) control what models say, not what agents do. Agent sandboxes scope directories but don't back anything up. Checkpoint tools provide rollback, but the agent can delete the checkpoints. OPA evaluates policy in microseconds, but nobody has bridged it to AI agent frameworks yet.
Agent Gate sits in that gap. It classifies tool calls against pre-computed policy, enforces directory boundaries, and vault-backs every destructive target to an agent-unreachable location before the action proceeds. If the backup fails, the action is blocked.
Live tested with Claude Code in fully autonomous mode via PreToolUse hooks. 18/18 tests passing. The vault creates per-operation timestamped snapshots, so multiple overwrites of the same file produce separate recovery points.
Background: I spent years in nuclear command and control where Permissive Action Links verified authorization, not judgment, before any action could proceed. Same architectural principle applied here.
Honest about the limitations: the bash parser is naive, shell expansion isn't evaluated, and this is a safety net for well-intentioned agents, not a security boundary against adversarial escape. More detail in the README.
Python, YAML policy definitions, Apache 2.0. Roadmap includes MCP proxy integration and OPA/Rego support.
Happy to answer questions about the architecture.