Most agent frameworks treat governance as an afterthought — log what happened, review it later, hope nothing went wrong. I built the opposite: a middleware that evaluates every tool call before execution and blocks it if the policy says no.
Since shipping it and getting feedback from people running thousands of multi-agent dispatches, I keep seeing the same pattern: teams build agents, deploy them, something breaks, and only then do they start thinking about what the agent should have been allowed to do.
How it works in practice:
An agent wants to call approve_payment with amount: 12000. Before the tool executes, the policy engine checks: is this agent allowed to call this tool with these parameters? The answer is deterministic — no LLM involved, no prompt engineering, just JSON rules evaluated in <5ms.
```typescript
const result = await governance.evaluate({
  agentId: 'claims-agent',
  tool: 'approve_payment',
  params: { amount: 12000 },
});
// result.allowed = false
// result.reason = "Payments over 5000 require manual approval"
```
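The deny above comes from declarative JSON rules. Here's a minimal sketch of what deterministic parameter-rule evaluation looks like; the shapes and names are illustrative, not the actual agentbouncr policy schema:

```typescript
// Illustrative rule/policy shapes -- not the real agentbouncr schema.
type Rule = { param: string; op: 'lte' | 'gte' | 'eq'; value: number; reason: string };
type Policy = { agentId: string; tool: string; rules: Rule[] };

const policy: Policy = {
  agentId: 'claims-agent',
  tool: 'approve_payment',
  rules: [
    { param: 'amount', op: 'lte', value: 5000,
      reason: 'Payments over 5000 require manual approval' },
  ],
};

// Pure data-driven evaluation: no LLM, no prompts, just comparisons.
function evaluate(p: Policy, params: Record<string, number>): { allowed: boolean; reason?: string } {
  for (const r of p.rules) {
    const v = params[r.param];
    const ok = r.op === 'lte' ? v <= r.value
             : r.op === 'gte' ? v >= r.value
             : v === r.value;
    if (!ok) return { allowed: false, reason: r.reason };
  }
  return { allowed: true };
}
```

Because the check is a handful of comparisons over plain data, sub-5ms latency is the expected case, not an optimization.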
Every decision — allowed or denied — gets recorded in a SHA-256 hash-chained audit trail. If someone tampers with an entry, the chain breaks and verification fails.
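Conceptually, the tamper-evidence works like this. A minimal sketch using Node's built-in crypto; the entry shape and function names are illustrative, not the shipped API:

```typescript
import { createHash } from 'node:crypto';

// Illustrative audit-entry shape: each entry commits to the one before it.
interface AuditEntry {
  decision: string;
  prevHash: string; // hash of the previous entry (zeros for the first)
  hash: string;     // SHA-256 over (prevHash + decision)
}

function entryHash(prevHash: string, decision: string): string {
  return createHash('sha256').update(prevHash + decision).digest('hex');
}

function appendEntry(chain: AuditEntry[], decision: string): AuditEntry[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : '0'.repeat(64);
  return [...chain, { decision, prevHash, hash: entryHash(prevHash, decision) }];
}

// Verification walks the chain; editing any entry breaks every link after it.
function verifyChain(chain: AuditEntry[]): boolean {
  let prev = '0'.repeat(64);
  for (const e of chain) {
    if (e.prevHash !== prev || e.hash !== entryHash(prev, e.decision)) return false;
    prev = e.hash;
  }
  return true;
}
```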
If something goes seriously wrong, the kill switch blocks all tool calls synchronously in <100ms. Works even when your LLM provider is down, because the governance layer doesn't depend on it.
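The key property is that the switch is plain local state checked synchronously before every tool call, which is why it has no dependency on any external provider. A hypothetical sketch of the idea, not the real API:

```typescript
// Hypothetical sketch: a kill switch as local state, checked before every call.
class KillSwitch {
  private engaged = false;

  engage(): void {
    this.engaged = true;
  }

  // Synchronous, in-process check -- no network, no LLM, nothing to be "down".
  check(): { allowed: boolean; reason?: string } {
    return this.engaged
      ? { allowed: false, reason: 'Kill switch engaged' }
      : { allowed: true };
  }
}
```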
What I learned from production:
Auto-Discovery turned out to be more useful than I expected. Agents call tools you didn't plan for. The system registers unknown tools automatically — so you get a live inventory of what your agents actually do, then tighten policies based on real data instead of guessing upfront.
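A sketch of the idea (hypothetical names; a real deployment might default-deny or flag unknown tools for review instead of default-allowing):

```typescript
// Hypothetical auto-discovery sketch: unknown tools are registered on first
// sight, so the registry becomes a live inventory built from real traffic.
class ToolRegistry {
  private tools = new Map<string, { calls: number }>();

  record(tool: string): void {
    const entry = this.tools.get(tool);
    if (entry) {
      entry.calls++;
    } else {
      this.tools.set(tool, { calls: 1 }); // newly discovered tool
    }
  }

  inventory(): { tool: string; calls: number }[] {
    return [...this.tools].map(([tool, { calls }]) => ({ tool, calls }));
  }
}
```

Once the inventory reflects what agents actually call, you can write deny rules against observed behavior instead of guessing upfront.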
Permission Score (ratio of allowed vs. actually-used tools per agent) surfaces over-permissioning fast. Most agents have access to way more tools than they need.
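One way to compute it, assuming the definition above (fraction of granted tools the agent actually used; lower means more over-permissioned):

```typescript
// Assumed definition: used-and-allowed tools divided by allowed tools.
function permissionScore(allowed: string[], used: string[]): number {
  const allowedSet = new Set(allowed);
  const usedAllowed = new Set(used.filter(t => allowedSet.has(t)));
  return allowedSet.size ? usedAllowed.size / allowedSet.size : 1;
}
```

An agent granted four tools that only ever calls one scores 0.25, which is the signal to tighten its policy.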
The compound error problem is real. 90% accuracy per step sounds great until you chain 4 steps and you're at 66%. Per-step governance catches failures that end-to-end testing misses.
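The arithmetic behind that 66% figure, since per-step accuracy compounds multiplicatively across a chain:

```typescript
// End-to-end accuracy of a chain where every step must succeed.
function chainAccuracy(perStep: number, steps: number): number {
  return perStep ** steps;
}
// 0.9 ** 4 = 0.6561, i.e. roughly 66% end-to-end
```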
Stack: TypeScript, 1,264 tests, works with LangChain / Vercel AI SDK / CrewAI / n8n / anything that calls tools via HTTP. Framework-agnostic by design.
Source-available under Elastic License v2 (use it, modify it, embed it — just can't offer it as a competing hosted service).
Code: https://github.com/agentbouncr/agentbouncr
Site: https://agentbouncr.com
npm: npm install @agentbouncr/core