But there's only so much you can do with that. If you need to call third-party APIs on the user's behalf, I'd recommend a solution built around OAuth 2.0 Token Exchange (https://datatracker.ietf.org/doc/html/rfc8693).
You can check out what something like Auth0 offers (https://auth0.com/ai), which covers token exchange for third-party APIs, human-in-the-loop approval, and authorization methods.
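For a sense of what that exchange looks like on the wire, here's a minimal sketch of an RFC 8693 request in Python. The token endpoint URL, audience, and scope are placeholders, not any specific vendor's API:

    import requests

    # Placeholders: the token the agent already holds for the user, and the
    # gateway's own OAuth client credentials.
    user_access_token = "<user-access-token>"
    client_id, client_secret = "<client-id>", "<client-secret>"

    # Exchange the user's token for a narrower token scoped to one
    # third-party API (RFC 8693 token exchange grant).
    resp = requests.post(
        "https://auth.example.com/oauth/token",  # hypothetical endpoint
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "subject_token": user_access_token,
            "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
            "audience": "https://third-party-api.example.com",
            "scope": "orders:read",
        },
        auth=(client_id, client_secret),
    )
    resp.raise_for_status()
    delegated_token = resp.json()["access_token"]

The agent then calls the third-party API with delegated_token rather than the user's original, broader credential.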
Also, you should never connect an agent directly to a sensitive database server, an order/fulfillment system, etc. Rather, you'd use a "middleware proxy" to arbitrate the requests: consult a policy engine, log the processing context, and so on, before relaying the requests on to the target system.
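A minimal sketch of that arbitration flow, assuming a simple in-process policy check as a stand-in for a real policy engine (OPA, Cedar, etc.):

    import json
    import logging

    log = logging.getLogger("agent-proxy")

    def is_allowed(request: dict) -> bool:
        """Stand-in for a real policy engine."""
        return request["action"] not in {"drop_table", "refund", "ship_order"}

    def relay_to_target(request: dict) -> dict:
        """Hypothetical: forward to the real DB/fulfillment system."""
        raise NotImplementedError

    def arbitrate(request: dict) -> dict:
        # 1. Consult policy before anything touches the target system.
        allowed = is_allowed(request)
        # 2. Log the full processing context either way.
        log.info("agent=%s action=%s allowed=%s context=%s",
                 request["agent_id"], request["action"], allowed,
                 json.dumps(request.get("context", {})))
        if not allowed:
            return {"status": "denied", "reason": "blocked by policy"}
        # 3. Only now relay the request downstream.
        return relay_to_target(request)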
Also consider subtleties in the threat model and the types of attack vectors, e.g. how many systems the agent(s) connect to concurrently. See the lethal trifecta: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
I built a small authority gateway that sits between agents and downstream systems and forces all high-risk actions through deterministic policy before execution.
In a v2 iteration I just shipped, the gateway returns (response shape sketched after the list):
• risk scores on attempted actions
• the policy path that fired
• highlighted spans in the agent output that triggered the rule
• a preview of the approval chain required
• admin endpoints to review and approve pending actions
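Roughly, a decision response looks like this. All field names are illustrative, not the actual API:

    # Illustrative gateway response after an agent attempts a high-risk action.
    decision = {
        "action_id": "act_0001",
        "risk_score": 0.87,
        "verdict": "pending_approval",
        "policy_path": ["prod", "payments", "refund-over-limit"],
        "triggered_spans": [
            {"start": 112, "end": 139, "text": "refund the full $4,200"},
        ],
        "approval_chain": ["team-lead", "finance-oncall"],
        "review_url": "/admin/pending/act_0001",
    }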
The key thing I learned: teams don’t just need allow/deny. They need explainable enforcement so when something breaks they can see whether policy failed or the agent bypassed intent.
Curious whether people here treat message drafting and API execution differently, or if everything funnels through the same enforcement layer.
It is instructive to consider why the same does not apply in this case.
And see https://www.schneier.com/blog/archives/2026/01/why-ai-keeps-... .
At keypost.ai we're building exactly this for MCP pipelines. The proxy evaluates every tool call against deterministic policy before it reaches the actual tool. No LLM in the decision path, so no reasoning around the rules.
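To make "deterministic policy" concrete: the evaluation can be a pure function over the tool call, something like this sketch. The rule format is invented for illustration, not keypost's actual config:

    import fnmatch

    # Hypothetical rule table: evaluated top to bottom, first match wins.
    # No model in the loop: the same call always yields the same verdict.
    RULES = [
        {"tool": "db.*",       "arg_glob": "*DROP*", "verdict": "deny"},
        {"tool": "payments.*", "arg_glob": "*",      "verdict": "approval_required"},
        {"tool": "*",          "arg_glob": "*",      "verdict": "allow"},
    ]

    def evaluate(tool_name: str, args_text: str) -> str:
        for rule in RULES:
            if fnmatch.fnmatch(tool_name, rule["tool"]) and \
               fnmatch.fnmatch(args_text, rule["arg_glob"]):
                return rule["verdict"]
        return "deny"  # default-deny if nothing matched

    # e.g. evaluate("db.query", "DROP TABLE users") -> "deny"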
Re: chrisjj's point about "fix the program" - the challenge is that agents are non-deterministic by design. You can't unit test every possible action sequence. So you need runtime enforcement as a safety net, similar to how we use IAM even for well-tested code.
The human-in-the-loop suggestion works but doesn't scale. What we're seeing is that teams want conditional human approval: only trigger review when the action crosses a risk threshold (first deletion in prod, spend over $X, etc.), not for every call.
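As a sketch, that gate can be a cheap deterministic check in front of the approval queue. The thresholds and fields here are invented for illustration:

    SPEND_LIMIT_USD = 500  # hypothetical threshold

    def needs_human_review(action: dict, seen: set) -> bool:
        """Return True only when the action crosses a risk threshold."""
        # First time this agent performs a destructive op in prod.
        key = (action["agent_id"], action["op"], action["env"])
        if action["env"] == "prod" and action["op"] == "delete" and key not in seen:
            return True
        # Any spend above the limit.
        if action.get("amount_usd", 0) > SPEND_LIMIT_USD:
            return True
        return False  # everything else proceeds without a human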
The audit trail gap is real. Most teams log tool calls but not the policy decision. When something goes wrong, you want to know: was it allowed by policy, or did policy not cover this case?
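One cheap fix is to log the policy decision alongside the tool call, with an explicit marker for the case where no rule matched at all. An illustrative record:

    # Illustrative audit record (field names are hypothetical).
    audit_record = {
        "timestamp": "2026-02-03T14:07:11Z",
        "agent_id": "agent-42",
        "tool_call": {"tool": "orders.cancel", "args": {"order_id": "A1001"}},
        "policy": {
            "matched_rule": None,   # None => policy didn't cover this call
            "verdict": "allow",     # what the default behavior did
            "covered": False,       # the gap you want to query for later
        },
    }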