Two questions on the threat model:
1. Can the LLM influence the capability presented to the tool? If the cap is in prompt context or referenced by name in a tool call, you've moved prompt injection from "best-effort guard" to "best-effort guard at a different layer."
2. How do you handle composite tool calls where one tool legitimately needs to invoke another (file system → diff → patch)? The capability has to flow but not amplify.
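One way to picture "flow but not amplify" for a file system → diff → patch chain: each hop derives its capability from the caller's by narrowing, so a callee can end up with less authority but never more. A minimal Python sketch, assuming a capability is just a set of allowed tools plus a path prefix; `Capability` and `delegate` are hypothetical names for illustration, not any project's API.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability: which tools may run, under which path."""
    tools: frozenset
    path_prefix: str  # normalized, no trailing slash


def delegate(parent: Capability, wanted_tools: set, wanted_prefix: str) -> Capability:
    """Derive a child capability that can only narrow the parent's authority."""
    # Tools: the child gets the intersection, never a tool the parent lacks.
    tools = parent.tools & frozenset(wanted_tools)
    # Paths: normalize first so "../" segments cannot escape the parent scope.
    prefix = posixpath.normpath(wanted_prefix)
    inside = prefix == parent.path_prefix or prefix.startswith(parent.path_prefix + "/")
    if not inside:
        raise PermissionError("child path escapes parent scope")
    return Capability(tools, prefix)


root = Capability(frozenset({"fs.read", "diff", "patch"}), "/repo")
# The file system step hands the diff step a narrower capability;
# the request for "shell.exec" is silently dropped by the intersection.
diff_cap = delegate(root, {"diff", "patch", "shell.exec"}, "/repo/src")
# The diff step hands the patch step an even narrower one.
patch_cap = delegate(diff_cap, {"patch"}, "/repo/src/app")
```

Because every hop goes through the same narrowing function, the authority at the end of the chain is bounded by the authority at the start, whatever the intermediate tools ask for.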
Three modules, each its own repo, glued by a single Rust dispatcher CLI:
- Find (mcp-recon): discovers an MCP server's tool surface and emits a structured findings document. Six deterministic rules today (R1 unconstrained string input, R2 missing auth on side-effecting tools, R3 side-effect/name mismatch, R4 unbounded numeric on money-ish params, R5 money in description but no money side-effect, R6 indirect-injection surface from external-fetch tools).
- Bind (capnagent): mints macaroon-style capability tokens. Ed25519 holder-of-key, attenuable by holders without contacting the issuer, revocable, signed denial receipts (HMAC-SHA256).
- Guard (mcp-guard): deterministic policy evaluator with three modes (synthesize / evaluate / backtest). Pure-stdlib Python, microsecond decision path.
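To make "attenuable by holders without contacting the issuer" concrete, here is a minimal sketch of the classic macaroon construction, where each caveat is folded into the token's signature with a chained HMAC. This is the textbook HMAC-chain scheme, shown only for intuition; it is not capnagent's actual token format, which per the description above uses Ed25519 holder-of-key. The names `mint`, `attenuate`, and `verify` are hypothetical.

```python
import hashlib
import hmac


def _chain(key: bytes, msg: str) -> bytes:
    """One link of the chain: HMAC-SHA256 of msg under key."""
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, token_id: str) -> dict:
    """Issuer: create a token whose signature is HMAC(root_key, id)."""
    return {"id": token_id, "caveats": [], "sig": _chain(root_key, token_id)}


def attenuate(token: dict, caveat: str) -> dict:
    """Holder: append a caveat, keyed only by the current signature."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }


def verify(root_key: bytes, token: dict) -> bool:
    """Verifier: recompute the chain from the root key and compare."""
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
    return hmac.compare_digest(sig, token["sig"])


root_key = b"demo-root-key-not-for-production"
broad = mint(root_key, "agent-42")
# The holder narrows the token offline; no call to the issuer is needed.
narrow = attenuate(broad, "max_refund <= 50")
# Stripping a caveat leaves a signature that no longer matches the chain.
forged = {**narrow, "caveats": []}
```

In this sketch `verify` only establishes that the caveat list is intact; each caveat still has to be evaluated against the actual request before the call is allowed.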
The wire format binding them is a public JSON Schema (Draft 2020-12, additionalProperties:false, regex-validated OWASP LLM / NIST AI RMF / MITRE ATLAS IDs): https://capframe.ai/schema . I'd like this to become the SARIF-equivalent for AI agent security. Happy to take suggestions and PRs.
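For readers wondering what "regex-validated" IDs might look like in practice, here is a rough Python sketch. The patterns are assumptions based on how each framework publishes its identifiers (for example `LLM01`, `MAP 1.1`, `AML.T0051`); the patterns in the published schema may well differ, and `ID_PATTERNS` and `valid_id` are hypothetical names.

```python
import re

# Assumed identifier shapes; not copied from the actual schema.
ID_PATTERNS = {
    # OWASP Top 10 for LLM Applications, e.g. LLM01 .. LLM10
    "owasp_llm": re.compile(r"LLM(0[1-9]|10)"),
    # NIST AI RMF subcategories, e.g. "GOVERN 1.1", "MEASURE 2.7"
    "nist_ai_rmf": re.compile(r"(GOVERN|MAP|MEASURE|MANAGE) \d+\.\d+"),
    # MITRE ATLAS techniques, e.g. AML.T0051 or AML.T0051.000
    "mitre_atlas": re.compile(r"AML\.T\d{4}(\.\d{3})?"),
}


def valid_id(framework: str, value: str) -> bool:
    """Return True only if the framework is known and the whole ID matches."""
    pattern = ID_PATTERNS.get(framework)
    if pattern is None:
        return False  # unknown frameworks are rejected, not ignored
    return pattern.fullmatch(value) is not None
```

Using `fullmatch` rather than `search` matters here: it rejects values that merely contain a valid-looking ID somewhere inside a longer string.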
Install: `curl -fsSL capframe.ai/install | sh` (sha256-verified; native binaries on GitHub Releases for linux/macos/windows × x86_64/aarch64).
Source: https://github.com/capframe/capframe (MIT).
A "Pro" tier on the landing page is a waitlist, not a product; ignore it for now.
Three things I'd love feedback on:
1. The schema shape. If you've shipped or reviewed anything in this space, does the findings.v1 envelope work, or am I missing fields?
2. The Find / Bind / Guard decomposition. Is that how you'd want to adopt this incrementally in an existing agent stack, or are the lines drawn wrong?
3. The caveat DSL (`tool in [...]`, `max_refund <= 50`, `region == "eu"`). Reasonable on top of macaroons, or reinventing badly?
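To give the caveat DSL question something concrete to react to, here is a minimal sketch of how caveats of that shape could be evaluated. It assumes every caveat has the form `<field> <op> <JSON literal>` and is checked against a per-request context dict; that grammar, the tool names, and the `check` and `allowed` functions are assumptions for illustration, not the project's actual implementation.

```python
import json
import operator
import re

# Supported operators; "in" tests membership in a JSON list literal.
_OPS = {
    "==": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
}
_CAVEAT = re.compile(r"^\s*(\w+)\s+(==|<=|>=|in)\s+(.+?)\s*$")


def check(caveat: str, context: dict) -> bool:
    """Evaluate one caveat against the request context; fail closed."""
    match = _CAVEAT.match(caveat)
    if match is None:
        return False  # unparseable caveats deny
    field, op, literal = match.groups()
    if field not in context:
        return False  # missing fields deny
    try:
        expected = json.loads(literal)  # literals are plain JSON, never eval'd
        return bool(_OPS[op](context[field], expected))
    except (ValueError, TypeError):
        return False  # bad literal or incomparable types deny


def allowed(caveats: list, context: dict) -> bool:
    """A request passes only if every caveat holds."""
    return all(check(c, context) for c in caveats)


# Hypothetical tool names standing in for the elided list.
caveats = ['tool in ["refund", "lookup"]', "max_refund <= 50", 'region == "eu"']
ok = allowed(caveats, {"tool": "refund", "max_refund": 25, "region": "eu"})
too_much = allowed(caveats, {"tool": "refund", "max_refund": 75, "region": "eu"})
```

Parsing literals as JSON keeps the evaluator free of `eval`, and treating every parse failure, missing field, or type mismatch as a denial means a malformed caveat can never widen access.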
Happy to answer anything.