I built Agent Trust to explore a different approach: treat agents like services with identity, permissions, and track records. It's a Python SDK (MIT, 135 tests) that adds several layers:
Identity – Each agent gets an Ed25519 keypair and a DID (did:agent:). Every action is cryptographically signed.
Delegation – Scoped permissions with caveats and expiry. An agent can only do what it's been explicitly allowed to do.
Reputation – Computed from verified outcomes, not self-reported metrics.
Routing – UCB-based selection picks agents based on past performance, balancing exploration and exploitation.
Enforcement – Permissions can be restricted or revoked at runtime. Cryptographic, not advisory.
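The identity layer boils down to: every action is a signed payload tied to an agent's DID, so authorization can be verified rather than trusted. Here is a minimal stdlib sketch of that idea, using HMAC as a stand-in for Ed25519 (the SDK itself uses Ed25519 keypairs; `make_agent`, `sign_action`, and `verify_action` are hypothetical names for illustration):

```python
import hashlib
import hmac
import json
import secrets

# Stand-in for an Ed25519 keypair: one random key per agent.
# The real SDK signs with Ed25519 and identifies agents by DID.
def make_agent(name):
    return {"did": f"did:agent:{name}", "key": secrets.token_bytes(32)}

def sign_action(agent, action):
    # Canonical JSON so signer and verifier hash identical bytes.
    payload = json.dumps({"did": agent["did"], "action": action},
                         sort_keys=True).encode()
    return hmac.new(agent["key"], payload, hashlib.sha256).hexdigest()

def verify_action(agent, action, signature):
    # Constant-time comparison to avoid timing leaks.
    return hmac.compare_digest(sign_action(agent, action), signature)

researcher = make_agent("researcher")
sig = sign_action(researcher, "search")
print(verify_action(researcher, "search", sig))   # True
print(verify_action(researcher, "analyze", sig))  # False: payload differs
```

The point of the sketch: a signature binds an agent's identity to a specific action, so a forged or replayed claim for a different action fails verification.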
pip install kanoniv-trust
from agent_trust import TrustAgent
trust = TrustAgent() # SQLite, zero setup
trust.register("researcher", capabilities=["search", "analyze"])
trust.register("writer", capabilities=["draft", "edit"])
trust.delegate("researcher", scopes=["search"], expires_in=3600)
trust.delegate("writer", scopes=["draft", "edit"])
trust.observe("researcher", action="search", result="success", reward=0.9)
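One way to turn a stream of verified outcomes into a reputation score is an exponentially weighted average, so recent behavior counts more than old behavior. This is a hypothetical formula to show the shape, not the SDK's actual scoring:

```python
def update_reputation(current, reward, alpha=0.2):
    """Blend the new verified reward into the running score.
    alpha controls how fast reputation reacts to recent outcomes."""
    return (1 - alpha) * current + alpha * reward

rep = 0.5  # neutral prior before any observations
for reward in [0.9, 0.9, 0.1, 0.9]:  # verified outcomes, not self-reports
    rep = update_reputation(rep, reward)
print(round(rep, 3))  # 0.608: one failure dents, but doesn't erase, the score
```

Because updates come only from `observe()` calls with verified results, an agent cannot inflate its own score by claiming success.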
trust.authorized("researcher", "search")   # True
trust.authorized("researcher", "analyze")  # False — not delegated
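A scoped grant with expiry can be modeled as a small record checked at call time. The `Grant` class and `authorized` function below are a hypothetical sketch of that logic, not the SDK's internals:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Grant:
    scopes: set
    expires_at: Optional[float] = None  # None means no expiry

    def allows(self, action, now=None):
        now = time.time() if now is None else now
        if self.expires_at is not None and now >= self.expires_at:
            return False  # grant has expired
        return action in self.scopes

# Mirrors delegate("researcher", scopes=["search"], expires_in=3600)
grants = {"researcher": Grant({"search"}, expires_at=time.time() + 3600)}

def authorized(agent, action):
    grant = grants.get(agent)
    return grant is not None and grant.allows(action)

print(authorized("researcher", "search"))   # True
print(authorized("researcher", "analyze"))  # False: scope never delegated
```

Revocation falls out naturally: delete or shrink the grant and the next check fails, which is what makes enforcement a runtime property rather than advice.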
best = trust.select(["researcher", "writer"], action="search")
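The routing layer's exploration/exploitation trade-off can be sketched with the classic UCB1 score: mean reward plus a bonus that shrinks as an agent accumulates trials. The `stats` table and `ucb1` function are hypothetical illustrations, not the SDK's implementation:

```python
import math

# Per-agent history for one action: (total reward, number of trials).
stats = {"researcher": (9.0, 10), "writer": (2.0, 4)}

def ucb1(agents, c=math.sqrt(2)):
    total = sum(n for _, n in stats.values())
    def score(agent):
        reward, n = stats[agent]
        if n == 0:
            return float("inf")  # always try an unseen agent once
        # Mean reward plus exploration bonus for under-tried agents.
        return reward / n + c * math.sqrt(math.log(total) / n)
    return max(agents, key=score)

best = ucb1(["researcher", "writer"])
print(best)  # "writer": fewer trials earn a larger exploration bonus
```

Note that the lower-scoring agent wins here precisely because it has been tried less; as trial counts grow, the bonus fades and selection converges on the best mean performer.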
The part I've found most useful: agents can read their own verified track record before acting.
ctx = trust.recall("researcher")
print(ctx.guidance)
# "researcher excels at search (95% success). Weaknesses: none observed.
#  Recommendation: High confidence for search tasks."
This injects a summary of past outcomes — success rate, strengths, weaknesses — directly into the prompt. Agents adapt based on actual performance without retraining. It's a simple form of in-context reinforcement learning, and it's the thing that surprised me most: agents genuinely behave differently when they can see their own track record.
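A guidance string like the one above can be derived from nothing more than the outcome log. The following is a hypothetical sketch (the `guidance` helper and its thresholds are invented for illustration):

```python
from collections import defaultdict

# Verified outcomes for one agent, as (action, success) pairs.
log = [("search", True)] * 19 + [("search", False)]

def guidance(agent, log, strong=0.8, weak=0.5):
    by_action = defaultdict(lambda: [0, 0])  # action -> [successes, total]
    for action, ok in log:
        by_action[action][0] += ok
        by_action[action][1] += 1
    strengths = [f"{a} ({s / t:.0%} success)"
                 for a, (s, t) in by_action.items() if s / t >= strong]
    weaknesses = [a for a, (s, t) in by_action.items() if s / t < weak]
    return (f"{agent} excels at {', '.join(strengths) or 'nothing yet'}. "
            f"Weaknesses: {', '.join(weaknesses) or 'none observed'}.")

print(guidance("researcher", log))
# "researcher excels at search (95% success). Weaknesses: none observed."
```

Prepending that sentence to the agent's prompt is the whole mechanism: no fine-tuning, just a verified summary of past outcomes in context.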
There's also a dashboard (Observatory) for visualizing reputation scores and delegation graphs, and integrations for LangChain and CrewAI.
I looked at Langfuse, AgentOps, and similar tools — they're good at tracing but stop at observation. Agent Trust tries to close the loop: identity, verified history, and decision-making in one system.
It's early but usable today. I'd especially appreciate feedback from people running multi-agent systems in production.