- Signal fidelity audit: checks evidence staleness, completeness, weight divergence, confidence calibration
- Pattern classification: matches fidelity flags against learned failure signatures with historical accuracy data
- Reliability scoring: rolling per-agent × per-payer accuracy with drift detection and ECE tracking
- Authority gate: issues one of four verdicts (full autonomy, act-and-notify, human-required, or quarantine)
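To make the "ECE tracking" in the reliability-scoring stage concrete, here is a minimal sketch of the standard expected calibration error calculation: bin decisions by stated confidence, then take the weighted average gap between mean confidence and observed accuracy in each bin. This is not taken from the SENTINEL repo; the `Outcome` type, the function name, and the 10-bin default are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    """One audited decision (hypothetical record shape, not SENTINEL's schema)."""
    confidence: float  # agent's stated confidence, assumed to lie in [0, 1]
    correct: bool      # whether the decision later proved correct


def expected_calibration_error(outcomes: List[Outcome], n_bins: int = 10) -> float:
    """Weighted mean of |mean confidence - accuracy| over equal-width bins."""
    if not outcomes:
        return 0.0

    # Assign each outcome to an equal-width confidence bin; 1.0 goes in the last bin.
    bins: List[List[Outcome]] = [[] for _ in range(n_bins)]
    for o in outcomes:
        idx = min(int(o.confidence * n_bins), n_bins - 1)
        bins[idx].append(o)

    total = len(outcomes)
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(o.confidence for o in b) / len(b)
        accuracy = sum(1 for o in b if o.correct) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # An agent that states 0.9 confidence but is right 6 times out of 10.
    history = [Outcome(0.9, i < 6) for i in range(10)]
    print(round(expected_calibration_error(history), 3))
```

In this example the agent claims 0.9 confidence but is right only 60% of the time, so the ECE comes out at about 0.3. Computing the same quantity over a rolling window, per agent and per payer, would be one way to surface the kind of drift the post describes.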
The key insight: an agent's stated confidence and its actual reliability are different things, and the gap grows silently as upstream data drifts. SENTINEL tracks that gap at the MCP protocol layer.

It was built for healthcare prior authorization, but the architecture is domain-agnostic: it applies anywhere an agent reasons over retrieved evidence before making a consequential decision.

agentgateway handles RBAC (CEL policies per MCP tool), session management, and audit logging. SENTINEL handles the reasoning quality audit. There are integrations with Datadog (drift monitors), Braintrust (eval scoring), and Cleric (incident escalation).

Repo: https://github.com/espirado/agent-secure
Blog post with architecture details and demo walkthrough: https://espiradev.org/blog/sentinel-ai-reasoning-observatory...