I was deploying LLM agents for business processes and kept hitting the same problem: every agent framework defaults to "allow everything." No policy configured? All tools available. No audit? Hope your logs are enough. No trust model? Same permissions on day one as day one thousand.
Orkia flips every default.
Fail-closed by default. No policy rule matching a tool call = denied. Not "allowed until someone writes a deny rule." This is the opposite of how most frameworks work, and it's the single decision that shapes everything else.
Trust earned, not granted. Agents start restricted and gain autonomy through behavior. ATLAS tracks 4 dimensions (task completion, policy compliance, resource usage, audit completeness) and computes an autonomy level. The key insight: trust scores are keyed on SHA-256 of the canonical agent config. Change the model, tools, or instructions, trust resets to zero. No stale trust carries over.
Signed evidence, not logs. Every session produces a SEAL artifact, an ECDSA P-256 signature binding the runtime binary hash + config fingerprint + full governance event chain. It's not "we logged what happened." It's "we can prove which software version, running which config, produced which sequence of events." orkia verify checks it, orkia check gates your CI pipeline.
Sensitivity labels are monotone by construction. LabelSet wraps BTreeSet<DataLabel> and exposes insert/union but literally has no remove/clear method. Once data is classified, it stays classified. You can't break this property because the API won't let you compile code that tries.
MCP tool injection scanner. External MCP servers can embed prompt injections in tool descriptions (the text goes straight into the LLM system prompt). Orkia scans tool definitions for instruction overrides, exfiltration patterns, and zero-width characters before they're registered.
The loop guard has 6 detection layers running before policy evaluation: circuit breaker, outcome-aware dedup (same tool + same params + same result = faster escalation), ping-pong pattern detection (A-B-A-B cycles), proportional dominance (one tool consuming >80% of calls), per-tool rate limits, and warning escalation.
The architecture doc (ARCHITECTURE.md) goes deep on every design decision if you want to poke holes. Would love feedback, especially from people building agent systems in production or anyone who thinks the fail-closed default is wrong.