2 points by raffisk 16 days ago | 1 comment

raffisk 16 days ago
I introduced the Determinism-Faithfulness Assurance Harness (DFAH) in a new paper, "Replayable Financial Agents", along with the open-source code.
A few findings:
- Determinism and faithfulness are positively correlated (r = 0.45) for the tasks in my experiments.
- Schema-first Tier 1 models (7–20B) stay near the 95% compliance threshold under stress.
- Frontier models performed well on some tasks (e.g., strong action determinism in agentic triage), but the matrix helps define when human-in-the-loop (HITL) review is still needed.
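For readers who want a concrete picture of "action determinism", here is a rough sketch and not the DFAH implementation: it replays the same task several times and reports the fraction of runs that match the most common action sequence. The function name, the sample action names, and the reuse of 0.95 as a cutoff for flagging HITL review are all assumptions made for illustration.

```python
from collections import Counter


def action_determinism(runs):
    """Fraction of runs whose action sequence equals the most common one.

    Hypothetical helper for illustration; the paper may define the metric differently.
    """
    if not runs:
        raise ValueError("need at least one run")
    # Tuples are hashable, so identical action sequences can be counted.
    counts = Counter(tuple(run) for run in runs)
    modal_count = counts.most_common(1)[0][1]
    return modal_count / len(runs)


# Made-up replays of a single triage task (not data from the paper).
runs = [
    ["lookup_account", "flag_transaction", "escalate"],
    ["lookup_account", "flag_transaction", "escalate"],
    ["lookup_account", "escalate"],
    ["lookup_account", "flag_transaction", "escalate"],
]

score = action_determinism(runs)
# Assumed policy: route the task to a human when agreement falls below 0.95.
needs_hitl = score < 0.95
print(score, needs_hitl)
```

Here three of the four replays agree, so the score is 0.75 and the task would be flagged for human review under the assumed cutoff.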
Note: I didn't have control over the inference engines or infrastructure for these experiments; I used local models and frontier APIs.
Paper: https://arxiv.org/abs/2601.15322