  • raffisk, 2 hours ago
    Introduced the Determinism-Faithfulness Assurance Harness (DFAH) in a new paper, "Replayable Financial Agents", along with the open-source code.

    A few findings:
    - Determinism and faithfulness are positively correlated (r = 0.45) for the tasks in my experiments.
    - Schema-first Tier 1 models (7–20B parameters) stay near the 95% compliance threshold under stress.
    - Frontier models performed well on some tasks (e.g., strong action determinism in agentic triage), but the matrix helps define when HITL (human-in-the-loop) review is still needed.

    Note: I didn't control the inference engines or infrastructure for these experiments; I used local models and frontier APIs.

    Paper: https://arxiv.org/abs/2601.15322