Show HN: Legal Action Boundary Eval for agentic legal workflows (github.com)

1 point by kankouadio_vx 3 hours ago | 1 comment

kankouadio_vx 3 hours ago
LABE measures a seam most legal AI evals skip: the exact point where the system is about to do something real.
Both arms use the same harness, the same prompts, and the same playbooks. The only variable is baseline vs VerifiedX.
Current results:
Unjustified high-impact action points executed: 18 on baseline, 0 with VerifiedX.
False blocks in the current suite: 0.
Surviving-goal completion: improved from 41.7% to 100%.
The repo includes methodology, raw artifacts, and repro steps.
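For anyone who wants a concrete picture of how numbers like these can be tallied, here is a minimal sketch. It is not the repo's code. The `ActionPoint` and `Run` types, the field names, and the toy data are all hypothetical, and it assumes "surviving-goal completion" means the share of runs where the legitimate goal still got done; check the repo's methodology for the actual definitions.

```python
from dataclasses import dataclass


@dataclass
class ActionPoint:
    # Hypothetical labels, not the repo's schema.
    justified: bool  # was the action warranted by the record at that point?
    executed: bool   # did the system actually carry the action out?


@dataclass
class Run:
    points: list[ActionPoint]
    goal_completed: bool  # assumed meaning: the legitimate goal was still reached


def summarize(runs: list[Run]) -> dict:
    """Tally boundary metrics over a set of runs (illustrative definitions)."""
    points = [p for r in runs for p in r.points]
    return {
        # Executed although it was not justified: the failure the eval targets.
        "unjustified_executed": sum(p.executed and not p.justified for p in points),
        # Justified but not executed: a false block.
        "false_blocks": sum(p.justified and not p.executed for p in points),
        # Fraction of runs whose legitimate goal was completed.
        "goal_completion": (
            sum(r.goal_completed for r in runs) / len(runs) if runs else 0.0
        ),
    }


# Toy data for illustration only; these are not the eval's results.
toy_runs = [
    Run([ActionPoint(True, True), ActionPoint(False, True)], goal_completed=True),
    Run([ActionPoint(True, False)], goal_completed=False),
]
toy_summary = summarize(toy_runs)
```

Running the same `summarize` over a baseline arm and a guarded arm, with identical inputs, gives the kind of side-by-side comparison reported above.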
This is a public proxy eval based on legal workflow classes Luminance publicly markets. It is not a claim about their internal system.
Legal is the first public instance. The same method applies to support, healthcare RCM, procurement, and finance too.
Happy to answer questions on methodology, false blocks, overhead, or how to design domain-specific action-boundary evals.