The idea is simple: instead of evaluating agents only by task performance, NEXO treats agent execution as a controlled system with policy gates, risk thresholds, and runtime constraints.
The runtime classifies behavior into regimes (healthy, defensive, unstable) and measures operational stability under different environment conditions.
In experiments, a sharp stability frontier appears: - below a minimum risk threshold → agents collapse - above the threshold → stability emerges - increasing uncertainty collapses stability again - no gradual degradation band observed (phase-transition behavior)
The repository includes: - governed execution runtime - reproducible experiment harness - stability frontier analysis - policy and capability controls
This is early research. Feedback and criticism welcome.