The cap is how nearly every verify-revise loop gets stopped today, and it's wrong in both directions. Stop too early and you clip a loop that was still improving. Stop too late and you pay for iterations after the loop already found its best answer — and ship the final attempt, which is sometimes worse than one it already had.
On a 2,000-trial benchmark (paired real-API runs, five loop patterns across six framework adapters, three model providers, pre-registered protocol with kill criteria), LoopGain cut total API spend by 92.8% vs max_iterations=20 ($27.05 → $1.94) and median wall-clock ~15× (30.9s → 2.1s). A cross-vendor judge preferred LoopGain's outputs on the weighted average (0.678 across 1,800 pairwise comparisons) — mostly because LoopGain returns the iteration with the lowest error it saw, while a fixed cap ships whatever the final iteration produced. The raw data and methodology are public, and the full run is browsable at dashboard.loopgain.ai/benchmark.
How it works: each iteration, LoopGain takes the ratio of the current error to the previous error — the loop's empirical loop gain (Aβ, borrowed from control theory). Aβ<1 means the error shrank — the loop is improving. Aβ≥1 means it held or grew — the loop is stuck or making things worse. A trajectory classifier reads the recent Aβ values, labels the loop (FAST_CONVERGE / CONVERGING / STALLING / OSCILLATING / DIVERGING), and decides whether to keep going, stop here, or stop and roll back to the lowest-error output so far.
Integration is a few lines around any loop that produces an error signal:
from loopgain import LoopGain
lg = LoopGain(target_error=0.0)
output = generate(task) # first attempt
while lg.should_continue():
errors = verify(output) # e.g. count of failing tests
lg.observe(errors, output=output) # the only LoopGain call in the loop
output = revise(output, errors)
result = lg.result # best_output, outcome, convergence_profile, savings_vs_fixed_cap
Honest limits, because they matter more than the headline: LoopGain detects convergence, not correctness — it inherits your verifier's blind spots. I re-graded my own benchmark and 4.5% of "converged" code-gen runs passed every check the loop ran but failed a fuller held-out test suite. And savings depend on workload: failure-heavy loops save ~78–84%, not 92.8%. There's a writeup on designing verifiers strong enough to trust on the blog.Apache-2.0, pip install loopgain. Adapters for LangGraph, CrewAI, AutoGen, LangChain, OpenAI Agents SDK, and the Claude Agent SDK; the raw API works for anything with a measurable error.
What I'd really love from HN: if you run production agent loops, I'm interested in whether the stop decisions match what you see empirically — and what your loops' error signals actually look like, since the verifier is what makes or breaks the stop. Happy to answer anything.