Replay debugger for AI agents (fix failures without rerunning everything) (github.com)

1 point by whitepaper27 12 hours ago | 1 comment

whitepaper27 12 hours ago
Hi HN,
I recently spent ~6 hours debugging a multi-agent workflow.
Every time I fixed a bug, I had to re-run the entire pipeline — repeating LLM calls, hitting APIs again, and waiting minutes just to test one change.
So I built Flight Recorder.
It records each step of an agent run, shows exactly which step failed, and lets you replay from the failed step, skipping every step that already succeeded.
Demo (30 sec): https://github.com/whitepaper27/Flight-Recorder/blob/main/de...
Example:
```
    @fr.register("crm_lookup")
    def crm_lookup(email):
        return db.query(email)  # slow API call

    @fr.register("scorer")
    def score_lead(contact):
        assert contact is not None  # bug here: fails when crm_lookup returns None

    fr.run(pipeline, "unknown@example.com")

    # later:
    $ flight-recorder debug last
    root cause: crm_lookup returned None

    # fix the bug, then:
    $ flight-recorder replay last
    # crm_lookup is cached (not re-run)
    # scorer runs with your fix
    # SUCCESS in seconds
```
The key idea: Fix one step. Don’t rerun everything.
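To make the idea concrete, here is a minimal sketch of step-level record and replay. It is not Flight Recorder's actual implementation: `StepRecorder`, its `run` method, and the two scorer functions are hypothetical names made up for illustration, and the cache lives in memory rather than on disk.

```python
import hashlib
import json


class StepRecorder:
    """Hypothetical recorder: caches each step's output by step name and input."""

    def __init__(self):
        self.cache = {}     # (step name, input digest) -> recorded output
        self.executed = []  # names of steps that actually ran in the last call

    def _key(self, name, value):
        # Hash the JSON form of the input so equal inputs map to the same key.
        payload = json.dumps(value, sort_keys=True).encode()
        return (name, hashlib.sha256(payload).hexdigest())

    def run(self, steps, value):
        """Run (name, fn) pairs in order, reusing recorded outputs when present."""
        self.executed = []
        for name, fn in steps:
            key = self._key(name, value)
            if key in self.cache:
                value = self.cache[key]  # replay: skip the step, reuse its output
                continue
            result = fn(value)  # an exception here leaves nothing cached
            self.cache[key] = result
            self.executed.append(name)
            value = result
        return value


def crm_lookup(email):
    # Stand-in for a slow API call.
    return {"email": email, "plan": "trial"}


def broken_scorer(contact):
    return contact["score"]  # bug: the contact has no "score" field


def fixed_scorer(contact):
    return 10 if contact["plan"] == "trial" else 50


recorder = StepRecorder()

# First run: crm_lookup succeeds and is recorded, the scorer fails.
try:
    recorder.run([("crm_lookup", crm_lookup), ("scorer", broken_scorer)],
                 "unknown@example.com")
except KeyError:
    pass
first_run = list(recorder.executed)

# Replay with the fixed scorer: crm_lookup is served from the cache.
score = recorder.run([("crm_lookup", crm_lookup), ("scorer", fixed_scorer)],
                     "unknown@example.com")
second_run = list(recorder.executed)
```

In this sketch the cache key is only the step name plus its input, so a fix to the code of an already-cached step would not be picked up. A real tool would presumably need some way to invalidate recorded steps.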
It’s open source (MIT) and still early — I’d really appreciate feedback from anyone building agent workflows.
GitHub: https://github.com/whitepaper27/Flight-Recorder