1 pointby whitepaper2712 hours ago1 comment
  • whitepaper2712 hours ago
    Hi HN,

    I recently spent ~6 hours debugging a multi-agent workflow.

    Every time I fixed a bug, I had to re-run the entire pipeline — repeating LLM calls, hitting APIs again, and waiting minutes just to test one change.

    So I built Flight Recorder.

    It records agent execution, shows exactly what failed, and lets you replay from that point — skipping everything that already worked.

    Demo (30 sec): https://github.com/whitepaper27/Flight-Recorder/blob/main/de...

    Example:

        @fr.register("crm_lookup")
        def crm_lookup(email):
            return db.query(email)  # slow API call
    
        @fr.register("scorer") 
        def score_lead(contact):
            assert contact is not None  # bug here
    
        fr.run(pipeline, "unknown@example.com")
    
        # later:
        $ flight-recorder debug last
        root cause: crm_lookup returned None
    
        # fix the bug, then:
        $ flight-recorder replay last
        # crm_lookup is cached (not re-run)
        # scorer runs with your fix
        # SUCCESS in seconds
    
    The key idea: Fix one step. Don’t rerun everything.

    It’s open source (MIT) and still early — I’d really appreciate feedback from anyone building agent workflows.

    GitHub: https://github.com/whitepaper27/Flight-Recorder