  • lawouach 6 hours ago
    I spent the last few months analyzing marathon coding sessions (300 to 1,200 turns) with agents like Claude Code and OpenCode.

    I wanted to solve a simple performance problem: replaying these sessions for memory indexing was too slow and expensive because I was extracting information with an LLM on every turn.

    I took away two learnings:

    1. You can’t just skip "noise" turns for search. Raw-text embeddings crushed "smart" LLM-extracted summaries (88.5% vs 73.9% recall). The signal is distributed across every turn (a rough sketch of that baseline is below this list).

    2. Frustration is a lagging indicator. The visible failure wasn't at turn 699; the drift started around turn 400 (in one of my datasets). Basically, by the time you snap, the agent has been drifting for hundreds of turns. Makes sense, I suppose: you're happy until you're not.
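
    For point 1, the baseline that won is roughly this shape (a minimal sketch; the embedding model and helper names are placeholders, not my actual pipeline): embed every raw turn with no LLM summarization pass, then search the vectors directly.

      import numpy as np
      from sentence_transformers import SentenceTransformer

      # Placeholder model; any text-embedding backend works the same way here.
      model = SentenceTransformer("all-MiniLM-L6-v2")

      def index_turns(turns: list[str]) -> np.ndarray:
          # Embed every raw turn, "noise" included: no summarization, no filtering.
          return model.encode(turns, normalize_embeddings=True)

      def search(query: str, turn_vecs: np.ndarray, turns: list[str], k: int = 5):
          q = model.encode([query], normalize_embeddings=True)[0]
          scores = turn_vecs @ q  # cosine similarity, since vectors are normalized
          top = np.argsort(-scores)[:k]
          return [(turns[i], float(scores[i])) for i in top]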

    I ended up building a "trajectory regulator" that measures structural instability (logic churn, symbol repetition) in real time so it can intervene before the "death spiral" kicks in. Essentially, I wanted to reduce the number of times I'd grow frustrated with the agent.
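
    The regulator is more involved than this, but the symbol-repetition part of the signal is roughly this shape (illustrative sketch; the regex, window size, and threshold are placeholders, not my actual values):

      import re
      from collections import Counter, deque

      SYMBOL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{2,}")

      class InstabilityMonitor:
          """Rough proxy for structural instability: the fraction of symbol
          mentions in the recent window that are repeats."""

          def __init__(self, window: int = 50, threshold: float = 0.6):
              self.turns = deque(maxlen=window)
              self.threshold = threshold

          def observe(self, turn_text: str) -> bool:
              self.turns.append(set(SYMBOL_RE.findall(turn_text)))
              if len(self.turns) < self.turns.maxlen:
                  return False  # not enough history yet
              counts = Counter(sym for turn in self.turns for sym in turn)
              total = sum(counts.values())
              repeats = sum(c - 1 for c in counts.values())
              churn = repeats / total if total else 0.0
              return churn >= self.threshold  # True means "consider intervening"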

    I wrote up the data and the mental model I used to build the controller. Curious if others have measured similar patterns in long-running agent sessions.