The pattern we landed on: agent runs in a background worker, emits events to Redis Streams, frontend consumes via SSE. Multiple clients can listen to the same stream. If the user navigates away or hits stop, we delete a key and the agent aborts gracefully at the next node boundary.
pydantic-ai gives you Agent.iter() and streaming primitives, but wiring this up - structured events, reconnection, history across turns - is a lot of glue.
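For a sense of the shape of that glue, here's a minimal sketch of the pattern, assuming redis-py's asyncio client. The key names, the serialize() helper, and the exact event payloads are illustrative, not the library's actual schema:

```python
import json

import redis.asyncio as redis
from pydantic_ai import Agent

r = redis.Redis(decode_responses=True)

def serialize(node) -> dict:
    # Stand-in serializer: the real library emits typed events per node/part;
    # here we only tag the node type.
    return {"type": type(node).__name__}

async def run_in_worker(run_id: str, agent: Agent, prompt: str) -> None:
    """Background worker: drive the agent graph, publish each event via XADD."""
    events, alive = f"run:{run_id}:events", f"run:{run_id}:alive"
    await r.set(alive, "1")
    async with agent.iter(prompt) as agent_run:
        async for node in agent_run:
            # Hitting stop (or navigating away) deletes the alive key; the
            # run then aborts at the next node boundary.
            if not await r.exists(alive):
                await r.xadd(events, {"data": json.dumps({"type": "end", "reason": "cancelled"})})
                return
            await r.xadd(events, {"data": json.dumps(serialize(node))})
    await r.xadd(events, {"data": json.dumps({"type": "end"})})

async def sse_body(run_id: str, last_id: str = "0"):
    """SSE endpoint body: any number of clients can tail the same stream,
    and a reconnecting client resumes by passing its Last-Event-ID."""
    while True:
        batches = await r.xread({f"run:{run_id}:events": last_id}, block=15_000)
        for _stream, entries in batches:
            for entry_id, fields in entries:
                last_id = entry_id
                yield f"id: {entry_id}\ndata: {fields['data']}\n\n"
                if json.loads(fields["data"]).get("type") == "end":
                    return
```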
This library wraps iter() and handles the lifecycle:
begin → [llm-begin → part-deltas → llm-end]+ → end
Each event is typed: text delta, tool call with args, tool return, error. Frontend gets structured JSON it can render directly.
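For illustration, a turn with one tool call might serialize to a sequence like this (the field names are invented for the example; the library's exact schema may differ):

```python
turn = [
    {"type": "begin", "run_id": "abc123"},
    {"type": "llm-begin"},
    {"type": "part-delta", "kind": "text", "delta": "Let me check the weather."},
    {"type": "part-delta", "kind": "tool-call", "tool": "get_weather", "args_delta": '{"city": "Par'},
    {"type": "llm-end"},
    {"type": "tool-return", "tool": "get_weather", "content": {"temp_c": 18}},
    {"type": "llm-begin"},
    {"type": "part-delta", "kind": "text", "delta": "It's 18°C in Paris."},
    {"type": "llm-end"},
    {"type": "end"},
]
```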
It's thin (~400 LOC) and doesn't patch pydantic-ai internals. You bring your own session storage by implementing load/save.
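What that contract might look like, sketched as a Protocol (the load/save names come from the post; the exact signatures are an assumption):

```python
from typing import Protocol

from pydantic_ai.messages import ModelMessage

class SessionStore(Protocol):
    async def load(self, session_id: str) -> list[ModelMessage]:
        """Return prior turns for this session, or [] for a new session."""
        ...

    async def save(self, session_id: str, messages: list[ModelMessage]) -> None:
        """Persist the full message history after a turn completes."""
        ...
```

A turn then becomes load → run with message_history → save, so history across turns works against any backing store.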
Feedback welcome, especially on the event protocol.
The current implementation is deliberately thin: the stream operations are just xadd/xread/set/get/delete. Abstracting those into a protocol wouldn't be hard, and a stream-per-session model that's durable and serverless would fit the use case well.
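For instance, the surface such a protocol would need is small: five methods mirroring the five Redis calls (names here are mine, not the library's):

```python
from typing import AsyncIterator, Protocol

class StreamBackend(Protocol):
    async def append(self, stream: str, data: str) -> str:  # xadd
        """Append an event to the stream, returning its id."""
        ...

    def read(self, stream: str, after: str) -> AsyncIterator[tuple[str, str]]:  # xread
        """Yield (id, data) pairs after the given id, blocking for new ones."""
        ...

    async def set(self, key: str, value: str) -> None: ...   # set
    async def get(self, key: str) -> str | None: ...         # get
    async def delete(self, key: str) -> None: ...            # delete
```

An S2- or Postgres-backed implementation would slot in behind the same surface.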
Redis was the pragmatic choice for v0 since most teams already have it running and the latency is good for real-time streaming to frontends. But durability is a valid concern: if the agent run matters (billing, compliance, debugging), you want it persisted properly, not just sitting in Redis memory.
If S2's self-hostable OSS version lands soon, that'd lower the barrier for people to try it.
Would love to hear if there are other backend preferences out there!