Requests would partially stream, providers would throttle or fail mid-stream, and retry logic ended up scattered across background jobs, webhooks, and request handlers.
I built ModelRiver, a thin API layer that sits between an app and its AI providers and centralizes streaming, retries, failover, and request-level debugging in one place, instead of spreading that logic across the app.
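To make the failover piece concrete, here's a minimal sketch of the kind of logic a layer like this centralizes: each provider is modeled as an async generator of text chunks, and a wrapper falls over to the next provider when a stream fails before emitting anything. The `Provider` type and `streamWithFailover` name are illustrative, not ModelRiver's actual API:

```ts
// Hypothetical sketch of centralized streaming failover, not ModelRiver's code.
type Provider = (prompt: string) => AsyncGenerator<string>;

async function* streamWithFailover(
  prompt: string,
  providers: Provider[],
): AsyncGenerator<string> {
  let lastError: unknown;
  for (const provider of providers) {
    let emitted = false;
    try {
      // Forward chunks as they arrive so the caller sees the same
      // streaming shape regardless of which provider served the request.
      for await (const chunk of provider(prompt)) {
        emitted = true;
        yield chunk;
      }
      return; // stream completed cleanly
    } catch (err) {
      lastError = err;
      // Failing over after partial output would duplicate chunks the
      // caller already received, so only retry if nothing was sent yet;
      // resuming mid-stream needs more machinery than this sketch has.
      if (emitted) throw err;
    }
  }
  throw lastError;
}

// Hypothetical demo: the first provider throttles before emitting,
// so the wrapper silently falls over to the second.
const flaky: Provider = async function* () {
  throw new Error("429: throttled");
};
const stable: Provider = async function* (prompt) {
  for (const word of `echo: ${prompt}`.split(" ")) yield word + " ";
};

async function main() {
  for await (const chunk of streamWithFailover("hello", [flaky, stable])) {
    process.stdout.write(chunk);
  }
}
main();
```

The point of centralizing this is that app code consumes one stream interface while the retry and failover policy lives in a single place, rather than being re-implemented in every background job, webhook, and request handler.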
It’s early and opinionated, and there are tradeoffs. Happy to answer technical questions or hear how others are handling streaming reliability in production AI apps.