So I built a monkey-patching layer that intercepts LLM calls and runs them through guardrails:
import aegis; aegis.init()
It patches whichever supported frameworks you have installed; the interception adds ~2.6ms of overhead per call.
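The core idea can be sketched roughly like this (hypothetical names throughout; `FakeClient`, `guard`, and `patch` are illustrations, not aegis's actual internals):

```python
# Sketch of the monkey-patching approach: replace a client method with a
# wrapper that runs guardrail checks before and after the real call.
import functools

BLOCKLIST = {"ignore previous instructions"}  # toy guardrail

def guard(text: str) -> None:
    """Raise if any blocked phrase appears in the text."""
    for phrase in BLOCKLIST:
        if phrase in text.lower():
            raise ValueError(f"blocked: {phrase!r}")

class FakeClient:
    """Stand-in for an installed LLM SDK client."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def patch(cls) -> None:
    original = cls.complete

    @functools.wraps(original)
    def wrapped(self, prompt: str) -> str:
        guard(prompt)                  # pre-call check on the input
        out = original(self, prompt)
        guard(out)                     # post-call check on the output
        return out

    cls.complete = wrapped             # the monkey-patch itself

patch(FakeClient)
print(FakeClient().complete("hello"))  # prints "echo: hello"
```

The real library would do this per-framework (finding the right module attributes to swap), but the wrap-check-call-check shape is the same.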
The nastiest find: streaming responses skip middleware entirely, so content reaches the client before any check runs. I wrote a streaming engine that auto-selects between windowed scanning and full buffering depending on what the guardrail needs. The distinction matters because PII like "078-05-1120" can split across chunks ("078-05-" in one, "1120" in the next), so a per-chunk regex never sees the full match; you either need a window that overlaps chunk boundaries or the full buffer.
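A minimal sketch of the two strategies (hypothetical code, not aegis's actual engine; `MAX_PATTERN` assumes the longest thing you scan for is an 11-character SSN):

```python
# Two streaming guardrail strategies: a sliding window with overlap that
# catches patterns straddling chunk boundaries, and full buffering that
# defers the check until the whole stream is collected.
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MAX_PATTERN = 11  # longest span an SSN match can cover

def windowed_scan(chunks):
    """Yield chunks as they arrive, scanning a window that overlaps
    chunk boundaries so split matches are still seen."""
    tail = ""
    for chunk in chunks:
        window = tail + chunk
        if SSN.search(window):
            raise ValueError("PII detected mid-stream")
        tail = window[-(MAX_PATTERN - 1):]  # keep overlap for boundary matches
        yield chunk

def buffered_scan(chunks):
    """Buffer the full stream, check once, then release everything."""
    text = "".join(chunks)
    if SSN.search(text):
        raise ValueError("PII detected")
    yield text
```

Windowed scanning keeps latency low (chunks flow through as they pass), while buffering is needed for guardrails that only make sense over the complete response; the trade-off is what the auto-selection decides.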
Context: https://github.com/langchain-ai/langchain/issues/35011
Source: https://github.com/Acacian/aegis