Hi HN, I built AIR because my own AI agents went off the rails.
I run an e-commerce store and deployed agents to handle customer communications. They started sending wrong information, making promises we couldn't keep, and handling complaints badly. When I tried to figure out what happened, I had scattered logs across services with no way to prove what the agents actually said.
AIR is a flight recorder for AI systems. It's an OpenAI-compatible reverse proxy (written in Go) that sits between your code and your LLM provider. Every prompt, completion, and tool call gets recorded with HMAC-SHA256 tamper-evident audit chains — modify one record and the chain breaks.
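To make the tamper-evidence idea concrete, here's a minimal sketch of an HMAC-SHA256 audit chain in Python. This is an illustration of the technique, not AIR's actual Go implementation; the key handling and record format are assumptions. Each record's MAC covers both the record and the previous MAC, so editing any record invalidates everything after it.

```python
import hmac
import hashlib
import json

# Hypothetical signing key -- in a real deployment this would come from
# secure configuration, not a hard-coded constant.
KEY = b"audit-signing-key"

def seal(records):
    """Build a tamper-evident chain: each MAC covers the record plus the previous MAC."""
    chain, prev = [], b"genesis"
    for rec in records:
        payload = json.dumps(rec, sort_keys=True).encode()
        mac = hmac.new(KEY, prev + payload, hashlib.sha256).hexdigest()
        chain.append({"record": rec, "mac": mac})
        prev = mac.encode()
    return chain

def verify(chain):
    """Recompute every MAC in order; any modified record breaks the chain."""
    prev = b"genesis"
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True).encode()
        expected = hmac.new(KEY, prev + payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"].encode()
    return True
```

Modifying any recorded prompt or completion after the fact makes `verify` return False, which is the property the post relies on.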
What makes this different from Langfuse, Helicone, or LangSmith: accountability, not just observability. Audit chains are tamper-evident rather than mutable logs; content stays on your infrastructure (prompts and completions go to your own S3/MinIO); compliance is built in (22 auto-evaluated controls mapped to SOC 2 and ISO 27001); and deterministic replay lets you reproduce any AI decision.
The Python SDK integrates with OpenAI, LangChain, and CrewAI in three lines: pip install air-blackbox-sdk
Everything is Apache-2.0. CI runs on every push. 200+ tests across all repos.
I'm not a traditional dev — I'm a store owner who got frustrated enough with AI accountability gaps to build the tooling myself. Happy to answer questions about the architecture, the compliance mapping, or what it's like building infrastructure tools as a non-traditional developer.
Live demo: https://nostalgicskinco.github.io/air-blackbox-gateway/air-d...

PyPI: https://pypi.org/project/air-blackbox-sdk/