Headroom is a local proxy / SDK wrapper that:
- compresses tool outputs (schema-preserving subset of JSON arrays; keeps errors/anomalies/top/relevant)
- trims history as whole tool-call units (no broken function calls; sketch below)
- stabilizes prefixes so provider caching stops getting invalidated by drift
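On the trimming point, here's a minimal sketch of what "whole tool-call units" means (my illustration, not Headroom's actual code), assuming OpenAI-style messages:

```python
# Hypothetical sketch (not Headroom's code): drop old history in whole
# units, so an assistant message carrying tool_calls and the role="tool"
# messages that answer it are kept or dropped together.
def trim_history(messages: list[dict], max_units: int) -> list[dict]:
    units: list[list[dict]] = []
    for msg in messages:
        if msg.get("role") == "tool" and units:
            units[-1].append(msg)   # tool results stay with their call
        else:
            units.append([msg])     # user/assistant turn opens a new unit
    kept = units[-max_units:]       # drop the oldest whole units first
    # A real implementation would pin the system prompt and budget by
    # tokens rather than unit count.
    return [m for unit in kept for m in unit]
```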
Try it locally: `pip install "headroom-ai[proxy]" && headroom proxy --port 8787`, then point your client's base URL at it.
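For example, with the OpenAI Python SDK (a sketch; the `/v1` path is my assumption, check the README for the exact base URL):

```python
from openai import OpenAI

# Route an OpenAI-style client through the local Headroom proxy.
# The "/v1" path is an assumption; see the README for the exact base URL.
client = OpenAI(base_url="http://localhost:8787/v1", api_key="sk-...")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```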
Repo: https://github.com/chopratejas/headroom
Limitations: best on JSON arrays; text compression is opt-in; if you truly need every row, you’ll need the retrieval escape hatch or per-tool disable.
If you find it useful, a star on the repo is appreciated.
What it actually changes:
Tool output compression is deterministic and schema-preserving: it returns a subset of the original array items (no invented summaries, no wrapper keys).
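Illustratively, with made-up data (the selection shown here is simplified; the real heuristics keep errors/anomalies/top/relevant items):

```python
# Made-up data to show the contract: the compressed value is a subset of
# the original items, same schema, no wrapper keys, no generated summaries.
original = [
    {"id": 1, "status": "ok", "latency_ms": 12},
    {"id": 2, "status": "ok", "latency_ms": 11},
    # ... hundreds of similar rows ...
    {"id": 999, "status": "error", "latency_ms": 8041},
    {"id": 1000, "status": "ok", "latency_ms": 13},
]
compressed = [
    {"id": 1, "status": "ok", "latency_ms": 12},         # representative item
    {"id": 999, "status": "error", "latency_ms": 8041},  # error/anomaly kept
]
```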
It supports both OpenAI-style role="tool" messages and Anthropic-style tool_result blocks.
“Fail open”: if JSON parsing/compression fails, it passes through unchanged.
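A minimal sketch of that fail-open shape (illustrative names, not Headroom's internals); the same hook would apply whether the payload arrives as OpenAI `role="tool"` content or as text inside an Anthropic `tool_result` block:

```python
import json

def maybe_compress(content: str) -> str:
    """Illustrative fail-open shape: shrink large JSON arrays, else pass through."""
    try:
        data = json.loads(content)
        if isinstance(data, list) and len(data) > 20:
            # Stand-in selection; the real pipeline also keeps errors,
            # anomalies, and relevance-scored items, not just head/tail.
            return json.dumps(data[:10] + data[-10:])
    except (json.JSONDecodeError, TypeError):
        pass            # parsing failed: fail open, never block the call
    return content      # small, non-array, or unparseable payloads unchanged
```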
Why another context tool?
Most "context compression" projects focus on prose. What kept killing my agent runs was tool-payload bloat inside otherwise valid tool-calling loops. The goal here: reduce tokens without breaking the contract.
Typical savings
On tool-heavy runs, the big wins come from crushing large arrays (search results, traces, lists). In my traces I'm seeing ~70–90% reduction on tool payload tokens, depending on how repetitive the payload is. (If you have a better benchmark harness, I'm happy to adopt it.)
Escape hatch when compression drops something you need
When a tool output is compressed, Headroom stores the original briefly and can expose a retrieve tool (headroom_retrieve) so the model (or you) can pull the full uncompressed payload by hash. (There's also an MCP server for this.)
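Roughly, the tool the model sees is a standard function tool (the schema below is my guess; only the name and the hash-based lookup come from the project):

```python
# Rough shape of the escape hatch. Field names here are illustrative;
# only the tool name (headroom_retrieve) and hash-based lookup are from
# the project description.
retrieve_tool = {
    "type": "function",
    "function": {
        "name": "headroom_retrieve",
        "description": "Fetch the full, uncompressed tool output by hash.",
        "parameters": {
            "type": "object",
            "properties": {
                "hash": {"type": "string", "description": "hash of the original payload"}
            },
            "required": ["hash"],
        },
    },
}
```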
Shortcomings / where it can be the wrong idea
- SmartCrusher is intentionally conservative: it focuses on JSON arrays. If your tool returns a giant nested object or long free text, Headroom won't magically solve that (text-compression utilities exist, but they're opt-in).
- If your downstream logic requires the full list of 1,000 items, any reduction strategy can be wrong; use the retrieve tool or disable compression for that tool.
- Relevance scoring is heuristic and optional; it can miss the one weird item you cared about if it doesn't look anomalous or relevant.
- Running as a proxy means your prompts and tool outputs flow through a local service (a privacy/security tradeoff; full-content logging is off by default).
Happy to field comparisons: tell me what you're using (prompt compression, truncation, provider caching tricks, etc.) and I'll map it to how Headroom behaves.