┌─────────────────┬─────────────┬──────────────────────────────┐
│ Data Type │ Compression │ Why │
├─────────────────┼─────────────┼──────────────────────────────┤
│ Server logs │ 90%+ │ Highly repetitive patterns │
├─────────────────┼─────────────┼──────────────────────────────┤
│ MCP tool output │ 70%+ │ JSON structure overhead │
├─────────────────┼─────────────┼──────────────────────────────┤
│ Database rows │ 50-70% │ Same schema, many records │
├─────────────────┼─────────────┼──────────────────────────────┤
│ File trees │ 40-50% │ Repeated metadata │
├─────────────────┼─────────────┼──────────────────────────────┤
│ Code diffs │ 0% │ Every line unique │
├─────────────────┼─────────────┼──────────────────────────────┤
│ Dense prose │ -0.3% │ No patterns, slight overhead │
├─────────────────┼─────────────┼──────────────────────────────┤
│ Encrypted │ 0% │ Incompressible │
└─────────────────┴─────────────┴──────────────────────────────┘
- Context Compression (with Reversibility - this part is the difference) for LLMs
- very different from other compression or summarization tools that promise cost savings and speed!
- Claude Code / Cursor costs reduced by 50-60%
- ideal for startups and enterprises!
- integration with LangChain
- Memory as a first-class citizen
- it's OSS, so it's free!
Give it a try - it's OSS. If you love it, star it; if you don't, let's make it better, together!
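One sanity check on the table above: Headroom works at the token level rather than on raw bytes, but the ordering of those ratios falls out of plain entropy. A quick stand-in experiment with zlib (illustrative only, not Headroom's actual compressor):

import os
import zlib

def savings(data: bytes) -> float:
    # Fraction of bytes removed by compression (negative means expansion).
    return 1 - len(zlib.compress(data)) / len(data)

logs = b"2024-05-01 12:00:01 INFO GET /api/v1/users 200 12ms\n" * 500
encrypted = os.urandom(len(logs))  # stand-in for encrypted data

print(f"repetitive logs:  {savings(logs):+.1%}")       # close to +100%
print(f"random/encrypted: {savings(encrypted):+.1%}")  # at or below 0%: incompressible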
pip install "headroom-ai[proxy]"
headroom proxy --port 8787
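The proxy sits between your app and the model API. Exactly how you point a client at it is in the project README; as an assumption (this is how such proxies typically work, not documented behavior I have verified), an OpenAI-compatible endpoint on that port would be wired up like this:

from openai import OpenAI

# Hypothetical wiring - assumes the proxy exposes an OpenAI-compatible
# endpoint at this base URL; check the Headroom README for the real one.
client = OpenAI(base_url="http://localhost:8787/v1")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)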
It will:
* Check all data going into the LLM and apply intelligent, content-aware compression - different strategies for JSON, code, logs, etc.
* The compression is reversible - if the LLM needs something the compressed form dropped, the original can be restored, so the LLM does not lose accuracy.
* MCP tool output, function-call results, and similar payloads that fill up the context window and cause needle-in-a-haystack problems get compressed away (see the sketch below).
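To make that concrete, here is a minimal sketch of the idea - my illustration, not Headroom's actual implementation. A list of same-schema JSON records (the "Database rows" case in the table) is factored into one column header plus compact rows, and the original is kept on the side so the compression can be undone:

import json

_originals = {}  # placeholder id -> original text, kept so compression stays reversible

def compress(content: str) -> str:
    # Content-aware: a list of same-schema JSON records is factored into a
    # single column header plus compact value rows; anything else passes through.
    try:
        records = json.loads(content)
    except ValueError:
        return content
    if not (isinstance(records, list) and records
            and all(isinstance(r, dict) and r.keys() == records[0].keys() for r in records)):
        return content
    key = f"<<hr:{len(_originals)}>>"
    _originals[key] = content  # stash the original for on-demand expansion
    cols = list(records[0])
    rows = [", ".join(str(r[c]) for c in cols) for r in records]
    return key + " columns: " + ", ".join(cols) + "\n" + "\n".join(rows)

def expand(key: str) -> str:
    # If the model needs detail the compact form dropped, restore the original.
    return _originals[key]

The real compressor is presumably much smarter about what to compress and when, but that is the shape of "reversible": nothing is thrown away, only moved out of the prompt.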
There is also an SDK which works like this:
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

# Wrap your model - that's it!
llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))

# Use it exactly like before
response = llm.invoke("Hello!")
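Since the wrapper is presented as a drop-in chat model, it should compose with the rest of LangChain as usual - for example, piped into a chain (assuming HeadroomChatModel implements the standard LangChain chat-model interface, which the example above implies):

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([("user", "Summarize this: {text}")])
chain = prompt | llm  # llm is the wrapped model from the snippet above
result = chain.invoke({"text": "...your long tool output here..."})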
I've personally used it with Claude Code and Cursor and seen the benefits.