101 points by Iamkkdasari74 3 hours ago | 1 comment
  • Iamkkdasari74 3 hours ago
    We built an open-source token compression engine for LLM API calls. It runs a 14-stage pipeline that detects the content type (code, JSON, logs, diffs) and applies specialized compression for each. No ML models needed; it runs purely on heuristics.
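    To make the detect-then-dispatch idea concrete, here's a minimal sketch of one heuristic stage (function names and the specific regexes are my assumptions, not the project's API): classify a chunk cheaply, then route it to a type-specific compressor.

```python
import json
import re

def detect_content_type(chunk: str) -> str:
    """Cheap heuristics only -- no ML models, just pattern checks."""
    stripped = chunk.strip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass
    if re.search(r"^(diff --git|@@ .* @@|[+-]{3} )", stripped, re.M):
        return "diff"
    if re.search(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}", stripped, re.M):
        return "logs"
    if re.search(r"\b(def |class |import |function |return )", stripped):
        return "code"
    return "text"

def compress_json(chunk: str) -> str:
    # Illustrative JSON stage: drop null fields, strip whitespace.
    data = json.loads(chunk)
    if isinstance(data, dict):
        data = {k: v for k, v in data.items() if v is not None}
    return json.dumps(data, separators=(",", ":"))

COMPRESSORS = {"json": compress_json}  # a real pipeline has a stage per type

def compress(chunk: str) -> str:
    kind = detect_content_type(chunk)
    return COMPRESSORS.get(kind, lambda c: c)(chunk)
```

    The real engine chains many such stages; the point is that detection and compression stay pure string heuristics.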

    Numbers from our benchmarks: 54% average reduction across mixed content, 82% on JSON payloads, 25% on source code. ROUGE-L 0.653 at aggressive settings.

    We use it as middleware in our agent gateway to compress tool traces and system prompts before they hit the API. Cuts our weekly spend roughly in half.
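    As a rough sketch of the middleware shape (names are mine, not the project's; the whitespace collapse stands in for the real compressor): rewrite system and tool messages in place before the request leaves the gateway.

```python
import re

def compress_text(text: str) -> str:
    """Stand-in for the real compressor: collapse runs of spaces/tabs."""
    return re.sub(r"[ \t]+", " ", text).strip()

def compress_messages(messages: list[dict]) -> list[dict]:
    # Only system prompts and tool traces get compressed; user turns
    # pass through untouched.
    out = []
    for msg in messages:
        if msg.get("role") in ("system", "tool"):
            msg = {**msg, "content": compress_text(msg["content"])}
        out.append(msg)
    return out
```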

    Compression is reversible: the original content goes into a hash-addressed cache, so the LLM can request uncompressed sections via tool calls if needed.
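    The cache side can be sketched like this (my naming, not the project's API; a real deployment would use a persistent store rather than a dict): store the original under its content hash, embed the short handle in the compressed output, and expose a fetch tool the model can call.

```python
import hashlib

_CACHE: dict[str, str] = {}  # stand-in for a persistent content store

def store_original(content: str) -> str:
    # Content-addressed: identical input always yields the same handle.
    digest = hashlib.sha256(content.encode()).hexdigest()[:16]
    _CACHE[digest] = content
    return digest  # embed this handle in the compressed text

def fetch_original(digest: str) -> str:
    """Tool the LLM can call to retrieve an uncompressed section."""
    return _CACHE[digest]
```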

    Zero required dependencies; tiktoken and tree-sitter are optional extras for better results.

    Would love feedback, especially from anyone running agents at scale.