  • wattsonme 3 hours ago
    Hey HN – I built TokenShrink, an npm package that compresses AI prompts to save tokens. Zero dependencies, runs locally in <1ms.

      After posting v1.0, r/LocalLLaMA tore it apart. They were right.
    
      The problem: v1.0 estimated tokens as `words × 1.3`. But BPE tokenizers don't work that way. "database" is 1 token.
      "db" is also 1 token. That replacement saves exactly nothing. Worse — "should" → "shd" goes from 1 token to 2. We were
       making prompts MORE expensive.
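
      This is easy to sanity-check with gpt-tokenizer, the same library the benchmarks below use. A minimal sketch,
      assuming its cl100k_base subpath import; exact counts can shift with leading spaces and casing:

        import { encode } from 'gpt-tokenizer/encoding/cl100k_base'

        // encode() returns an array of token ids, so .length is the token count
        console.log(encode('database').length) // 1 token
        console.log(encode('db').length)       // 1 token - the abbreviation saves nothing
        console.log(encode('should').length)   // 1 token
        console.log(encode('shd').length)      // 2 tokens - the "compression" costs a token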
    
      What v2.0 does differently:
    
      - Precomputed every dictionary entry against cl100k_base (GPT-4's tokenizer)
      - Removed 130 entries that saved zero tokens
      - Removed 45 entries that actually increased token count
      - Replaced the word heuristic with a real token cost lookup table
      - Added pluggable tokenizer support: `compress(text, { tokenizer })`
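
      Roughly how the tokenizer hook plugs in (a minimal sketch; the option shape and return value here are
      simplified, so check the README for the exact API):

        import { compress } from 'tokenshrink'
        import { encode } from 'gpt-tokenizer/encoding/cl100k_base'

        const prompt = 'In order to fix this, it is important to check the database schema first.'

        // tokenizer hook: a function that counts tokens for a string, here backed by cl100k_base
        const compressed = compress(prompt, { tokenizer: (text) => encode(text).length })

        console.log(encode(prompt).length, '->', encode(compressed).length)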
    
      What it still does well — phrase compression. "In order to" → "to" saves 2 tokens. "Due to the fact that" → "because"
      saves 4. "It is important to" → removed entirely. These multi-word filler phrases are where the real savings are.
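
      For intuition, this boils down to a phrase dictionary plus a token-cost check: a rewrite is only kept when the
      tokenizer confirms it actually saves tokens. Simplified sketch (the package precomputes these costs into a
      lookup table so it stays zero-dependency and fast at runtime, rather than calling a tokenizer live like this):

        import { encode } from 'gpt-tokenizer/encoding/cl100k_base'

        // tiny illustrative dictionary; the real one is much larger
        const PHRASES = [
          ['due to the fact that', 'because'],
          ['in order to', 'to'],
          ['it is important to', ''],
        ]

        function compressSketch(text) {
          let out = text
          for (const [from, to] of PHRASES) {
            const candidate = out.replace(new RegExp(from, 'gi'), to).replace(/ {2,}/g, ' ')
            // keep the rewrite only if it really lowers the token count
            if (encode(candidate).length < encode(out).length) out = candidate
          }
          return out.trim()
        }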
    
      Benchmarks (verified with gpt-tokenizer):
    
        Verbose dev prompt:    408 → 349 tokens (14.5%)
        Code review prompt:    210 → 183 tokens (12.9%)
        Medical notes:         151 → 134 tokens (11.3%)
        Business requirements: 143 → 121 tokens (15.4%)
        Minimal filler:         77 →  77 tokens (0.0%)
    
      No prompt had its token count increase. Zero false savings.
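
      That no-regression property is easy to re-check over your own prompts (same assumptions as the sketch above):

        import { compress } from 'tokenshrink'
        import { encode } from 'gpt-tokenizer/encoding/cl100k_base'

        const prompts = [
          'In order to proceed, it is important to validate the input first.',
          'Due to the fact that the database is large, we should paginate results.',
        ]

        for (const prompt of prompts) {
          const before = encode(prompt).length
          const after = encode(compress(prompt)).length
          if (after > before) throw new Error(`token count increased: ${before} -> ${after}`)
        }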
    
      npm: npm install tokenshrink
      Web: https://tokenshrink.com
      GitHub: https://github.com/chatde/tokenshrink