Hey HN – I built TokenShrink, an npm package that compresses AI prompts to save tokens. Zero dependencies, runs
locally in <1ms.
After posting v1.0, r/LocalLLaMA tore it apart. They were right.
The problem: v1.0 estimated tokens as `words × 1.3`. But BPE tokenizers don't work that way. "database" is 1 token.
"db" is also 1 token. That replacement saves exactly nothing. Worse — "should" → "shd" goes from 1 token to 2. We were
making prompts MORE expensive.
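If you want to reproduce that failure mode yourself, here's a minimal check using gpt-tokenizer (the same package the benchmarks below were verified with; I'm assuming its cl100k_base subpath export). It just prints before/after token counts for a couple of dictionary entries:

```ts
// Leading spaces matter: mid-prompt words tokenize as " database", not "database".
import { encode } from "gpt-tokenizer/encoding/cl100k_base";

const pairs: Array<[string, string]> = [
  [" database", " db"],  // both sides may already be a single token
  [" should", " shd"],   // the "shorter" form can cost more tokens
];

for (const [long, short] of pairs) {
  const before = encode(long).length;
  const after = encode(short).length;
  console.log(`"${long.trim()}" ${before} -> "${short.trim()}" ${after} (delta ${before - after})`);
}
```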
What v2.0 does differently:
- Precomputed every dictionary entry against cl100k_base (GPT-4's tokenizer)
- Removed 130 entries that saved zero tokens
- Removed 45 entries that actually increased token count
- Replaced the word heuristic with a real token cost lookup table (see the sketch after this list)
- Added pluggable tokenizer support: `compress(text, { tokenizer })`
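The precompute step works roughly like this (an illustrative sketch, not the package source; the dictionary entries and variable names here are made up):

```ts
// Keep a dictionary entry only if the replacement is strictly cheaper under
// cl100k_base; the measured delta becomes the lookup-table value.
import { encode } from "gpt-tokenizer/encoding/cl100k_base";

const dictionary: Record<string, string> = {
  " in order to": " to",
  " due to the fact that": " because",
  " should": " shd", // pruned below: the abbreviation is not cheaper
};

const costTable: Record<string, { replacement: string; saved: number }> = {};

for (const [phrase, replacement] of Object.entries(dictionary)) {
  const saved = encode(phrase).length - encode(replacement).length;
  if (saved > 0) {
    costTable[phrase] = { replacement, saved };
  }
  // Entries with saved <= 0 (the 130 zero-savings and 45 negative-savings
  // entries from v1.0) get dropped instead of shipped.
}

console.log(costTable);
```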
What it still does well — phrase compression. "In order to" → "to" saves 2 tokens. "Due to the fact that" → "because"
saves 4. "It is important to" → removed entirely. These multi-word filler phrases are where the real savings are.
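Putting it together, here's a usage sketch. It assumes the `compress(text, { tokenizer })` signature above, that the `tokenizer` option accepts an encode function, and that `compress` returns the compressed string:

```ts
import { compress } from "tokenshrink";
import { encode } from "gpt-tokenizer/encoding/cl100k_base";

const prompt =
  "In order to ship this, it is important to add retries, " +
  "due to the fact that the upstream API is flaky.";

// Assumption: passing encode() lets savings be measured against real
// cl100k_base counts instead of a word heuristic.
const shorter = compress(prompt, { tokenizer: encode });

console.log(encode(prompt).length, "->", encode(shorter).length, "tokens");
```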
Benchmarks (verified with gpt-tokenizer):
Verbose dev prompt: 408 → 349 tokens (14.5%)
Code review prompt: 210 → 183 tokens (12.9%)
Medical notes: 151 → 134 tokens (11.3%)
Business requirements: 143 → 121 tokens (15.4%)
Minimal filler: 77 → 77 tokens (0.0%)
No prompt had its token count increase. Zero false savings.
npm: npm install tokenshrink
Web: https://tokenshrink.com
GitHub: https://github.com/chatde/tokenshrink