1 point by base76 8 hours ago | 2 comments
  • base76 8 hours ago
    I built a two-stage prompt compressor that runs entirely locally before your prompt hits any frontier model API.

      How it works:
      1. llama3.2:1b (via Ollama) compresses the prompt to its semantic minimum
      2. nomic-embed-text validates that the compressed version preserves the original meaning (cosine ≥ 0.85)
      3. If validation fails → original is returned unchanged. No silent corruption.
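    The flow above can be sketched roughly like this (a minimal sketch assuming the `ollama` Python client; function names, the prompt wording, and the word-count proxy for tokens are illustrative, not the repo's actual API):

    ```python
    import math

    COSINE_THRESHOLD = 0.85
    MIN_WORDS = 80  # stage 0: short prompts are skipped entirely (rough token proxy)

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    def compress(prompt: str) -> str:
        import ollama  # requires a local Ollama daemon
        if len(prompt.split()) < MIN_WORDS:
            return prompt  # too short to be worth compressing
        # Stage 1: the 1B model rewrites the prompt to its semantic minimum.
        out = ollama.generate(
            model="llama3.2:1b",
            prompt=("Compress the following to the shortest text that preserves "
                    "all meaning, negations, and conditionals:\n\n" + prompt),
        )["response"]
        # Stage 2: embedding check — reject the rewrite if meaning drifted.
        emb = lambda t: ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"]
        if cosine(emb(prompt), emb(out)) >= COSINE_THRESHOLD:
            return out
        return prompt  # validation failed: return original unchanged
    ```

    The key design choice is that the validator can only veto, never edit — so the worst case is paying for the original prompt, never a corrupted one.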
    
      When it actually helps:
      The effect is meaningful only on longer inputs. Short prompts are skipped entirely — no cost, no risk.
    
      ┌─────────────────────────────────┬────────────┬────────┐
      │              Input              │   Tokens   │ Saving │
      ├─────────────────────────────────┼────────────┼────────┤
      │ < 80 tokens                     │ skipped    │ 0%     │
      ├─────────────────────────────────┼────────────┼────────┤
      │ Academic abstract (207t)        │ 207 → 78   │ 62%    │
      ├─────────────────────────────────┼────────────┼────────┤
      │ Structured research doc (1116t) │ 1116 → 275 │ 75%    │
      ├─────────────────────────────────┼────────────┼────────┤
      │ Short command (4t)              │ skipped    │ 0%     │
      └─────────────────────────────────┴────────────┴────────┘
    
      If you're sending short one-liners, this won't help. If you're injecting long context, research text, or system prompts — it pays off from the first call.
    
      Known limitation:
      Cosine similarity is blind to negation. "way smaller" vs "way larger" scores 0.985. The LLM stage handles this by explicitly preserving negations and conditionals, but it's an open
      research question — tracked in issue #1.
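      One conceivable backstop (illustrative only — this is not what the repo does; the tool relies on the LLM stage's instruction to preserve negations) is a crude lexical check that vetoes any rewrite that drops a negation word:

      ```python
      import re

      # Hypothetical negation vocabulary for the sketch; incomplete by design.
      NEGATIONS = {"not", "no", "never", "without", "unless", "nor"}

      def negations_preserved(original: str, compressed: str) -> bool:
          words = lambda t: set(re.findall(r"[a-z']+", t.lower()))
          # Any negation word present in the original but missing from the
          # compressed version flags the rewrite as unsafe.
          return not ((words(original) & NEGATIONS) - words(compressed))
      ```

      This catches "do not deploy" → "deploy", but not antonym flips like "smaller" → "larger", which is why it stays an open question rather than a fix.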
    
      Install as MCP (Claude Code):
      {
        "mcpServers": {
          "token-compressor": {
            "command": "python3",
            "args": ["/path/to/token-compressor/mcp_server.py"]
          }
        }
      }
    
      Requires: Ollama + llama3.2:1b + nomic-embed-text
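    
      Assuming Ollama is already installed, the two models can be pulled with the standard CLI:

      ```shell
      # Fetch the compressor and validator models locally (one-time setup)
      ollama pull llama3.2:1b
      ollama pull nomic-embed-text
      ```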
    
      Repo: https://github.com/base76-research-lab/token-compressor-
  • base76 7 hours ago
    would love to hear what you think about it