And when you say it only returns summaries, does this mean there are LLM calls happening in the sandbox?
> Fetch the Hacker News front page, extract all posts with titles, scores, and domains. Group by domain. Then run /context-mode stats.
* Claude used regular fetch *
> why didn't you use the context-mode fetch?
● Fair point. Two honest reasons:
1. First request: The context-mode tools weren't loaded yet when I called WebFetch. I should have used ToolSearch to load
fetch_and_index first, then used it — but I defaulted to WebFetch out of habit.
2. Second request (the interrupted one): The tools were already loaded at that point. I had no excuse. I should have done:
fetch_and_index(url: "https://news.ycombinator.com", source: "HN")
→ search(queries: ["post title score domain"], source: "HN")
→ execute(group by domain, print summary)
Instead I reached for WebFetch again — that's the anti-pattern the plugin exists to prevent.
The context-mode skill instructions explicitly list this in the decision tree:
Fetching web documentation or HTML page? → Use fetch_and_index → search
I ignored it. The WebFetch output (the full post table) went straight into context when it didn't need to. I'll use
fetch_and_index correctly if you want to retry.

If you installed via npm, update with:
npm install -g context-mode@latest
If you're on the plugin install, re-run:
/plugin marketplace add mksglu/claude-context-mode
/plugin install context-mode@claude-context-mode
Then restart Claude Code. Sorry about that.

You mention Context7 in the document — so would I have both MCP servers installed, with a hook that prevents other servers from being called?
Your other MCP servers still run. Context Mode just gives the model a more context-efficient way to process their results when the output would be large.
One thing that surprised me: tantivy's BM25 search is faster, more expressive, and more scalable than SQLite's FTS. If you're building local search (or want to optimize for local FTS), I'd strongly recommend looking into tantivy.
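For comparison, here is what local BM25 search looks like with SQLite's built-in FTS5 (a minimal sketch; the table and sample rows are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# FTS5 virtual table: every column is full-text indexed
db.execute("CREATE VIRTUAL TABLE docs USING fts5(source, content)")
db.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("HN", "Show HN: a Rust search engine built on tantivy"),
        ("HN", "SQLite FTS5 internals and bm25 ranking"),
        ("blog", "Indexing fetched web pages for local search"),
    ],
)
# bm25(docs) returns a rank where lower means a better match
rows = db.execute(
    "SELECT source, content FROM docs "
    "WHERE docs MATCH 'search' ORDER BY bm25(docs)"
).fetchall()
for source, content in rows:
    print(source, "|", content)
```

tantivy gives you the same BM25 ranking with a richer query language and better scaling characteristics, which is the trade-off described above.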
If you have the resources, it would be very interesting to throw some models (especially smart-but-context-constrained cheaper ones) at some of the benchmark programming problems and see whether this approach shows a measurable improvement.
On benchmarking: This is the experiment I most want to see. The hypothesis: context-mode benefits smaller models disproportionately — a 32K model with clean context could outperform a 200K model drowning in raw tool output. Would love to see SWE-bench results with context-mode on vs. off across model tiers.
Codex CLI:
codex mcp add context-mode -- npx -y context-mode
Or in ~/.codex/config.toml:
[mcp_servers.context-mode]
command = "npx"
args = ["-y", "context-mode"]
opencode:
In opencode.json:
{
"mcp": {
"context-mode": {
"type": "local",
"command": ["npx", "-y", "context-mode"],
"enabled": true
}
}
}
We haven't tested yet — would love to hear if anyone tries it!

The SQLite database is ephemeral — stored in the OS temp directory (/tmp/context-mode-{pid}.db) and scoped to the session process. Nothing persists after the session ends. For sensitive data masking specifically: right now the raw data never leaves the sandbox (it stays in the subprocess or the temp SQLite store), and only stdout summaries enter the conversation. But a dedicated redaction layer (regex-based PII stripping before indexing) is an interesting idea worth exploring. Would be a clean addition to the execute pipeline.
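The session-scoped path scheme described above can be sketched like this (the naming pattern is taken from the comment; the table layout is purely illustrative):

```python
import os
import sqlite3
import tempfile

# Per-process temp path, as described: /tmp/context-mode-{pid}.db.
# Because the pid is part of the name, a new process (e.g. a resumed
# session) computes a different path and never sees the old file.
db_path = os.path.join(tempfile.gettempdir(), f"context-mode-{os.getpid()}.db")

db = sqlite3.connect(db_path)
db.execute("CREATE TABLE IF NOT EXISTS chunks (source TEXT, content TEXT)")
db.close()

print(db_path)
os.remove(db_path)  # clean up the illustration's file
```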
Does that mean that if I exit Claude Code and then later resume the session, the database is already gone? When exactly does the session end?
--
On lossy compression and the "unsurfaced signal" problem:
Nothing is thrown away. The full output is indexed into a persistent SQLite FTS5 store — the 310 KB stays in the knowledge base, only the search results enter context. If the first query misses something, you (or the model) can call search(queries: ["different angle", "another term"]) as many times as needed against the same indexed data. The vocabulary of distinctive terms is returned with every intent-search result specifically to help form better follow-up queries.
The fallback chain: if intent-scoped search returns nothing, it splits the intent into individual words and ranks by match count. If that still misses, batch_execute has a three-tier fallback — source-scoped search → boosted search with section titles → global search across all indexed content.
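The word-split fallback tier can be sketched roughly like this (function and variable names are mine, not the plugin's API):

```python
def word_fallback_search(intent: str, docs: list[str]) -> list[str]:
    """Fallback tier: when the full-intent query misses, split the
    intent into individual words and rank docs by match count."""
    words = intent.lower().split()
    scored = []
    for doc in docs:
        text = doc.lower()
        hits = sum(1 for w in words if w in text)
        if hits:
            scored.append((hits, doc))
    # Docs matching the most intent words come first
    scored.sort(key=lambda pair: -pair[0])
    return [doc for _, doc in scored]

docs = [
    "post titles and scores grouped by domain",
    "unrelated changelog entry",
    "score distribution per domain",
]
print(word_fallback_search("title score domain", docs))
```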
There's no explicit "raw mode" toggle, but if you omit the intent parameter, execute returns the full stdout directly (smart-truncated at 60% head / 40% tail if it exceeds the buffer). So the escape hatch is: don't pass intent, get raw output.
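The head/tail truncation is simple to reconstruct (a sketch of the 60/40 split described above, not the plugin's actual code):

```python
def smart_truncate(text: str, limit: int) -> str:
    """Keep 60% of the budget from the head and 40% from the tail,
    dropping the middle, as described for oversized raw output."""
    if len(text) <= limit:
        return text
    head = int(limit * 0.6)
    tail = limit - head
    return text[:head] + "\n…[truncated]…\n" + text[-tail:]

sample = "A" * 100 + "B" * 100
print(smart_truncate(sample, 50))  # 30 chars of head, 20 of tail
```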
On token counting:
It's a bytes/4 estimate using Buffer.byteLength() (UTF-8), not an actual tokenizer. Marked as "estimated (~)" in stats output. It's a rough proxy — Claude's tokenizer would give slightly different numbers — but directionally accurate for measuring relative savings. The percentage reduction (e.g., "98%") is measured in bytes, not tokens, comparing raw output size vs. what actually enters the conversation context.
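The described estimate amounts to one line; here is a Python equivalent of the Buffer.byteLength approach (the function name is mine):

```python
def estimate_tokens(text: str) -> int:
    # bytes/4 heuristic: UTF-8 byte length divided by 4, the same
    # idea as Buffer.byteLength(text, "utf8") / 4 in Node
    return len(text.encode("utf-8")) // 4

print(estimate_tokens("hello world"))   # 11 bytes
print(estimate_tokens("🙂"))            # 4 bytes: multibyte chars inflate the count
```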