The server uses mem0ai as a library, Qdrant for vector storage, and Ollama for local embeddings (bge-m3, 1024 dims). An optional Neo4j instance adds a knowledge graph. Everything, including the LLM fact extraction step, can run locally.
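To make that concrete, here is a minimal sketch of the kind of mem0 config this stack implies; the collection name, ports, and extraction model are placeholders rather than the server's actual defaults:

```python
from mem0 import Memory

config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333,
            "collection_name": "claude_code_memories",  # placeholder name
            "embedding_model_dims": 1024,               # matches bge-m3
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "bge-m3",
            "ollama_base_url": "http://localhost:11434",
        },
    },
    # Pointing the LLM at Ollama keeps the fact extraction step local too.
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "qwen3:14b",
            "ollama_base_url": "http://localhost:11434",
        },
    },
}

memory = Memory.from_config(config)
memory.add("Prefers pytest over unittest for this repo", user_id="claude-code")
```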
A few engineering decisions worth mentioning:
- Zero-config auth: the server reuses your existing OAT token from ~/.claude/.credentials.json instead of requiring a separate API key, with a 3-tier fallback chain (env var → credentials file → API key); a simplified sketch of that chain follows this list.
- If you enable the graph layer, each add_memory triggers 3 extra LLM calls. To avoid burning your Claude subscription quota on entity extraction, you can route those calls to a local Ollama model (Qwen3:14b at Q4_K_M scores 0.971 tool-calling F1); the second sketch below shows the routing.
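Roughly, the token resolution works like this; the env var names and the JSON layout inside .credentials.json are illustrative here, not copied from the actual code:

```python
import json
import os
from pathlib import Path

def resolve_token() -> str | None:
    """Resolve auth: env var -> Claude Code credentials file -> API key."""
    # 1. An explicit env var always wins (name is illustrative).
    token = os.environ.get("CLAUDE_CODE_OAUTH_TOKEN")
    if token:
        return token

    # 2. Reuse the OAT token Claude Code already keeps on disk.
    creds_path = Path.home() / ".claude" / ".credentials.json"
    if creds_path.is_file():
        try:
            creds = json.loads(creds_path.read_text())
            # Key names are illustrative; the real file layout may differ.
            token = creds.get("claudeAiOauth", {}).get("accessToken")
            if token:
                return token
        except (OSError, json.JSONDecodeError):
            pass  # unreadable or malformed file: fall through

    # 3. Last resort: a plain Anthropic API key.
    return os.environ.get("ANTHROPIC_API_KEY")
```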
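And enabling the graph layer on top of the earlier config sketch is one more block; the connection details are placeholders, and the point is that the extraction calls reuse the Ollama-backed LLM:

```python
# Builds on the config sketch above. mem0's graph extraction goes through
# the configured LLM, so with "llm" already pointed at Ollama (qwen3:14b),
# the 3 extra calls per add_memory never touch your Claude quota.
config["graph_store"] = {
    "provider": "neo4j",
    "config": {
        "url": "bolt://localhost:7687",  # placeholder
        "username": "neo4j",             # placeholder
        "password": "change-me",         # placeholder
    },
}

memory = Memory.from_config(config)
```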
I patched mem0 upstream to support OAT token reuse (PR #4035). The official mem0 MCP server is cloud-only, which is why I built this local version.
Happy to discuss the architecture or tradeoffs.
Full writeup with setup guide: https://dev.to/n3rdh4ck3r/how-to-give-claude-code-persistent...