1 point by elvismdev 3 hours ago | 1 comment
  • elvismdev 3 hours ago
    Author here. I built this because Claude Code's built-in memory (CLAUDE.md, Auto Memory, Session Memory) works well for static rules and compressed summaries, but loses the detailed reasoning behind decisions between sessions.

    The server uses mem0ai as a library, Qdrant for vector storage, and Ollama for local embeddings (bge-m3, 1024 dims). Optional Neo4j adds a knowledge graph. Everything, including the LLM fact-extraction step, can run locally.
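
    Roughly, the pieces wire together through mem0's config dict. A minimal sketch (not the exact server code; the config keys follow mem0's docs and can shift between versions, and the host/port values are just local defaults):

        from mem0 import Memory

        config = {
            "vector_store": {
                "provider": "qdrant",
                "config": {"host": "localhost", "port": 6333, "embedding_model_dims": 1024},
            },
            "embedder": {
                # bge-m3 via Ollama: 1024-dim embeddings, fully local
                "provider": "ollama",
                "config": {"model": "bge-m3"},
            },
            # the fact-extraction LLM and the optional Neo4j graph store are
            # configured the same way; see the graph sketch further down
        }

        memory = Memory.from_config(config)
        memory.add("We picked Qdrant over pgvector for filtering speed", user_id="default")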

    A few engineering decisions worth mentioning:

    - Zero-config auth: reads your existing OAT token from ~/.claude/.credentials.json instead of requiring a separate API key, with a 3-tier fallback chain (env var → credentials file → API key); see the first sketch after this list.

    - If you enable the graph layer, each add_memory triggers three extra LLM calls. To avoid burning your Claude subscription quota on entity extraction, you can route those to a local Ollama model (Qwen3:14b at Q4_K_M gets 0.971 tool-calling F1); the second sketch below shows the config.
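
    The auth fallback is conceptually just this (a sketch: the env var, file path, and JSON key names reflect what Claude Code stores on my machine and may differ on yours):

        import json
        import os
        from pathlib import Path

        def resolve_token() -> str | None:
            """Resolve credentials: env var -> Claude Code credentials file -> plain API key."""
            # 1. An explicit env var wins
            token = os.getenv("CLAUDE_CODE_OAUTH_TOKEN")
            if token:
                return token
            # 2. Reuse the OAuth token Claude Code already saved locally
            creds_path = Path.home() / ".claude" / ".credentials.json"
            if creds_path.exists():
                creds = json.loads(creds_path.read_text())
                token = creds.get("claudeAiOauth", {}).get("accessToken")
                if token:
                    return token
            # 3. Fall back to a standard Anthropic API key
            return os.getenv("ANTHROPIC_API_KEY")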

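    For the graph layer it is basically a config switch: point mem0's graph store at Neo4j and its LLM at a local Ollama model, so the entity/relation extraction calls never touch the Anthropic quota. Again a sketch; exact keys depend on your mem0 version and Neo4j setup:

        graph_config = {
            "graph_store": {
                "provider": "neo4j",
                "config": {
                    "url": "bolt://localhost:7687",
                    "username": "neo4j",
                    "password": "change-me",
                },
            },
            "llm": {
                # Qwen3:14b (Q4_K_M) handles the extra extraction calls locally
                "provider": "ollama",
                "config": {"model": "qwen3:14b", "ollama_base_url": "http://localhost:11434"},
            },
        }

        # merged with the base config from the first sketch
        memory = Memory.from_config({**config, **graph_config})
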
    I patched mem0 upstream to support OAT token reuse (PR #4035). Their official MCP server is cloud-only, which is why I built this local version.

    Happy to discuss the architecture or tradeoffs.

    Full writeup with setup guide: https://dev.to/n3rdh4ck3r/how-to-give-claude-code-persistent...