The core idea: every AI memory system I looked at (Mem0, Zep, Cognee, Letta/MemGPT) sends your text to an LLM for extraction and classification. That's fine for small scale, but at 100K memories/month you're looking at $1,000-3,000 in API calls just for the memory layer.
Mnemosyne does it differently — a deterministic 12-step pipeline using local embeddings (nomic-embed-text via Ollama) + rule-based classifiers. Same input always produces same output. No stochastic variance, no API bills.
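To make "same input, same output" concrete, here's a minimal sketch of what a rule-based memory classifier looks like. The type names and regex patterns are illustrative assumptions, not Mnemosyne's actual 7-type taxonomy:

```typescript
// Hypothetical sketch of a deterministic rule-based classifier.
// Category names and patterns are examples, not Mnemosyne's real taxonomy.
type MemoryType = "preference" | "fact" | "event" | "other";

const rules: Array<[RegExp, MemoryType]> = [
  [/\b(prefers?|likes?|hates?)\b/i, "preference"],
  [/\b(is|are|was|were)\b.*\b(born|located|named)\b/i, "fact"],
  [/\b(yesterday|today|on \w+ \d+)\b/i, "event"],
];

export function classify(text: string): MemoryType {
  for (const [pattern, type] of rules) {
    if (pattern.test(text)) return type; // first match wins → deterministic
  }
  return "other";
}
```

No LLM call means no token cost, no latency spike, and no run-to-run variance on the same input.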
The 5-layer architecture:
- L1: Qdrant vectors + 2-tier cache (in-memory + Redis) + FalkorDB knowledge graph
- L2: 12-step ingestion pipeline (security filter, dedup, entity extraction, 7-type taxonomy, priority scoring)
- L3: Temporal entity graph with auto-linking and path finding
- L4: Activation decay, 5-signal scoring, diversity reranking
- L5: Reinforcement learning, autonomous consolidation, flash reasoning, Theory of Mind for multi-agent
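The L4 retrieval layer's activation decay can be sketched as exponential fading combined with a similarity signal. The half-life and blend weights below are assumptions for illustration, not Mnemosyne's tuned values, and this shows only two of the five scoring signals:

```typescript
// Illustrative activation decay: a memory's score fades with time
// since last access. HALF_LIFE_DAYS is an assumed value.
const HALF_LIFE_DAYS = 30;

export function activation(daysSinceAccess: number): number {
  return Math.pow(0.5, daysSinceAccess / HALF_LIFE_DAYS);
}

// Blend decay with vector similarity (weights here are arbitrary examples).
export function memoryScore(similarity: number, daysSinceAccess: number): number {
  return 0.7 * similarity + 0.3 * activation(daysSinceAccess);
}
```

With this shape, a stale-but-relevant memory still surfaces, it just needs a higher similarity score to beat a recently touched one.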
Running in production: 13,000+ memories across a 10-agent mesh, sub-200ms retrieval, <50ms ingestion, >60% cache hit rate.
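The cache hit rate comes from the 2-tier design: an in-memory map answers hot lookups before falling back to a slower backing store like Redis. A minimal sketch of the idea (the `Store` interface is hypothetical, not Mnemosyne's API):

```typescript
// Hypothetical 2-tier cache: in-memory L1 in front of a slower L2 store.
interface Store {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

class TwoTierCache {
  private l1 = new Map<string, string>(); // a real impl would cap this (LRU)
  constructor(private l2: Store) {}

  async get(key: string): Promise<string | undefined> {
    const hit = this.l1.get(key);
    if (hit !== undefined) return hit;        // L1 hit: no network round-trip
    const val = await this.l2.get(key);
    if (val !== undefined) this.l1.set(key, val); // promote to L1
    return val;
  }

  async set(key: string, value: string): Promise<void> {
    this.l1.set(key, value);
    await this.l2.set(key, value);
  }
}
```

Every L1 hit skips the Redis round-trip entirely, which is where most of the sub-200ms retrieval budget gets won back.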
Tech: TypeScript, Qdrant, Ollama (nomic-embed-text), optional Redis/FalkorDB/MongoDB. MIT licensed.
npm: `npm install mnemosy-ai`
Website: https://mnemosy.ai
Discord: https://discord.gg/Sp6ZXD3X