How it works: SQLite-backed knowledge graph with scoped entities, relations, and slots. An MCP server exposes 9 tools so the agent proactively saves facts as you work. At session start, an LLM summarizes the relevant facts into a structured briefing instead of dumping raw data.
What makes it different from a context file: Scope chains with inheritance (same person, different roles per project), bitemporal history (old facts archived, not deleted), and AI briefings that scale beyond what you'd maintain by hand.
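To make the scope-chain idea concrete, here is a minimal sketch of resolution with inheritance. All names here (`SCOPE_CHAIN`, `resolve`, the example scopes) are illustrative assumptions, not agent-recall's actual API:

```python
# Hypothetical sketch: nearer scopes shadow inherited facts.
# Scope names and slot layout are invented for illustration.

SCOPE_CHAIN = ["project:acme", "org:initech", "global"]  # most to least specific

# The same person carries different slot values per scope.
slots = {
    ("project:acme", "alice"): {"role": "tech lead"},
    ("org:initech", "alice"): {"role": "senior engineer", "team": "platform"},
    ("global", "alice"): {"email": "alice@example.com"},
}

def resolve(entity, slot):
    """Walk the scope chain; the nearest scope defining the slot wins."""
    for scope in SCOPE_CHAIN:
        value = slots.get((scope, entity), {}).get(slot)
        if value is not None:
            return value
    return None

print(resolve("alice", "role"))   # project scope shadows the org-level role
print(resolve("alice", "email"))  # inherited from the global scope
```

Bitemporal history fits the same shape: superseded slot values would be archived with their valid-time interval rather than overwritten.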
Where I need help:
- If you use Cursor, Windsurf, or Cline: try the MCP config and tell me what breaks
- PRs for other LLM backends (Ollama, local models) are welcome
`pip install 'agent-recall[mcp]'`
Instead of query → rank → top-k, it loads all entities, slots, and observations within the agent's scope chain at session start, then an LLM summarizes them into a structured briefing. Priority is determined first by scope relevance (your project > your org > global), then by data type (people and active tasks first, historical logs last), with a token budget that truncates lower-priority sections.
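The ordering-plus-budget step can be sketched like this (a hypothetical illustration: the rank tables, the word-count token estimate, and `assemble_briefing` are my assumptions, not the package's internals):

```python
# Hypothetical sketch of briefing assembly: sort facts by scope
# relevance and data type, then cut off at a token budget.

SCOPE_RANK = {"project": 0, "org": 1, "global": 2}   # lower = higher priority
TYPE_RANK = {"person": 0, "task": 1, "decision": 2, "log": 3}

def assemble_briefing(facts, budget_tokens=1000):
    """facts: list of dicts with 'scope', 'type', and 'text' keys."""
    ordered = sorted(
        facts,
        key=lambda f: (SCOPE_RANK[f["scope"]], TYPE_RANK[f["type"]]),
    )
    briefing, used = [], 0
    for fact in ordered:
        cost = len(fact["text"].split())  # crude stand-in for a tokenizer
        if used + cost > budget_tokens:
            break  # everything after this point is lower priority
        briefing.append(fact["text"])
        used += cost
    return briefing
```

Because the sort puts project-scoped people first and global logs last, a tight budget drops historical noise before it drops anything actionable.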
For in-session recall, there's search_nodes — keyword matching, not embeddings. Less powerful but perfectly adequate for structured facts like "who works on project X" or "what did we decide about auth."
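The keyword-matching idea is roughly this (a sketch of the general technique; the real `search_nodes` implementation may differ, and the node layout here is assumed):

```python
# Hypothetical sketch of keyword node search: every query keyword must
# appear somewhere in the node's name or observations. No embeddings.

def search_nodes(nodes, query):
    """Return names of nodes matching all keywords in the query."""
    keywords = query.lower().split()
    hits = []
    for node in nodes:
        haystack = (node["name"] + " " + " ".join(node["observations"])).lower()
        if all(kw in haystack for kw in keywords):
            hits.append(node["name"])
    return hits

nodes = [
    {"name": "alice", "observations": ["lead on project X", "prefers REST"]},
    {"name": "auth-decision", "observations": ["we decided to use OAuth2"]},
]
print(search_nodes(nodes, "project X"))  # ['alice']
```

For structured facts with stable vocabulary ("project X", "auth"), exact substring matching like this is cheap and predictable; it only falls down on paraphrased queries, which is exactly the tradeoff described below.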
Cold start: first session has no briefing, but the package auto-discovers project files (CLAUDE.md, README.md) and includes them in context, so the agent isn't completely blind. The MCP tools come with proactive-saving instructions, so memory builds organically. After 2-3 sessions the briefing is already useful.
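The cold-start discovery amounts to checking for a few well-known files and seeding them into context. A minimal sketch, assuming a simple filename allowlist (`SEED_FILES` and `discover_context` are invented names):

```python
# Hypothetical sketch of first-session file discovery: read any known
# project files that exist so the agent has baseline context.
from pathlib import Path

SEED_FILES = ["CLAUDE.md", "README.md"]  # files checked on a cold start

def discover_context(project_dir):
    """Return {filename: contents} for each seed file that exists."""
    found = {}
    for name in SEED_FILES:
        path = Path(project_dir) / name
        if path.is_file():
            found[name] = path.read_text(encoding="utf-8")
    return found
```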
The tradeoff is explicit: optimized for structured scoped facts (people, decisions, roles), not fuzzy semantic recall. For a coding agent that needs "Alice is the lead on project X, we decided to use REST" — keyword search + scope filtering works. For "find me something vaguely related to that auth discussion" — you'd want embeddings, and that's not what this does.