One thing I'm trying to understand: the README calls this "semantic" retrieval, but looking at the Unified Field Equation in the whitepaper, the core scoring is tag intersection with temporal decay: W(q,a) = (shared tags) × γ^(graph distance) × (recency). That's weighted keyword matching, which is deterministic precisely because it's lexical, not semantic.
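For concreteness, here's a minimal sketch of that scoring as I read it from the whitepaper. The decay constants and the half-life recency term are my assumptions, not values from the paper:

```typescript
// Hypothetical sketch of the Unified Field Equation's scoring:
// tag overlap, damped by graph distance (gamma^d), scaled by recency.
interface Atom {
  tags: Set<string>;
  graphDistance: number; // hops from the query's anchor node
  ageDays: number;
}

function score(
  queryTags: Set<string>,
  atom: Atom,
  gamma = 0.5,        // assumed damping factor per hop
  halfLifeDays = 30   // assumed recency half-life
): number {
  let shared = 0;
  for (const t of queryTags) if (atom.tags.has(t)) shared++;
  const decay = Math.pow(gamma, atom.graphDistance);
  const recency = Math.pow(0.5, atom.ageDays / halfLifeDays);
  return shared * decay * recency;
}
```

Every term here is a deterministic function of the stored graph, which is exactly why it's lexical rather than semantic: nothing bridges "JWT" and "authentication" unless a shared tag or edge already exists.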
And vector.ts contains MockSoulIndex, a no-op stub, with a note saying dense vector search is "optional augmentation" that's currently disabled; so no embeddings are running in practice.
I've been building in this space with hand-written TypeScript (no AI codegen) and the line between "semantic" and "keyword" matters a lot to users. If someone stores "the JWT conversation" they won't find it by querying "authentication."
Is the tag extraction smart enough to bridge that, or is explicit tagging on the user to handle?
We call it "semantic" in the broader sense of meaning‑bearing structure—the graph encodes relationships between concepts, and retrieval walks those relationships. But you're correct that at query time, it's matching on tags, not vector similarity.
Why not embeddings? We made a deliberate trade‑off: determinism and explainability over fuzziness. With vector search, you get a black‑box similarity score and no way to debug why something was retrieved. With tag‑based traversal, you can trace the exact path: "This result matched because it shares tags X, Y, Z and is within 2 hops of your query." That matters for agentic workflows where auditability is critical.
Tag extraction is where we do the work to bridge the lexical gap. The atomization pipeline uses:
- Wink NLP for entity recognition and part‑of‑speech filtering (so "authentication" and "JWT" both get tagged with relevant concepts if they appear in context).
- Co‑occurrence windows to infer relationships (e.g., if "JWT" and "authentication" repeatedly appear near each other, they get linked in the graph).
- Synonym expansion (via Standard 111) so queries for "authentication" can surface nodes tagged with "JWT" if the system has learned that relationship from your corpus.
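To illustrate the co-occurrence step (this is a generic sketch, not the actual atomization pipeline): terms that appear within a few tokens of each other accumulate edge weight, and repeated proximity is what links "JWT" to "authentication" in the graph.

```typescript
// Hypothetical co-occurrence linking: any two distinct terms appearing
// within `window` tokens of each other accumulate weight on a shared
// edge, keyed by the sorted term pair.
function cooccurrenceEdges(tokens: string[], window = 5): Map<string, number> {
  const edges = new Map<string, number>();
  for (let i = 0; i < tokens.length; i++) {
    for (let j = i + 1; j < Math.min(i + window, tokens.length); j++) {
      if (tokens[i] === tokens[j]) continue; // skip self-edges
      const key = [tokens[i], tokens[j]].sort().join("|");
      edges.set(key, (edges.get(key) ?? 0) + 1);
    }
  }
  return edges;
}
```

A thresholding pass on these weights would then decide which co-occurrences become actual graph edges.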
It's not magic - if you never mention "JWT" in the same context as "authentication," the graph won't connect them. But that's a feature, not a bug: the system reflects your actual usage, not a statistical average of the internet.
The trade‑off is real: you give up the fuzzy "close enough" retrieval of vectors in exchange for perfect traceability and no embedding drift. For many use cases (project memory, execution traces, personal knowledge bases), that's the right call.
I'd love to hear more about what you're building in this space. Always good to find others thinking about these trade‑offs.
The determinism trade-off is genuinely interesting — auditability over fuzziness is a real design philosophy, not just a limitation.
We've been building something that tries to avoid forcing that choice. Engram uses three strategies in parallel: vector embeddings (nomic-embed-text via Ollama, local-first), BM25 keyword, and temporal recency — merged with Reciprocal Rank Fusion. Each result comes back with an explicit similarity score and the tier it came from (working memory / long-term / archived), so the retrieval path is still traceable even when it's fuzzy.
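Standard Reciprocal Rank Fusion is simple enough to sketch; this is the textbook formulation (score = Σ 1/(k + rank), k ≈ 60), not Engram's exact implementation:

```typescript
// Reciprocal Rank Fusion over several ranked result lists.
// Each list contributes 1 / (k + rank) per document; documents
// ranked well by multiple strategies rise to the top.
function rrf(rankings: string[][], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((doc, rank) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}
```

The nice property for traceability is that each fused score decomposes exactly into per-strategy contributions, so you can still report which tier (vector, BM25, recency) drove each result.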
We also layer on a graph component similar to yours — entity-relationship extraction that augments top results with connected context. The difference is that graph is additive on top of embedding retrieval rather than the primary mechanism.
The place your approach wins clearly is corpus-specific precision. If the graph is built from your actual usage (your JWT/authentication example), tag traversal will reliably surface relationships that vectors would miss or dilute with internet priors. That's a real advantage for execution traces and project memory.
Still working through the right defaults for consolidation (when to summarize old working memories vs keep them granular). Curious whether you've thought about memory aging in your model.
Repo if curious: https://github.com/Cartisien/engram
Live demo (in-browser, no setup): https://rsbalchii.github.io/anchor-engine-node/demo/index.ht...
Search Moby Dick or Frankenstein and see the tag-based receipts that show why each result matched.
How it works
Anchor uses graph traversal (the STAR algorithm) instead of embeddings. Concepts become nodes, relationships become edges. The database stores only pointers (file paths + byte offsets); content stays on disk, so the index is small and rebuildable. PGlite (PostgreSQL in WASM) lets it run anywhere Node.js does – including a Pixel 7 in Termux, with <1GB RAM.
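The pointer scheme can be sketched like this (my own illustration of the idea, not Anchor's actual record layout — field names are assumptions):

```typescript
import { openSync, readSync, closeSync } from "node:fs";

// Hypothetical pointer record: the index stores only where content
// lives on disk, never the content itself.
interface Pointer {
  path: string;   // source file
  offset: number; // byte offset of the chunk
  length: number; // byte length of the chunk
}

// Resolve a pointer to its text lazily, at result time.
function resolve(p: Pointer): string {
  const fd = openSync(p.path, "r");
  try {
    const buf = Buffer.alloc(p.length);
    readSync(fd, buf, 0, p.length, p.offset);
    return buf.toString("utf8");
  } finally {
    closeSync(fd);
  }
}
```

Because the index holds only these small records, it stays tiny relative to the corpus and can be rebuilt from the source files at any time.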
Performance
- <200ms p95 search on a 28M-token corpus
- <1GB RAM – runs on a $200 mini PC, a Raspberry Pi, or a phone
- Pure JS/TS, compiled to WASM, no cloud dependencies
What’s new in v4.6
- distill: lossless compression of your corpus into a single deduplicated YAML file. I tested it on 8 months of my own chat logs: 2336 → 1268 unique lines, 1.84:1 compression, 5 minutes on a Pixel 7.
- MCP server (v4.7.0) – exposes search and distillation to any MCP client (Claude Code, Cursor, Qwen tools)
- Adaptive concurrency – automatic switching between sequential (mobile) and parallel (desktop) processing
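The core dedup step behind those numbers can be sketched as follows — this is my guess at the mechanism, not distill's actual implementation, and a truly lossless format would also need to record each line's original positions:

```typescript
// Hypothetical line-level dedup: keep the first occurrence of each
// line. 2336 -> 1268 unique lines corresponds to this kind of pass.
function dedupeLines(corpus: string): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const line of corpus.split("\n")) {
    if (!seen.has(line)) {
      seen.add(line);
      unique.push(line);
    }
  }
  return unique;
}
```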
The recursion
I used Anchor to build itself. Every bug fix and design decision is in the graph – that's how I kept the complexity manageable.
Where it fits
If you're building local agents, personal knowledge bases, or mobile assistants and want memory that's inspectable, deterministic, and lightweight – this is for you.
GitHub repo: https://github.com/RSBalchII/anchor-engine-node
Whitepaper: https://github.com/RSBalchII/anchor-engine-node/blob/main/do...
Happy to answer questions about the algorithm, the recursion, or the mobile optimizations.