I have spent the last few months building SYNRIX to see whether sub-microsecond retrieval is possible by being extremely opinionated about hardware. Instead of a flexible graph, the engine uses a binary lattice: a rigid structure that relies on arithmetic addressing instead of pointer chasing.
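The core idea can be sketched in a few lines. This is an illustrative model, not SYNRIX's actual record format: with fixed-size records in a flat region, a node's location is pure address arithmetic, so a lookup never dereferences a chain of pointers.

```python
NODE_SIZE = 64  # hypothetical record size: one record per cache line


def node_offset(node_id: int) -> int:
    """Byte offset of a node in a flat lattice: base + id * stride."""
    return node_id * NODE_SIZE


buf = bytearray(NODE_SIZE * 1000)  # flat storage region

# Write an 8-byte value into node 42's slot at its computed offset.
buf[node_offset(42):node_offset(42) + 8] = (42).to_bytes(8, "little")

# Reading it back is O(1) arithmetic, independent of how many nodes exist.
val = int.from_bytes(buf[node_offset(42):node_offset(42) + 8], "little")
```

The trade-off is rigidity: every record must fit the fixed stride, which is exactly the "opinionated" constraint described above.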
This architectural rigidity leads to several unique properties:
Query time scales with the number of results you want, not the total size of your database. We have validated this at 50 million nodes running smoothly on a standard 8 GB RAM machine, using memory-mapped storage to scale beyond physical memory.
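Memory-mapped storage is a generic OS technique, and the file layout below is hypothetical rather than SYNRIX's real format, but it shows why the store can exceed physical RAM: the file is addressed like memory, and the OS pages regions in only when they are touched.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<q")  # one 8-byte little-endian int per node

path = os.path.join(tempfile.mkdtemp(), "lattice.bin")
with open(path, "wb") as f:
    # A sparse 8 MB file for 1M nodes: no RAM is committed up front.
    f.truncate(RECORD.size * 1_000_000)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
    # Write node 999_999 directly at its arithmetic offset; only the
    # touched page is faulted into memory.
    off = 999_999 * RECORD.size
    mm[off:off + RECORD.size] = RECORD.pack(12345)
    value, = RECORD.unpack(mm[off:off + RECORD.size])
    mm.close()
```

Because untouched pages never enter RAM, the working set tracks the queries you actually run, not the dataset size.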
Because it runs entirely on your own hardware, there are no per-query fees or subscription costs. This makes it a viable local-first alternative for high-volume applications that would otherwise face six-figure cloud bills at scale.
The system is built for production reliability with ACID-style guarantees. It uses a Write-Ahead Log and deterministic recovery, which so far have given a 100% success rate surviving restarts and crashes without data loss.
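A minimal sketch of the WAL pattern, with an illustrative JSON log format (SYNRIX's real on-disk format and recovery protocol are not shown): every update is appended and fsynced to the log before the in-memory state changes, so replaying the log after a crash deterministically rebuilds the state.

```python
import json
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "wal.log")


def apply(state: dict, op: dict) -> None:
    """Apply one logged operation to the in-memory state."""
    state[op["key"]] = op["value"]


def write(state: dict, key: str, value: int) -> None:
    """Log first, fsync, then mutate: the WAL invariant."""
    op = {"key": key, "value": value}
    with open(log_path, "a") as log:
        log.write(json.dumps(op) + "\n")
        log.flush()
        os.fsync(log.fileno())  # durable on disk before state changes
    apply(state, op)


def recover() -> dict:
    """Deterministic replay: the same log always yields the same state."""
    state = {}
    with open(log_path) as log:
        for line in log:
            apply(state, json.loads(line))
    return state


live = {}
write(live, "a", 1)
write(live, "b", 2)
assert recover() == live  # a crash loses nothing the log recorded
```

Because recovery is a pure function of the log, restarting after a crash is the same operation as a clean startup, which is what makes it testable to a 100% pass rate.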
The engine is laid out for cache-line alignment and CPU prefetching, working with the hardware rather than against it to keep hot-path retrieval sub-microsecond even as the memory substrate grows.
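The alignment part is simple arithmetic, shown here with illustrative constants: placing each record on a 64-byte boundary means a record never straddles two cache lines, so one line fill serves the whole read. (Prefetch hints are a compiler/CPU concern and are not expressible in Python; only the layout math is sketched.)

```python
CACHE_LINE = 64  # typical x86-64 cache line size


def align_up(offset: int, alignment: int = CACHE_LINE) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


# Any record starting at an aligned offset with size <= 64 bytes
# occupies exactly one cache line.
assert align_up(0) == 0
assert align_up(1) == 64
assert align_up(64) == 64
assert align_up(65) == 128
```

A predictable fixed stride also helps the hardware prefetcher: sequential node IDs translate to sequential addresses, which is the access pattern prefetchers are built to detect.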
We have built compatibility layers for LangChain and Qdrant, so it can act as a drop-in replacement in existing stacks. The project has already seen about 40 clones since yesterday, so the need for low-latency, offline-first memory seems to be hitting a nerve. I am curious to hear from others working on high-frequency agent queries: is retrieval latency currently a bottleneck for your workflows, or are you more concerned with inference time?