2 points by bozbuilds 6 hours ago | 2 comments
  • bozbuilds 6 hours ago
    AIngram was created to be a local-first memory layer for AI agents and agent loops. The original intent was agent memory for Karpathy's autoresearch loop, but it ended up being broadly capable. It stores structured reasoning that persists across sessions, which lets separate agents coordinate tasks and piggyback on each other's research loops.

    Storage is a single SQLite file: no server, no API, no cloud dependencies. Retrieval combines a few systems: FTS5 for keyword matching; sqlite-vec for semantic similarity using nomic-embed-text-v1.5; a QJL two-pass vector compression layer that keeps latency manageable as the database grows; and knowledge graph traversal (recursive CTEs) when you want entity-linked results. The signals are fused with reciprocal rank fusion, re-ranked, and returned as the search results. Everything in the pipeline runs locally.
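    The fusion step described above can be sketched in a few lines. This is a generic reciprocal rank fusion (RRF) implementation, not AIngram's actual code; the function name, the example memory ids, and the constant k=60 (the value from the original RRF paper) are illustrative assumptions.

```python
# Hypothetical sketch of reciprocal rank fusion over ranked id lists
# produced by independent retrieval signals; not AIngram's actual code.
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked lists of ids into one ranking.

    Each id scores sum(1 / (k + rank)) over every list it appears in,
    so ids ranked highly by multiple signals rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative rankings from the three signals described above.
fts_hits   = ["m3", "m1", "m7"]   # keyword (FTS5) ranking
vec_hits   = ["m1", "m3", "m9"]   # vector-similarity ranking
graph_hits = ["m7", "m1"]         # knowledge-graph ranking

fused = rrf_fuse([fts_hits, vec_hits, graph_hits])
# "m1" wins: it appears near the top of all three lists.
```

    The appeal of RRF here is that it needs no score calibration across signals: FTS5 BM25 scores, cosine similarities, and graph hops live on incomparable scales, but ranks are always comparable.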

    Benchmarked on LongMemEval, the oracle split gave recall@3 of 1.000; on the LongMemEval S split, recall@10 was 0.955. Median retrieval latency was 22 ms / 27 ms on the oracle/S splits respectively, on a laptop with an RTX 4060. The library is Apache 2.0 licensed. There's also a 'Pro' tier in development with added optimization features. Let me know what you think!
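    For anyone unfamiliar with the metric above: recall@k counts a query as a hit if any gold (ground-truth) item appears in the top-k retrieved results. A minimal sketch, with made-up ids rather than LongMemEval data:

```python
# Sketch of recall@k as commonly used in retrieval benchmarks.
def recall_at_k(results, gold, k):
    """results: one ranked id list per query; gold: one gold-id set per query."""
    hits = sum(1 for ranked, g in zip(results, gold)
               if g & set(ranked[:k]))
    return hits / len(results)

# Two queries: the first finds its gold id in the top 3, the second misses.
score = recall_at_k([["a", "b", "c"], ["x", "y", "z"]],
                    [{"b"}, {"q"}], k=3)
# score == 0.5
```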

    pip install aingram

    GitHub: [https://github.com/bozbuilds/AIngram] | [https://aingram.dev]

  • bozbuilds 6 hours ago
    A bit of background on why I built this, rather than extending an existing library:

    Mem0, Letta, Zep, etc. all target memory as 'personalization' or 'personal context': they store user preferences, conversation history, and individual agent state. None of them store reasoning structure, and none are designed for multi-agent memory sharing. The problem I was trying to solve was that my agent loops knew what changed, but not why it worked or failed.

    The retrieval architecture turned out to be an interesting engineering problem. The three signals (FTS5, QJL-accelerated sqlite-vec cosine similarity, knowledge graph traversal) are each fast independently, but serve different purposes. FTS5 is fast and exact. Vector search handles semantic similarity. The knowledge graph surfaces entries connected to related entities even when there's no semantic overlap. RRF fuses the three ranked lists, and the hybrid consistently outperforms any single signal on recall.
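    The graph signal mentioned above can be done entirely in SQLite with a recursive CTE. A minimal sketch, assuming an illustrative schema (the `edges` and `memories` table names and the two-hop limit are my assumptions, not AIngram's actual schema):

```python
# Hypothetical sketch of entity-linked retrieval via a recursive CTE.
# Schema and seed entity are illustrative, not AIngram's real layout.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, dst TEXT);
CREATE TABLE memories (id INTEGER PRIMARY KEY, entity TEXT, note TEXT);
INSERT INTO edges VALUES ('retriever', 'fts5'), ('fts5', 'sqlite');
INSERT INTO memories VALUES
  (1, 'retriever', 'hybrid search beat single-signal recall'),
  (2, 'sqlite',    'WAL mode reduced write contention');
""")

# Walk outward from a seed entity, then pull memories attached to any
# entity reached within two hops -- even ones with no keyword or
# semantic overlap with the query.
rows = conn.execute("""
WITH RECURSIVE reachable(entity, depth) AS (
    SELECT 'retriever', 0
    UNION
    SELECT e.dst, r.depth + 1
    FROM edges e JOIN reachable r ON e.src = r.entity
    WHERE r.depth < 2
)
SELECT m.id, m.note
FROM memories m JOIN reachable r ON m.entity = r.entity
ORDER BY m.id;
""").fetchall()
# rows holds memories linked to 'retriever' directly or transitively.
```

    Bounding the traversal depth keeps this signal's latency predictable, which matters once its ranked list has to be fused with the other two on every query.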

    Happy to go deep on any of the architectural decisions.