3 points by RikardoB 4 hours ago | 1 comment
  • RikardoB 4 hours ago
    Hi HN, I'm Ricardo from Colombia.

    I got frustrated trying to run local RAG pipelines on standard hardware. Booting up traditional vector databases usually meant sacrificing gigabytes of RAM before even ingesting the first vector. I needed something that could run on the Edge (specifically, for a local Point of Sale system) without choking the OS.

    So, I built DeraineDB. It's a hyper-optimized, embedded vector engine designed for extreme low footprint and sub-millisecond latencies.

    Some architectural decisions I’d love your feedback on:

    - Zig Core + Go Orchestrator: I used Zig for the bare-metal memory-mapped storage (mmap) and SIMD math, and Go for the gRPC network orchestration.
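    Roughly, the wire contract between the two layers looks like this (a sketch with illustrative service and field names, not the exact DeraineDB proto):

```proto
syntax = "proto3";
package derainedb;

// Hypothetical search RPC: Go owns the network layer, then hands
// the raw vector down to the Zig core over the cgo bridge.
service VectorService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

message SearchRequest {
  repeated float vector = 1;  // 1536-dimensional query embedding
  uint32 top_k = 2;
  uint64 metadata_mask = 3;   // 64-bit filter evaluated inside the core
}

message SearchResponse {
  repeated uint64 ids = 1;
  repeated float distances = 2;
}
```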

    - The "Zero-Copy" Bridge: Copying 1536-dimensional float arrays across the Go/Zig boundary (and the GC pressure that came with it) was initially killing my latency. I fixed this by using unsafe.Pointer to hand Zig a direct view of the Protobuf-decoded slices instead of copying them. Zero copies, massive latency drop.

    - HNSW Graph Segregation: I completely separated the payload (.drb files with strict cache-line alignment) from the HNSW navigation graph (.dridx files). This solved a nasty buffer overflow I was fighting and made the O(log N) search rock solid.

    - Bitwise Metadata Filtering: Instead of JSON parsing, I implemented a 64-bit metadata_mask evaluated with a single bitwise AND directly inside the HNSW greedy-routing loop.

    The result: it serves 1536D vectors from a 33MB Docker image, consumes ~21MB of RAM under load, and answers warm HNSW searches in 0.89ms.

    The repo includes the Python benchmark and a fully local RAG demo using Llama 3.2.

    I would absolutely love it if you guys could audit the architecture, review the CGO/Zig bridge, and roast my code. Let me know where I can improve!