2 points by el_dockerr 3 hours ago | 1 comment
  • el_dockerr 3 hours ago
    Hi HN,

    I wrote a C++ library that implements transparent memory compression in user space. Up to 100% RAM savings if you use it right ^^

    The core idea is to catch memory access violations (using AddVectoredExceptionHandler on Windows, or userfaultfd/signals on Linux) to implement a custom paging mechanism. Instead of swapping to disk, it compresses cold pages using LZ4 and stores them in a reserved heap area.
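
    For illustration, the per-page compress/decompress step with the LZ4 C API might look roughly like this (a sketch under my own assumptions, not the library's actual code; a 4 KiB page size is assumed):

        #include <lz4.h>
        #include <vector>

        constexpr int kPageSize = 4096;  // assumed page size

        // Compress one cold page; the real library would store the result
        // in its reserved heap area instead of returning it.
        std::vector<char> compress_page(const char* page) {
            std::vector<char> out(LZ4_compressBound(kPageSize));
            int n = LZ4_compress_default(page, out.data(), kPageSize,
                                         static_cast<int>(out.size()));
            out.resize(n > 0 ? n : 0);  // n == 0 means compression failed
            return out;
        }

        // Decompress a stored page back into freshly committed memory.
        void decompress_page(const std::vector<char>& blob, char* page) {
            LZ4_decompress_safe(blob.data(), page,
                                static_cast<int>(blob.size()), kPageSize);
        }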

    How it works:

    It allocates virtual memory but sets protections to PAGE_NOACCESS.

    When the app tries to access the memory, the library catches the CPU trap.

    It allocates physical RAM, decompresses the data (if it existed), and resumes execution.

    An LRU strategy freezes cold pages back into the compressed store when a limit is reached (see the sketch below).
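
    To make the flow concrete, here is a minimal Windows-only sketch of the trap-and-commit cycle (my own illustration, not GhostMem's handler; eviction and decompression are omitted):

        #include <windows.h>
        #include <cstdint>
        #include <cstdio>

        // On access violation: commit the faulting page, then retry the instruction.
        LONG CALLBACK FaultHandler(PEXCEPTION_POINTERS info) {
            if (info->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION) {
                SYSTEM_INFO si;
                GetSystemInfo(&si);
                auto addr = info->ExceptionRecord->ExceptionInformation[1];
                void* page = (void*)(addr & ~(uintptr_t)(si.dwPageSize - 1));
                // Here the real library would decompress the stored page into
                // the newly committed memory before resuming.
                if (VirtualAlloc(page, si.dwPageSize, MEM_COMMIT, PAGE_READWRITE))
                    return EXCEPTION_CONTINUE_EXECUTION;
            }
            return EXCEPTION_CONTINUE_SEARCH;
        }

        int main() {
            AddVectoredExceptionHandler(1, FaultHandler);
            // Reserve address space with no access; the first touch traps.
            char* mem = (char*)VirtualAlloc(nullptr, 1 << 20,
                                            MEM_RESERVE, PAGE_NOACCESS);
            mem[0] = 42;  // faults -> handler commits the page -> resumes
            printf("%d\n", mem[0]);
        }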

    Why? Aside from the technical challenge, my main use case is embedded/IoT systems (like the Raspberry Pi). Swapping to SD cards kills them quickly due to write wear. By compressing in RAM, we can extend the hardware's lifespan and prevent OOM kills in constrained environments without touching the kernel. You can also configure it to back the compressed store with disk I/O instead, giving you an application that (mostly) uses no RAM at all.

    In the future I will add an optional AES-128 encryption layer for "ephemeral security" (data is encrypted while cold).

    It's a PoC / Alpha right now. I'd love to hear your thoughts on the implementation or potential edge cases with specific C++ STL containers.

    Link to code: https://github.com/el-dockerr/ghostmem

    • DenisDolya 2 hours ago
      Nice project! One question: decompression and page-fault handling also add latency. How do you avoid thrashing in practice? Also, for such low-level memory management, why C++ instead of C? C might give more predictable control without hidden runtime behavior.
      • el_dockerr an hour ago
        Thanks for the feedback! You hit the nail on the head regarding the trade-offs.

        1. Latency & Thrashing: You are absolutely right, there is overhead (context switch + LZ4). The intended use case isn't high-frequency access to hot data, but rather increasing density for "warm/cold" data in memory-constrained environments (like embedded/IoT) where the alternative would be an OOM kill or swapping to slow flash storage.

        To mitigate thrashing, I'm using a configurable LRU (Least Recently Used) strategy. If the working set fits within the physical limit (stretched by the compression ratio), it works smoothly. If the active working set exceeds physical RAM, it will indeed thrash, just like OS paging would. It's a trade-off: CPU cycles vs. capacity.
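
        For the curious, the hot/cold bookkeeping is conceptually just a classic LRU list keyed by page address. A rough sketch (names and details are mine, not the library's):

            #include <cstdint>
            #include <list>
            #include <unordered_map>

            class LruTracker {
                std::list<std::uintptr_t> order_;  // front = most recently used
                std::unordered_map<std::uintptr_t,
                                   std::list<std::uintptr_t>::iterator> pos_;
            public:
                // Called on every handled fault: mark the page as hot.
                void touch(std::uintptr_t page) {
                    auto it = pos_.find(page);
                    if (it != pos_.end()) order_.erase(it->second);
                    order_.push_front(page);
                    pos_[page] = order_.begin();
                }
                // Called when the resident limit is hit: pick the coldest page
                // to compress and re-protect as PAGE_NOACCESS.
                std::uintptr_t evict() {
                    std::uintptr_t victim = order_.back();
                    order_.pop_back();
                    pos_.erase(victim);
                    return victim;
                }
            };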

        2. Why C++? Valid point regarding runtime opacity. However, I chose C++ for RAII and Templates.

        RAII: Managing the life-cycle of VirtualAlloc/VirtualFree and the exception handlers is much safer with destructors, ensuring we don't leak reserved pages or leave handlers dangling.
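
        Something like this wrapper (my illustration, not the library's exact class) guarantees the reservation is released on every exit path:

            #include <windows.h>
            #include <cstddef>

            // RAII owner for a reserved virtual region; the destructor runs on
            // scope exit or exception, so reserved pages can't leak.
            class VirtualRegion {
                void* base_;
            public:
                explicit VirtualRegion(std::size_t size)
                    : base_(VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS)) {}
                ~VirtualRegion() { if (base_) VirtualFree(base_, 0, MEM_RELEASE); }
                VirtualRegion(const VirtualRegion&) = delete;
                VirtualRegion& operator=(const VirtualRegion&) = delete;
                void* get() const { return base_; }
            };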

        Templates: To integrate seamlessly with C++ containers (like std::vector), I needed to write a custom Allocator (GhostAllocator<T>). C++ templates make this zero-overhead abstraction possible, whereas in C, I'd have to rely on void* casting macros or manual memory management for generic structures.
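
        The allocator shell itself is small. A minimal sketch of the idea (the forwarding body is a placeholder; the real GhostAllocator hands out pages from the fault-managed region):

            #include <cstddef>
            #include <vector>

            template <typename T>
            struct GhostAllocator {
                using value_type = T;
                GhostAllocator() = default;
                template <typename U>
                GhostAllocator(const GhostAllocator<U>&) {}

                T* allocate(std::size_t n) {
                    // Placeholder: the real version returns memory backed by the
                    // PAGE_NOACCESS-reserved, compression-managed region.
                    return static_cast<T*>(::operator new(n * sizeof(T)));
                }
                void deallocate(T* p, std::size_t) { ::operator delete(p); }
            };

            template <typename T, typename U>
            bool operator==(const GhostAllocator<T>&, const GhostAllocator<U>&) { return true; }
            template <typename T, typename U>
            bool operator!=(const GhostAllocator<T>&, const GhostAllocator<U>&) { return false; }

            // Drop-in container integration:
            // std::vector<int, GhostAllocator<int>> v;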

        I try to stick to a "C with Classes" subset + Templates, avoiding heavy runtime features where possible to keep it predictable.