4 points by DeusCodex a month ago | 2 comments
  • DeusCodex a month ago
    Hi HN, author here. I built BlockFrame because I wanted the durability of distributed object storage (erasure coding, bit-rot protection) but for local, single-node archives.

    It is a storage engine that shards files into Reed-Solomon blocks (RS(30,3) or RS(1,3)) to guarantee mathematical recovery from disk corruption. It then exposes this engine via a FUSE/WinFSP interface so you can access the data using standard tools (read, seek) without needing custom APIs.

    Key features:

    Engine Layer: Handles the heavy lifting of parity calculation and Merkle tree verification.

    Access Layer: Virtual filesystem driver allows zero-copy access and random seeking on multi-gigabyte datasets.

    Self-Healing: The engine transparently reconstructs corrupted sectors during reads.

    I’m graduating in May and aiming for systems roles in the UK. I'd love feedback on the architecture, and any other feedback is highly appreciated. Thank you.

  • compressedgas a month ago
    RS(1,3) is a slow way to store four copies.
    • DeusCodex a month ago
      I'm open to suggestions for a better erasure coding storage ratio :)
      • compressedgas a month ago
        There is nothing wrong with the ratio. The point is that, at this ratio, plain replication plus the Merkle tree takes less computation than erasure coding. My suggestion is already stated: store four copies and use the Merkle tree to determine which one is valid.
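        A minimal sketch of that suggestion, assuming a single SHA-256 digest stands in for the Merkle leaf hash of one block. The names `digest` and `pick_valid_copy` are hypothetical and are not taken from BlockFrame:

```python
import hashlib


def digest(data: bytes) -> str:
    """Hash one block; a stand-in for the Merkle leaf hash (assumption)."""
    return hashlib.sha256(data).hexdigest()


def pick_valid_copy(copies, expected_digest):
    """Return the first replica whose hash matches, or None if all are damaged."""
    for copy in copies:
        if digest(copy) == expected_digest:
            return copy
    return None


original = b"block payload"
expected = digest(original)

# Four stored replicas; the first and third are corrupted.
replicas = [b"block pay1oad", original, b"", original]
recovered = pick_valid_copy(replicas, expected)
```

        Recovery here costs one hash per replica inspected and no finite-field arithmetic, and it succeeds as long as any one of the four copies is intact.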
        • DeusCodex a month ago
          You know what, that might be a very good idea, if encoding speeds on my really crappy hard drive weren't already under a second. I do see where you're coming from, and for larger files that concept does apply. But BlockFrame wants to protect your files from circumstances you can't control, and the concept is still to keep the file as its own entity and use RS to make sure nothing funky happens to it. Small files do tend to cause slow random access, which is why they're not split up.
          • compressedgas 23 days ago
            As a postscript, I should add that I had a thought that RS(1,3) is less redundant than copies=4, as I think RS(1,3) requires more than one of its four symbols to be present.

            Based on https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_cor... which gives (n - k)/2 = (4 - 1)/2, rounded down to 1, as the answer to the question of how many missing symbols RS(1,3) can recover. That means RS(1,3) is not a worse way of storing four copies but a worse way of storing two copies: it takes the space of four copies to store what has only the redundancy of two.
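            To make the arithmetic concrete, here is a small sketch of the standard Reed-Solomon bounds. It assumes the thread's RS(d,p) notation means d data shards plus p parity shards, so n = d + p and k = d; the function name `rs_tolerance` is hypothetical. Note that the (n - k)/2 bound applies to errors at unknown positions, while symbols known to be missing or bad (erasures) can be recovered up to n - k:

```python
def rs_tolerance(data_shards: int, parity_shards: int) -> dict:
    """Standard bounds for an RS code with n total and k data symbols."""
    n = data_shards + parity_shards
    k = data_shards
    return {
        "n": n,
        "k": k,
        # Symbols at known positions (e.g. flagged by a failed checksum).
        "max_erasures": n - k,
        # Corrupted symbols whose positions are not known in advance.
        "max_errors": (n - k) // 2,
        # Extra storage relative to the original data.
        "overhead": parity_shards / data_shards,
    }


small = rs_tolerance(1, 3)   # the RS(1,3) case discussed above
large = rs_tolerance(30, 3)  # the RS(30,3) case from the original post
```

            Under these assumptions, which bound applies depends on whether something else, such as a per-shard hash, already identifies the damaged shards before decoding.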