1 pointby coryschwartz3 hours ago1 comment
  • coryschwartz3 hours ago
    We have updated our storage-proofs implementations, and the change includes some breaking changes to our line API. The main change is that we have done away with numerical block identifiers in favor of byte arrays.

    The primary benefit of byte array block identifiers is that it unlocks, at least for some POS protocols, the possibility of remotlely verifying the possession of data that cannot be expressed as a simple array of blocks. Let me explain.

    In the most straightforward POS system, the idea we have in mind is that we have some large file we want an untrusted third party. We break up our file into evenly sized blocks number them, and we perform some mathematical magic on each block. Just like any cryptographic system, the details really matter here and they vary by implementation.

    In the simple case, a file broken into evenly sized blocks, then the block identifier is simply an integer, in order, as the blocks appear in a particular file. In this case, changing our block IDs from an integer harmless. Okay, a uint64 instead becomes a big-endian 64-bit array. All the math works out the same either way.

    But what happens if the data you want to verify is not simply an array of blocks? The data might be something in a database, a sparse array, a node in a graph, and so on. For complex structures, a determinsitic ordering might not be obvious, and yet we still could have a compelling interest in remote storage verification.

    So here's our discovery:

    Provable data possession by at Untrusted Stores (2007, by Atenisse and others) (pdp/atenisse in the repo) It's an oldie but a goodie. It performs masterfully. It is identical in features and performance, regardless of block ID order or missing blocks. This protocol is a riff on the RSA algorithm. Proofs degrade into what is list of tag-block tuples to be exponentiated. This means, if you have some particular interest in verifying a specific section of the data, you can do it.

    Dynamic data possession (2009, Erway, etc) (pdp/erway in the repo) This protocol does not work well on data that is not expressed as a block array. This protocol relies on an authenticated skip list. We were able to get this to work by filling empty spaces with dummy placeholder blocks but doing so degrades the performance of the protocol at catching bad blocks substantially, so it's realy unsuitable for non-sequential data. This is quite unfortunate because dynamic updates to remote stored data is really quite a useful feature to have on a storage system.

    Proof of Retrievability (2009, originally, Bowes, Juels, Oprea) (por/bjo in the repo) This one is also unsuitable for non-array structures. The entire protocol is built arround an ECC-encoded file broken up into sequential blocks. To use this particular POR, you'll need to somehow serialize the data first and treat the whole archive as a single retrievable object.

    Compact proofs of Retrievability (2009, Shacham and Waters) (por/sw in the repo) Similarly to the ateniese implementation, each block is tagged independently in such a mannor that support for sparse array verification is fully supported in our implementation.

    In the future, I will add a few other more recent POS implementations to the study.