1 point by ila_b 8 hours ago | 1 comment
  • ila_b 8 hours ago
    We at CWI built PAC, a DuckDB community extension that transparently adds privacy protection to aggregate queries. It needs no per-query privacy budget, no epsilon tuning, and no domain knowledge, which makes it an automatic alternative to Differential Privacy.

    PAC is based on MIT's PAC Privacy framework: it hashes each privacy unit's key into a 64-bit value to create 64 sub-samples ("possible worlds"). Your aggregate runs on all 64 worlds independently; the result comes from one secret world, noised using the variance across all of them. Every query gets a fresh hash and a fresh secret world, which makes membership inference provably hard.
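
    To make the "possible worlds" idea concrete, here is a toy Python sketch for SUM. It is not the extension's code: the BLAKE2b hash, the reading of bit j as "this unit is in world j", the Gaussian noise with the standard deviation across worlds, and all function names are assumptions for illustration. The real noise calibration and any rescaling of the sampled result are described in the paper.

```python
import hashlib
import random
import statistics

WORLDS = 64  # one possible world per bit of the 64-bit hash


def world_mask(key: int, query_salt: bytes) -> int:
    """Hash a privacy-unit key to 64 bits (assumption: bit j set = unit is in world j)."""
    digest = hashlib.blake2b(str(key).encode(), digest_size=8, key=query_salt).digest()
    return int.from_bytes(digest, "little")


def per_world_sums(rows, query_salt: bytes):
    """rows: iterable of (privacy_unit_key, value). Returns the SUM in each world."""
    sums = [0.0] * WORLDS
    for key, value in rows:
        mask = world_mask(key, query_salt)
        for j in range(WORLDS):
            if (mask >> j) & 1:
                sums[j] += value
    return sums


def noised_sum(rows, rng: random.Random) -> float:
    """Toy release: one secret world's SUM plus noise scaled by the spread across worlds."""
    salt = rng.getrandbits(128).to_bytes(16, "little")  # fresh hash per query
    sums = per_world_sums(rows, salt)
    secret = rng.randrange(WORLDS)                      # fresh secret world per query
    sigma = statistics.pstdev(sums)                     # illustrative noise scale only
    return sums[secret] + rng.gauss(0.0, sigma)
```

    Because the hash is taken over the privacy unit's key, every row belonging to the same unit lands in the same set of worlds, so a unit is either fully present in a world or fully absent from it.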

    Those 64 possible worlds map naturally onto 64-bit SIMD lanes. We bitslice the entire computation, evaluating all the worlds in a single pass over the data using CPU vector instructions. The average slowdown is only ~2x, even at large scale factors.
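
    For readers unfamiliar with bitslicing, here is a small Python sketch of the general technique applied to COUNT. It is an illustration under my own assumptions, not the extension's kernel: the counters for all 64 worlds are stored "vertically", one 64-bit word per counter bit, and each row's world mask is added to all 64 counters at once with a ripple-carry of XOR/AND operations. The function names and the 32-plane counter width are hypothetical.

```python
MASK64 = (1 << 64) - 1


def bitsliced_count(masks, planes_n: int = 32):
    """Count, per world, how many masks have that world's bit set.

    planes[k] holds bit k of every world's counter; world j's counter is
    spread over bit j of each plane.
    """
    planes = [0] * planes_n
    for mask in masks:
        carry = mask & MASK64
        for k in range(planes_n):
            if carry == 0:
                break  # no world has a carry left to propagate
            # Half adder applied to all 64 worlds at once.
            planes[k], carry = planes[k] ^ carry, planes[k] & carry
    return planes


def unslice(planes, worlds: int = 64):
    """Reassemble ordinary integer counters from the bit planes."""
    return [
        sum(((planes[k] >> j) & 1) << k for k in range(len(planes)))
        for j in range(worlds)
    ]
```

    Each row costs a handful of word-wide logic operations regardless of how many of the 64 worlds it belongs to, which is the property that makes a single pass over the data possible.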

    Example:

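      -- Install and load the extension (name "pac" assumed from the extension page URL)
      INSTALL pac FROM community;
      LOAD pac;

      -- Generate TPC-H benchmark data
      INSTALL tpch;
      LOAD tpch;
      CALL dbgen(sf=1);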
    
      -- Mark customer as the privacy unit
      ALTER TABLE customer ADD PAC_KEY (c_custkey);
      ALTER TABLE customer SET PU;
      
      -- Protect sensitive customer columns
      ALTER PU TABLE customer ADD PROTECTED (c_custkey);
      ALTER PU TABLE customer ADD PROTECTED (c_name);
      ALTER PU TABLE customer ADD PROTECTED (c_address);
      ALTER PU TABLE customer ADD PROTECTED (c_acctbal);
      
      -- Define join chain: lineitem -> orders -> customer
      ALTER TABLE orders ADD PAC_LINK (o_custkey) REFERENCES customer(c_custkey);
      ALTER TABLE lineitem ADD PAC_LINK (l_orderkey) REFERENCES orders(o_orderkey);
      
      -- Aggregates on linked tables are automatically noised
      SELECT l_returnflag, l_linestatus, SUM(l_extendedprice)
      FROM lineitem GROUP BY ALL;

    Works with joins, subqueries (correlated & uncorrelated), CTEs, GROUP BY, HAVING, ORDER BY, LIMIT. Also runs in your browser via WASM (shell.duckdb.org).

    Paper: https://arxiv.org/abs/2603.15023

    Extension page: https://duckdb.org/community_extensions/extensions/pac

    We're looking for feedback, especially on edge cases, usability and query coverage.