SHOW HN: PAC – Automatic privatization of SQL queries (github.com)

1 point by ila_b 8 hours ago | 1 comment

ila_b 8 hours ago

We at CWI built a DuckDB community extension that transparently adds privacy protection to aggregate queries. It requires no per-query privacy budget, no epsilon tuning, and no domain knowledge, which makes it an automatic alternative to Differential Privacy.

PAC is based on MIT's PAC Privacy framework: it hashes each privacy unit's key into a 64-bit value to create 64 sub-samples ("possible worlds"). Your aggregate runs on all 64 worlds independently; the result comes from one secret world, noised using the variance across all of them. Using a different hash and a different secret world for each query makes membership inference provably hard.
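To make the possible-worlds idea concrete, here is a rough Python sketch of a noised SUM. It is not the extension's code. The names `world_mask` and `pac_sum`, the use of BLAKE2b as the hash, the rule "bit j set means the unit is in world j", and the Gaussian noise with standard deviation equal to the square root of the cross-world variance are all assumptions made for illustration; the real noise calibration is described in the paper.

```python
import hashlib
import random
import statistics

WORLDS = 64

def world_mask(key, salt):
    """Hash a privacy-unit key to 64 bits (hypothetical helper).

    Assumption: bit j set means the unit belongs to possible world j.
    """
    digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")

def pac_sum(rows, salt, rng):
    """rows: iterable of (privacy_unit_key, value). Returns (noised_sum, world_sums)."""
    sums = [0.0] * WORLDS
    for key, value in rows:
        mask = world_mask(key, salt)
        for j in range(WORLDS):
            if (mask >> j) & 1:
                sums[j] += value          # the row counts only in worlds containing its unit
    secret = rng.randrange(WORLDS)        # secret world, redrawn for every query
    spread = statistics.pvariance(sums)   # variance across the 64 worlds
    # Assumption: Gaussian noise with std = sqrt(variance); real calibration differs.
    noised = sums[secret] + rng.gauss(0.0, spread ** 0.5)
    return noised, sums

rows = [(custkey, 1.0) for custkey in range(1000)]
noised, sums = pac_sum(rows, salt="query-1", rng=random.Random(0))
```

Because the salt changes per query in this sketch, the same customer lands in a different set of worlds each time, which is the "different hash per query" part of the description.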

Those 64 possible worlds map perfectly to 64-bit SIMD registers. We bitslice the entire computation, evaluating all the worlds in a single pass over the data using CPU vector instructions. The average overhead is only ~2x even on large scale factors.
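For readers unfamiliar with bitslicing, a small Python sketch of the idea applied to COUNT follows. It is only an illustration under assumptions, not how the extension is implemented: `bitsliced_count` is a hypothetical name, and Python integers stand in for 64-bit machine words. Each "plane" holds one bit of all 64 per-world counters, so a single XOR/AND pair advances every world's counter at once.

```python
MASK64 = (1 << 64) - 1

def bitsliced_count(masks, planes=32):
    """For each of 64 worlds, count how many masks have that world's bit set.

    counters[k] stores bit k of all 64 counters (a ripple-carry adder laid out
    on bit-planes). Counts wrap modulo 2**planes if there are too few planes.
    """
    counters = [0] * planes
    for mask in masks:
        carry = mask & MASK64             # worlds whose counter must increase by 1
        for k in range(planes):
            if carry == 0:
                break
            # Half adder on 64 lanes at once: sum bit and carry-out.
            counters[k], carry = counters[k] ^ carry, counters[k] & carry
    # Un-slice: rebuild each world's integer count from the bit-planes.
    return [
        sum(((counters[k] >> j) & 1) << k for k in range(planes))
        for j in range(64)
    ]

counts = bitsliced_count([0b101, 0b001, 0b111, 1 << 63])
```

Here world 0 is counted three times, world 2 twice, and worlds 1 and 63 once each, all from one pass over the four input masks.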

Example:

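  -- Install and load the extension (standard community-extension syntax)
  INSTALL pac FROM community;
  LOAD pac;

  -- Generate TPC-H benchmark data
  INSTALL tpch;
  LOAD tpch;
  CALL dbgen(sf=1);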

  -- Mark customer as the privacy unit
  ALTER TABLE customer ADD PAC_KEY (c_custkey);
  ALTER TABLE customer SET PU;
  
  -- Protect sensitive customer columns
  ALTER PU TABLE customer ADD PROTECTED (c_custkey);
  ALTER PU TABLE customer ADD PROTECTED (c_name);
  ALTER PU TABLE customer ADD PROTECTED (c_address);
  ALTER PU TABLE customer ADD PROTECTED (c_acctbal);
  
  -- Define join chain: lineitem -> orders -> customer
  ALTER TABLE orders ADD PAC_LINK (o_custkey) REFERENCES customer(c_custkey);
  ALTER TABLE lineitem ADD PAC_LINK (l_orderkey) REFERENCES orders(o_orderkey);
  
  -- Aggregates on linked tables are automatically noised
  SELECT l_returnflag, l_linestatus, SUM(l_extendedprice)
  FROM lineitem GROUP BY ALL;

Works with joins, subqueries (correlated & uncorrelated), CTEs, GROUP BY, HAVING, ORDER BY, LIMIT. Also runs in your browser via WASM (shell.duckdb.org).

Paper: https://arxiv.org/abs/2603.15023

Extension page: https://duckdb.org/community_extensions/extensions/pac

We're looking for feedback, especially on edge cases, usability and query coverage.