Add DP smooth sensitivity sample-median support #18

Open: ila wants to merge 2 commits into main from dp-smooth-sensitivity


@ila ila commented May 29, 2026

Summary

This PR adds the DP sample-median smooth-sensitivity path on top of the existing PAC rewrite machinery. The main goal is to stop maintaining a separate DP-only PU hash path and instead reuse PAC's FK-join insertion and privacy-unit hash propagation for sample-median DP queries.

It also extends sample-median DP from COUNT/SUM to AVG by reusing the existing DP AVG rewrite pattern: AVG is decomposed into SUM and COUNT, each component is noised separately, and the final projection computes the ratio. Hidden AVG components stay DOUBLE until the ratio projection so noisy denominators and sums are not truncated before division.
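To illustrate why the hidden AVG components stay DOUBLE until the ratio projection, here is a minimal Python sketch. It is not the extension's code: the function names and the numeric values are hypothetical, and the noised SUM and COUNT are simply passed in as plain floats.

```python
def avg_ratio(noisy_sum: float, noisy_count: float) -> float:
    """Ratio projection over components that were kept as doubles."""
    return noisy_sum / noisy_count


def avg_ratio_truncated(noisy_sum: float, noisy_count: float) -> float:
    """Counter-example: both components are cast to integers before dividing."""
    return int(noisy_sum) / int(noisy_count)


# Illustrative noised components (made-up values, not real query output).
noisy_sum, noisy_count = 10.7, 3.4

print(avg_ratio(noisy_sum, noisy_count))            # uses 10.7 / 3.4
print(avg_ratio_truncated(noisy_sum, noisy_count))  # uses 10 / 3

# A small noisy denominator such as 0.6 truncates to 0, so
# avg_ratio_truncated(2.0, 0.6) would raise ZeroDivisionError.
```

Truncating early both shifts the ratio and can turn a small positive noisy count into a zero denominator, which is the failure mode the DOUBLE-typed components avoid.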

Code Flow

For dp_strategy = 'sample_median', compilation now follows this path:

  1. Validate DP settings (dp_epsilon, dp_delta) and the supported aggregate/query shape.
  2. Validate that the query has the same linear FK/self-join shape expected by DP, but do not build elastic-sensitivity FK-column metadata for sample-median.
  3. Rewrite AVG into SUM plus COUNT components.
  4. For SUM/AVG, require dp_sum_bound and pac_clip_support, then clip SUM inputs before aggregation.
  5. Call the shared PAC setup to insert any missing FK joins and propagate the PU hash to the target aggregate.
  6. Rewrite the aggregate to sample-median counters using that propagated PU hash.
  7. Apply privacy_min_group_count before noising when configured.
  8. Project dp_smooth_median_noise over each release, splitting both epsilon and delta across visible releases and AVG components.
  9. For AVG, add a final ratio projection above the noise projection.
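The budget split in step 8 can be sketched as follows. This is a hedged illustration only: the PR states that epsilon and delta are split across visible releases and AVG components, but an even split is an assumption made here, and `Release`, `components`, and `split_budget` are hypothetical names rather than anything in the extension.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Release:
    name: str
    kind: str  # "COUNT", "SUM", or "AVG"


def components(release: Release) -> int:
    """AVG is noised as two components (SUM and COUNT); others as one."""
    return 2 if release.kind == "AVG" else 1


def split_budget(
    epsilon: float, delta: float, releases: List[Release]
) -> Dict[str, List[Tuple[float, float]]]:
    """Assumed even split of (epsilon, delta) over all noised components."""
    total = sum(components(r) for r in releases)
    if total == 0:
        raise ValueError("at least one release is required")
    share = (epsilon / total, delta / total)
    return {r.name: [share] * components(r) for r in releases}


# One visible COUNT plus one visible AVG gives three noised components.
budget = split_budget(1.0, 1e-6, [Release("n", "COUNT"), Release("avg_x", "AVG")])
for name, shares in budget.items():
    print(name, shares)
```

Under basic sequential composition, the per-component shares sum back to the configured dp_epsilon and dp_delta, so adding more visible aggregates or AVG columns makes each individual release noisier rather than increasing the total budget.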

Refactor

The second commit removes duplicated projection insertion/remapping code in the DP compiler. Elastic Laplace noise, sample-median smooth noise, and AVG ratio rewrites now share one helper for inserting a projection and remapping upstream column bindings. The helper keeps the aggregate-column source offset explicit, which matters for grouped AVG ratio projections where group columns and aggregate columns come from the same intermediate projection.
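The role of the explicit aggregate-column source offset can be shown with a small Python model. This is a sketch under stated assumptions, not the PR's helper: column bindings are modelled as (table index, column index) pairs, and `source_bindings` and `insert_projection` are hypothetical names.

```python
from typing import Dict, List, Tuple

Binding = Tuple[int, int]  # (table_index, column_index), an assumed model


def source_bindings(
    group_table: int,
    n_groups: int,
    agg_table: int,
    n_aggs: int,
    agg_source_offset: int,
) -> List[Binding]:
    """Columns the new projection reads: groups first, then aggregates."""
    groups = [(group_table, i) for i in range(n_groups)]
    aggs = [(agg_table, agg_source_offset + j) for j in range(n_aggs)]
    return groups + aggs


def insert_projection(sources: List[Binding], new_table: int) -> Dict[Binding, Binding]:
    """Map each old binding to its position in the inserted projection."""
    return {old: (new_table, pos) for pos, old in enumerate(sources)}


# Directly above an aggregate: groups and aggregates have separate indexes,
# so aggregate columns start at offset 0.
over_aggregate = source_bindings(1, 1, 2, 2, agg_source_offset=0)

# Above an intermediate projection (table 3): groups and aggregates share one
# index, so aggregate columns start after the group columns.
over_projection = source_bindings(3, 1, 3, 2, agg_source_offset=1)

print(insert_projection(over_aggregate, new_table=4))
print(insert_projection(over_projection, new_table=4))
```

In this model, using offset 0 for the shared-projection case would make the first aggregate read the same column as the first group, which is the kind of misbinding an explicit offset guards against.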

The refactor also separates aggregate validation from FK-chain extraction. Elastic still extracts the full FK chain needed for sensitivity computation; sample-median only validates the query shape and then delegates PU hash discovery to the PAC path.
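A shape-only check of the kind described here might look like the sketch below. It is an assumption-heavy illustration: `is_linear_fk_chain` is a hypothetical name, FK relationships are modelled as (child, parent) table pairs, and self-references are simply rejected because the PR does not spell out how self-joins are handled. Note that the function returns only a yes/no answer and does not collect FK columns, mirroring the split between validation and FK-chain extraction.

```python
from typing import Dict, List, Optional, Set, Tuple


def is_linear_fk_chain(tables: List[str], fk_edges: List[Tuple[str, str]]) -> bool:
    """True if the (child, parent) FK edges link all tables in one simple path."""
    known = set(tables)
    out_deg: Dict[str, int] = {t: 0 for t in tables}
    in_deg: Dict[str, int] = {t: 0 for t in tables}
    parent_of: Dict[str, str] = {}
    for child, parent in fk_edges:
        if child not in known or parent not in known or child == parent:
            return False
        out_deg[child] += 1
        in_deg[parent] += 1
        parent_of[child] = parent
    # Star and diamond shapes give some table two outgoing or two incoming edges.
    if any(d > 1 for d in out_deg.values()) or any(d > 1 for d in in_deg.values()):
        return False
    # A path over n tables has exactly n - 1 edges (this also rules out cycles).
    if len(fk_edges) != len(tables) - 1:
        return False
    starts = [t for t in tables if in_deg[t] == 0]
    if len(starts) != 1:
        return False
    # Walk child -> parent from the start and check that every table is reached.
    seen: Set[str] = set()
    cur: Optional[str] = starts[0]
    while cur is not None and cur not in seen:
        seen.add(cur)
        cur = parent_of.get(cur)
    return len(seen) == len(tables)


print(is_linear_fk_chain(["items", "orders", "users"],
                         [("items", "orders"), ("orders", "users")]))
print(is_linear_fk_chain(["a", "b", "c"], [("a", "c"), ("b", "c")]))
```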

Local Validation

I built the release extension and ran the focused SQL suites for dp_sample_median, dp_elastic, pac_plan, and pac_clip_sum. I also ran a grouped, noisy sample-median AVG query in the DuckDB CLI and verified that both groups returned finite DOUBLE results.

ila added 2 commits May 29, 2026 15:42
Reuse PAC privacy-unit hash propagation for dp_strategy='sample_median' so DP smooth-sensitivity queries follow the same FK-join insertion and hash propagation path as PAC. This removes the old one-hop DP-only PU hash helper and lets sample-median cover the same linear FK chains exercised by PAC.

Add AVG support by rewriting AVG into SUM plus COUNT components, keeping hidden sample-median AVG components as DOUBLE until the final ratio projection. Split epsilon and delta across visible aggregate releases and AVG components before calling dp_smooth_median_noise.

Extend dp_sample_median SQL coverage for COUNT, SUM, AVG, grouped/HAVING/FILTER AVG, noisy AVG finiteness, and multi-hop explicit and implicit FK chains, as well as the unsupported cases: DISTINCT, MIN, missing bounds, star/diamond joins, and self-joins.

Share the projection insertion and binding-remap path used by elastic Laplace noise, sample-median smooth noise, and AVG ratio rewrites. The helper keeps the aggregate-column source offset explicit so grouped AVG remapping continues to bind to the right projection columns.

Split DP aggregate validation from FK-chain extraction. Elastic sensitivity still materializes the full FK chain with FK columns, while sample-median only validates the linear FK/self-join shape before reusing PAC privacy-unit hash propagation.