Add DP smooth sensitivity sample-median support #18
Open
ila wants to merge 2 commits into
Reuse PAC privacy-unit (PU) hash propagation for dp_strategy='sample_median' so that DP smooth-sensitivity queries follow the same FK-join insertion and hash propagation path as PAC. This removes the old one-hop, DP-only PU hash helper and lets sample-median cover the same linear FK chains that PAC exercises. Add AVG support by rewriting AVG into SUM and COUNT components, keeping the hidden sample-median AVG components as DOUBLE until the final ratio projection. Split epsilon and delta across visible aggregate releases and AVG components before calling dp_smooth_median_noise. Extend dp_sample_median SQL coverage for COUNT, SUM, AVG, grouped/HAVING/FILTER AVG, noisy AVG finiteness, and multi-hop explicit and implicit FK chains, and for rejecting unsupported queries: DISTINCT, MIN, missing bounds, star/diamond joins, and self-joins.
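The PR does not describe how dp_smooth_median_noise calibrates its noise. As background, here is a minimal sketch of the textbook smooth-sensitivity median from Nissim, Raskhodnikova, and Smith (2007), assuming inputs clamped to a known [lower, upper] range and Laplace noise calibrated for (epsilon, delta). The function names are hypothetical, and the extension's actual calibration (and how it forms the values whose median it releases) may differ.

```python
import math
import random


def smooth_sensitivity_median(values, lower, upper, beta):
    """beta-smooth sensitivity of the median (Nissim, Raskhodnikova, Smith 2007).

    Inputs are clamped to [lower, upper]; positions past either end of the
    sorted data are padded with the bounds, as in the paper's construction.
    """
    xs = sorted(min(max(v, lower), upper) for v in values)
    n = len(xs)
    m = n // 2  # 0-based median position (the paper assumes odd n)

    def x(i):
        if i < 0:
            return lower
        if i >= n:
            return upper
        return xs[i]

    best = 0.0
    for k in range(n + 1):
        # Max local sensitivity over datasets at most k records away,
        # discounted by e^(-k * beta).
        local = max(x(m + t) - x(m + t - k - 1) for t in range(k + 2))
        best = max(best, math.exp(-k * beta) * local)
    return best


def release_median(values, lower, upper, epsilon, delta, rng=random):
    """(epsilon, delta)-DP median: Laplace noise scaled by 2 * S / epsilon."""
    beta = epsilon / (2.0 * math.log(2.0 / delta))
    s = smooth_sensitivity_median(values, lower, upper, beta)
    xs = sorted(min(max(v, lower), upper) for v in values)
    median = xs[len(xs) // 2]
    # A Laplace(0, 1) draw is the difference of two independent Exp(1) draws.
    z = rng.expovariate(1.0) - rng.expovariate(1.0)
    return median + (2.0 * s / epsilon) * z
```

With a tightly packed sample the smooth sensitivity stays close to the local gap around the median instead of the full [lower, upper] range, which is why smooth sensitivity can add far less noise than global sensitivity for medians.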
Share the projection insertion and binding-remap path used by elastic Laplace noise, sample-median smooth noise, and AVG ratio rewrites. The helper keeps the aggregate-column source offset explicit so that grouped AVG remapping continues to bind to the right projection columns. Split DP aggregate validation from FK-chain extraction. Elastic sensitivity still materializes the full FK chain, including FK columns, while sample-median only validates that the query is a linear FK chain with no self-joins before reusing PAC privacy-unit hash propagation.
Summary
This PR adds the DP sample-median smooth-sensitivity path on top of the existing PAC rewrite machinery. The main goal is to stop maintaining a separate DP-only PU hash path and instead reuse PAC's FK-join insertion and privacy-unit hash propagation for sample-median DP queries.
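To illustrate what privacy-unit hash propagation along a linear FK chain means, here is a toy sketch over Python dicts. It assumes a TPC-H-style lineitem -> orders -> customer chain with customer as the privacy unit; pu_hash, propagate_pu_hash, and the blake2b-based hash are hypothetical stand-ins, not PAC's implementation.

```python
import hashlib


def pu_hash(key):
    """Hypothetical 64-bit privacy-unit hash; PAC's actual hash function may differ."""
    digest = hashlib.blake2b(str(key).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def propagate_pu_hash(row, fk_columns, parent_tables):
    """Derive a row's PU hash by following a linear FK chain.

    fk_columns[i] is the FK column followed at hop i; the last entry holds the
    privacy-unit key itself. parent_tables[i] maps primary key -> parent row and
    stands in for the FK join that would be inserted for hop i.
    """
    current = row
    for fk_col, parent in zip(fk_columns[:-1], parent_tables):
        current = parent[current[fk_col]]  # stand-in for an inserted FK join
    return pu_hash(current[fk_columns[-1]])


# Toy chain: lineitem -> orders -> customer (the privacy unit).
orders = {10: {"o_orderkey": 10, "o_custkey": 7}, 11: {"o_orderkey": 11, "o_custkey": 7}}
line_a = {"l_orderkey": 10, "l_extendedprice": 5.0}
line_b = {"l_orderkey": 11, "l_extendedprice": 2.5}
h_a = propagate_pu_hash(line_a, ["l_orderkey", "o_custkey"], [orders])
h_b = propagate_pu_hash(line_b, ["l_orderkey", "o_custkey"], [orders])
```

Both line items reach customer 7 through different orders and so carry the same hash, which is the property a per-privacy-unit sampler needs regardless of how many hops separate the queried table from the privacy unit.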
It also extends sample-median DP from COUNT/SUM to AVG by reusing the existing DP AVG rewrite pattern: AVG is decomposed into SUM and COUNT, each component is noised separately, and the final projection computes the ratio. Hidden AVG components stay DOUBLE until the ratio projection so noisy denominators and sums are not truncated before division.
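A small sketch of the AVG bookkeeping, assuming an even split of the budget across noised values (the PR says epsilon and delta are split across visible releases and AVG components, but not in what proportion). plan_releases and avg_from_components are hypothetical helpers, and the noise itself is omitted.

```python
def plan_releases(aggregates, epsilon, delta):
    """Split (epsilon, delta) evenly across noised values.

    Each visible non-AVG aggregate is one release; each AVG becomes two hidden
    components (SUM and COUNT). The even split is an assumption for illustration.
    """
    names = []
    for i, agg in enumerate(aggregates):
        if agg.upper() == "AVG":
            names += [f"avg{i}_sum", f"avg{i}_count"]
        else:
            names.append(f"{agg.lower()}{i}")
    share = len(names)
    return [(name, epsilon / share, delta / share) for name in names]


def avg_from_components(noisy_sum, noisy_count):
    # Both components stay floating point until this final ratio; rounding either
    # one to an integer first would discard part of the noisy value.
    return float(noisy_sum) / float(noisy_count)


plan = plan_releases(["COUNT", "AVG"], epsilon=1.0, delta=1e-6)
# plan holds count0, avg1_sum, avg1_count, each with epsilon 1/3 and delta 1e-6/3.
```

For noisy components 10.6 and 3.4, the DOUBLE ratio is about 3.12, while truncating both to integers first would give 10 / 3, about 3.33.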
Code Flow
For dp_strategy = 'sample_median', compilation now follows this path:
1. Validate dp_epsilon, dp_delta, and the supported aggregate/query shape.
2. Resolve dp_sum_bound and pac_clip_support, then clip SUM inputs before aggregation.
3. Enforce privacy_min_group_count before noising when configured.
4. Apply dp_smooth_median_noise to each release, splitting both epsilon and delta across visible releases and AVG components.
Refactor
The second commit removes duplicated projection insertion/remapping code in the DP compiler. Elastic Laplace noise, sample-median smooth noise, and AVG ratio rewrites now share one helper for inserting a projection and remapping upstream column bindings. The helper keeps the aggregate-column source offset explicit, which matters for grouped AVG ratio projections where group columns and aggregate columns come from the same intermediate projection.
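A toy model of the binding remap with an explicit aggregate-column offset, assuming DuckDB-style (table_index, column_index) bindings in which group columns and aggregate columns are bound under separate table indexes. ColumnBinding here is a local dataclass, and remap_binding and the table indexes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnBinding:
    table_index: int
    column_index: int


def remap_binding(binding, group_table, aggregate_table, projection_table, aggregate_offset):
    """Point an upstream reference at the newly inserted projection.

    Assumes the projection emits the group columns first and the aggregate
    columns starting at aggregate_offset; bindings to other tables pass through.
    """
    if binding.table_index == group_table:
        return ColumnBinding(projection_table, binding.column_index)
    if binding.table_index == aggregate_table:
        return ColumnBinding(projection_table, aggregate_offset + binding.column_index)
    return binding


# One GROUP BY column and one aggregate: groups bind to table 1, aggregates to
# table 2, and the inserted projection gets table index 5.
group_ref = remap_binding(ColumnBinding(1, 0), 1, 2, 5, aggregate_offset=1)
agg_ref = remap_binding(ColumnBinding(2, 0), 1, 2, 5, aggregate_offset=1)
```

Passing an offset of 0 would map the first aggregate onto projection column 0, the group column, which is the kind of misbinding an explicit offset guards against in grouped AVG ratio projections.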
The refactor also separates aggregate validation from FK-chain extraction. Elastic still extracts the full FK chain needed for sensitivity computation; sample-median only validates the query shape and then delegates PU hash discovery to the PAC path.
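A sketch of a shape check that accepts only linear FK chains, assuming the query's FK joins are available as (child, parent) table pairs. is_linear_fk_chain is hypothetical; it treats a star as a table with several FK parents and a diamond as a table reached by more than one path, and the extension's own validation may classify shapes differently.

```python
def is_linear_fk_chain(tables, fk_edges):
    """Return True if the FK joins form one path that visits each table once.

    tables: table names in FROM; a repeated name means a self-join.
    fk_edges: (child, parent) pairs, one per FK join in the query.
    """
    if len(set(tables)) != len(tables):
        return False  # self-join
    if len(fk_edges) != len(tables) - 1:
        return False  # a path over n tables has exactly n - 1 joins
    out_deg = {t: 0 for t in tables}
    in_deg = {t: 0 for t in tables}
    for child, parent in fk_edges:
        if child not in out_deg or parent not in in_deg:
            return False
        out_deg[child] += 1
        in_deg[parent] += 1
    # Star: one table references several parents; a table referenced twice breaks the path.
    if max(out_deg.values()) > 1 or max(in_deg.values()) > 1:
        return False
    # Walk parent links from the only unreferenced table and require full coverage.
    starts = [t for t in tables if in_deg[t] == 0]
    if len(starts) != 1:
        return False
    parent_of = dict(fk_edges)
    seen = {starts[0]}
    current = starts[0]
    while current in parent_of:
        current = parent_of[current]
        if current in seen:
            return False  # cycle
        seen.add(current)
    return len(seen) == len(tables)
```

For example, lineitem -> orders -> customer passes, while lineitem joined to both orders and part (a star) or orders joined to itself is rejected.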
Local Validation
Built the release extension and ran the focused SQL suites for dp_sample_median, dp_elastic, pac_plan, and pac_clip_sum. I also ran a grouped, noisy sample-median AVG query in a real DuckDB CLI session and verified that both groups returned finite DOUBLE results.