
Avoid null-restrict evaluation for predicates that reference non-join columns in PushDownFilter #20961

Draft
kosiew wants to merge 6 commits into apache:main from kosiew:push-down-02-20002


@kosiew (Contributor) commented Mar 16, 2026

Which issue does this PR close?

Rationale for this change

PushDownFilter can spend a disproportionate amount of planning time inferring predicates across joins. One expensive path is is_restrict_null_predicate, which falls back to compiling the predicate into a physical expression and evaluating it against a null-filled schema to decide whether it is null-restricting.

For predicates that reference columns outside the join-key set, this evaluation cannot succeed, because the synthetic null schema is built from the join columns only. Callers already treat evaluation failures as non-restricting, but only after paying the full cost of physical-expression compilation and evaluation.
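
To make that failure mode concrete, here is a minimal, std-only Rust sketch. It assumes a toy expression type (Expr with Column, Literal, and Gt) in place of DataFusion's real types, and a hypothetical is_restrict_null_slow that models the fallback: bind every join column to NULL, evaluate under SQL three-valued logic, and call the predicate null-restricting if it yields NULL or false. A column outside the join set has no binding, so evaluation returns an error.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// A value under SQL three-valued logic, where `None` means NULL.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value {
    Int(Option<i64>),
    Bool(Option<bool>),
}

/// Evaluate `expr` against a synthetic row that binds only the join columns,
/// each to NULL. Any other column cannot be resolved and yields an error.
pub fn eval_with_null_row(
    expr: &Expr,
    null_row: &HashMap<String, Option<i64>>,
) -> Result<Value, String> {
    match expr {
        Expr::Column(name) => null_row
            .get(name)
            .map(|v| Value::Int(*v))
            .ok_or_else(|| format!("column '{name}' not found in null schema")),
        Expr::Literal(v) => Ok(Value::Int(Some(*v))),
        Expr::Gt(l, r) => match (eval_with_null_row(l, null_row)?, eval_with_null_row(r, null_row)?) {
            (Value::Int(Some(a)), Value::Int(Some(b))) => Ok(Value::Bool(Some(a > b))),
            // A NULL operand makes the comparison NULL.
            (Value::Int(_), Value::Int(_)) => Ok(Value::Bool(None)),
            _ => Err("'>' expects integer operands".to_string()),
        },
    }
}

/// Hypothetical slow path: null-restricting iff the predicate evaluates to
/// NULL or false when every join column is NULL.
pub fn is_restrict_null_slow(expr: &Expr, join_cols: &[&str]) -> Result<bool, String> {
    let null_row: HashMap<String, Option<i64>> =
        join_cols.iter().map(|c| (c.to_string(), None)).collect();
    match eval_with_null_row(expr, &null_row)? {
        Value::Bool(None) | Value::Bool(Some(false)) => Ok(true),
        Value::Bool(Some(true)) => Ok(false),
        Value::Int(_) => Err("predicate is not boolean".to_string()),
    }
}
```

With join columns {a}, a > 5 evaluates to NULL and is reported as restricting, while a > b cannot resolve b and returns an error, which a caller using unwrap_or(false) treats as non-restricting.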

This change adds a cheap guard that detects predicates referencing columns outside the allowed join columns and returns false early. This preserves the existing behavior while avoiding unnecessary work in a hot optimizer path.
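
A minimal sketch of the guard's ordering, assuming the predicate's referenced columns are already available as a slice and modeling the expensive compile-and-evaluate fallback as a closure. The function is_restrict_null_with_guard and its parameters are hypothetical, not the PR's code. The point is that the membership check runs first, so the closure is never invoked for a predicate like a > b when only a is a join column.

```rust
use std::collections::HashSet;

/// Hypothetical guard. `referenced` stands in for the columns the predicate
/// references; `evaluate` stands in for the expensive null-evaluation fallback.
pub fn is_restrict_null_with_guard<F>(
    referenced: &[&str],
    join_cols: &HashSet<&str>,
    evaluate: F,
) -> Result<bool, String>
where
    F: FnOnce() -> Result<bool, String>,
{
    // Cheap guard: any non-join column means null-evaluation cannot succeed,
    // so report "not null-restricting" without running it.
    if !referenced.iter().all(|c| join_cols.contains(c)) {
        return Ok(false);
    }
    // Only predicates over join columns pay for the expensive path.
    evaluate()
}
```

Returning Ok(false) rather than an error keeps the result identical to what callers already derived from the failed evaluation.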

What changes are included in this PR?

This PR makes two focused changes:

  1. In is_restrict_null_predicate, collect the join columns into a HashSet and add a fast-path check that tests whether the predicate references only those columns.
  2. If the predicate references any non-join column, return Ok(false) immediately instead of attempting null-evaluation.
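
The two steps above can be sketched as follows. This is a hypothetical, std-only version loosely named after the `predicate_uses_only_columns` helper mentioned in the commit history; the toy Expr type, the fast_path function, and both signatures are assumptions, not DataFusion's API. It walks the expression tree, collects referenced column names, and checks them against a HashSet of join columns.

```rust
use std::collections::HashSet;

/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
pub enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Push every column name referenced by `expr` into `out`.
fn collect_columns<'a>(expr: &'a Expr, out: &mut Vec<&'a str>) {
    match expr {
        Expr::Column(name) => out.push(name.as_str()),
        Expr::Literal(_) => {}
        Expr::Gt(l, r) | Expr::And(l, r) => {
            collect_columns(l, out);
            collect_columns(r, out);
        }
    }
}

/// Hypothetical fast-path check: true only if every column the predicate
/// references is one of the join columns.
pub fn predicate_uses_only_columns(predicate: &Expr, join_cols: &HashSet<&str>) -> bool {
    let mut referenced = Vec::new();
    collect_columns(predicate, &mut referenced);
    referenced.iter().all(|c| join_cols.contains(c))
}

/// Step 1: collect the join columns into a `HashSet`.
/// Step 2: return `Ok(false)` immediately if the predicate references any
/// non-join column; `None` means "fall through to null-evaluation".
pub fn fast_path(predicate: &Expr, join_cols: &[&str]) -> Option<Result<bool, String>> {
    let join_set: HashSet<&str> = join_cols.iter().copied().collect();
    if !predicate_uses_only_columns(predicate, &join_set) {
        return Some(Ok(false));
    }
    None
}
```

Building the set once also lets the fallback path reuse it instead of re-collecting the join columns.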

Additionally:

  • The same join-column set is reused for the fallback evaluate_expr_with_null_column path.
  • InferredPredicates::insert_inferred_predicate is simplified to use .unwrap_or(false) when consuming is_restrict_null_predicate, which matches the prior effective behavior of treating errors as non-restricting.
  • A regression test is added for a predicate like a > b, where b is outside the join-key set, to verify the fast path returns false.
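
The call-site contract can be sketched like this: only Ok(true) keeps an inferred predicate, and unwrap_or(false) folds both Ok(false) and Err(_) into "non-restricting", which is equivalent to matches!(result, Ok(true)). InferredPredicatesSketch and its method signature are hypothetical stand-ins for the InferredPredicates type named above, with string predicates in place of real expressions.

```rust
/// Hypothetical stand-in for `InferredPredicates`, using strings instead of
/// real expressions.
#[derive(Debug, Default)]
pub struct InferredPredicatesSketch {
    pub predicates: Vec<String>,
}

impl InferredPredicatesSketch {
    /// Keep `predicate` only if the null-restrict check returned `Ok(true)`.
    /// `unwrap_or(false)` treats both `Ok(false)` and `Err(_)` as
    /// non-restricting, so the predicate is skipped in those cases.
    pub fn insert_inferred_predicate(
        &mut self,
        predicate: &str,
        is_restrict_null: Result<bool, String>,
    ) {
        if is_restrict_null.unwrap_or(false) {
            self.predicates.push(predicate.to_string());
        }
    }
}
```

Under this contract, switching to unwrap_or(false) is a simplification only: the set of kept predicates is unchanged.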

Are these changes tested?

Yes.

A test case was added to cover the scenario where a predicate references a column outside the join-key set:

  • For a > b, with join keys that include only a, the test verifies that is_restrict_null_predicate returns false.

This exercises the new early-return path and protects against regressions in predicate analysis behavior.

Are there any user-facing changes?

No.

This change is an internal optimizer performance improvement and does not change public APIs or intended query results.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 6 commits March 16, 2026 21:38
Introduce a test case to assert non-restricting behavior
when evaluating the predicate a > b, focusing on join
keys that only include a. This directly tests the new
early-return branch in the is_restrict_null_predicate
function in utils.rs, enhancing overall code coverage.

Extract the column-membership check into a new helper function
called `predicate_uses_only_columns` in utils.rs. Update the
current implementation at utils.rs:91 to use this new helper,
improving code readability and maintainability.

Add call-site contract comment in push_down_filter.rs to
specify that only Ok(true) is treated as null-restricting.
State that both Ok(false) and Err(_) are considered
non-restricting and will be skipped during processing.

Inline iterator predicate in utils.rs and streamline the
null-restrict handling in push_down_filter.rs. This
reduces indirections and lines of code while maintaining
the same logic and behavior. No public interface or
behavior changes intended.

@kosiew (Contributor, Author) commented Mar 16, 2026

run benchmark sql_planner_extended

@adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Linux bench-c4067930471-314-dhsls 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing push-down-02-20002 (3d3945c) to ab28234 (merge-base) diff
BENCH_NAME=sql_planner_extended
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner_extended
BENCH_FILTER=
Results will be posted here when complete

@github-actions github-actions bot added the optimizer Optimizer rules label Mar 16, 2026
