Skip to content

PR #21743 disables Parquet pruning for = ANY([literals]) #22073

@cetra3

Description

@cetra3

Describe the bug

PR #21743 (commit 4bff17ebd) unified ANY/ALL planning under a CASE expression to align NULL semantics with PostgreSQL.

As a side effect, the new LogicalPlan shape is no longer recognized by PruningPredicate, which disables Parquet row-group statistics pruning, bloom-filter pruning, and page-index pruning for <expr> <op> ANY(<array literal>) queries.

To Reproduce

This will fail:

#[tokio::test]
async fn prune_int32_eq_any_literal_array() {
    RowGroupPruningTest::new()
        .with_scenario(Scenario::Int)
        .with_query("SELECT * FROM t where i32 = ANY([100])")
        .with_expected_errors(Some(0))
        .with_matched_by_stats(Some(0))
        .with_pruned_by_stats(Some(0))
        .with_pruned_files(Some(1))
        .with_matched_by_bloom_filter(Some(0))
        .with_pruned_by_bloom_filter(Some(0))
        .with_expected_rows(0)
        .test_row_group_prune()
        .await;
}

Running with the datafusion CLI, you can see it as well:

CREATE EXTERNAL TABLE alltypes_plain
STORED AS PARQUET
LOCATION 'parquet-testing/data/alltypes_plain.parquet';

EXPLAIN VERBOSE SELECT id FROM alltypes_plain WHERE id = ANY([1, 2, 3]);

Gives the following pruning predicate:

pruning_predicate=id_null_count@0 > 0 OR id_null_count@0 != row_count@1

Whereas an IN statement works fine

CREATE EXTERNAL TABLE alltypes_plain
STORED AS PARQUET
LOCATION 'parquet-testing/data/alltypes_plain.parquet';

EXPLAIN VERBOSE SELECT id FROM alltypes_plain WHERE id IN (1, 2, 3);
pruning_predicate=id_null_count@2 != row_count@3 AND id_min@0 <= 1 AND 1 <= id_max@1 OR id_null_count@2 != row_count@3 AND id_min@0 <= 2 AND 2 <= id_max@1 OR id_null_count@2 != row_count@3 AND id_min@0 <= 3 AND 3 <= id_max@1,

Expected behavior

The test will pass

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions