
[AURON #1840] Preserve collect_set first-occurrence order #2285

Draft
peter941221 wants to merge 1 commit into apache:master from peter941221:fix/auron-1840-collect-set-order

@peter941221

What changed

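AccSet::merge no longer swaps the two accumulators based on their set sizes. Whenever that swap happened, the merged set listed the rhs values first, which changed the encounter order of collect_set: no-shuffle Spark checks could see values in rhs-first order instead of first-occurrence order. The merge now always keeps the lhs values first and appends any new rhs values after them.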
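To illustrate the ordering difference, here is a minimal sketch that assumes the accumulator behaves like an insertion-ordered set of i64 values. OrderedSet, merge_with_size_swap and merge_in_order are hypothetical names used only for this example; they are not the actual AccSet API in datafusion-ext-plans.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for AccSet: de-duplicated values kept in
/// first-occurrence order.
#[derive(Debug, Default, Clone)]
struct OrderedSet {
    order: Vec<i64>,    // values in first-occurrence order
    seen: HashSet<i64>, // membership check used for de-duplication
}

impl OrderedSet {
    fn from_values(values: &[i64]) -> Self {
        let mut set = OrderedSet::default();
        for &v in values {
            set.insert(v);
        }
        set
    }

    /// Appends `v` only if it has not been seen before.
    fn insert(&mut self, v: i64) {
        if self.seen.insert(v) {
            self.order.push(v);
        }
    }

    /// Sketch of the old behavior: if `other` is larger, swap so the larger
    /// set becomes the merge target. The rhs values then end up first.
    fn merge_with_size_swap(&mut self, mut other: OrderedSet) {
        if other.order.len() > self.order.len() {
            std::mem::swap(self, &mut other);
        }
        for v in other.order {
            self.insert(v);
        }
    }

    /// Sketch of the new behavior: never swap; lhs values stay first and
    /// unseen rhs values are appended in their own order.
    fn merge_in_order(&mut self, other: OrderedSet) {
        for v in other.order {
            self.insert(v);
        }
    }
}
```

With lhs [3, 1] and a larger rhs [2, 1, 5], merge_in_order yields [3, 1, 2, 5], while merge_with_size_swap yields [2, 1, 5, 3]. A size-based swap is usually done so the smaller set is the one iterated; dropping it plausibly trades some merge cost for a deterministic order, though the PR does not quantify that cost.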

Why

In the no-shuffle path exercised by the affected aggregate suite, Spark's collect_set returns values in first-occurrence order, so the native merge has to keep that order for the results to match.
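The reason an order-preserving merge is enough can be sketched as follows, assuming partial accumulators are built per partition and then merged left to right in partition order (an assumption; the PR does not describe how Auron schedules merges). The function names are hypothetical and only illustrate that folding partials with a lhs-first merge gives the same result as a single first-occurrence pass over the concatenated input.

```rust
use std::collections::HashSet;

/// Single pass: keep each value the first time it is encountered.
fn collect_set_single_pass(values: &[i64]) -> Vec<i64> {
    let mut seen = HashSet::new();
    values.iter().copied().filter(|v| seen.insert(*v)).collect()
}

/// Build one partial set per partition, then fold the partials left to
/// right with a lhs-first merge (no size-based swapping).
fn collect_set_partitioned(partitions: &[Vec<i64>]) -> Vec<i64> {
    let partials: Vec<Vec<i64>> = partitions
        .iter()
        .map(|p| collect_set_single_pass(p))
        .collect();

    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for partial in partials {
        for v in partial {
            // Only unseen values are appended, so earlier partitions win.
            if seen.insert(v) {
                merged.push(v);
            }
        }
    }
    merged
}
```

For partitions [4, 2, 4], [7, 2] and [1, 4, 9], both functions produce [4, 2, 7, 1, 9]. If any merge step swapped its operands, a later partition's values could move ahead of earlier ones and the two results would diverge.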

Testing

  • git diff --check
  • cargo +nightly test --manifest-path native-engine/datafusion-ext-plans/Cargo.toml test_acc_set_merge -- --nocapture (could not be run locally: the rdkafka-sys build fails on Windows with "%1 is not a valid Win32 application. (os error 193)")
