feat: add fair_unified_task_shared memory pool to fix 2x memory allocation #3924
Open
andygrove wants to merge 5 commits into apache:main
Conversation
…ation

When Comet executes a shuffle, it creates two separate native plans (the child plan and the shuffle writer plan) that run concurrently in a pipelined fashion. Previously, each plan got its own memory pool at the full per-task limit, effectively allowing 2x the intended memory to be consumed.

The new `fair_unified_task_shared` pool type shares a single CometFairMemoryPool across all native plans within the same Spark task. This ensures the total memory stays within the per-task limit while dynamically distributing memory among operators based on how many register as memory consumers (e.g. if the child plan is a simple scan+filter, the shuffle writer gets 100% of the pool). This is now the default for off-heap mode.

Closes apache#3921

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
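The fair distribution described above can be illustrated with a minimal sketch. This is not the actual CometFairMemoryPool implementation; `FairPool` and `per_consumer_limit` are hypothetical names, assuming a fair pool caps each registered consumer at an even share of the budget, so a plan that registers no consumers leaves the whole pool to the others:

```rust
// Hypothetical sketch of fair per-consumer limits (not the real
// CometFairMemoryPool API): the pool budget is split evenly among
// however many operators registered as memory consumers.
struct FairPool {
    pool_size: usize,
    num_consumers: usize,
}

impl FairPool {
    fn per_consumer_limit(&self) -> usize {
        if self.num_consumers == 0 {
            self.pool_size
        } else {
            self.pool_size / self.num_consumers
        }
    }
}

fn main() {
    // Only the shuffle writer registered (child plan is a simple
    // scan+filter): the writer may use 100% of the pool.
    let only_writer = FairPool { pool_size: 1 << 30, num_consumers: 1 };
    assert_eq!(only_writer.per_consumer_limit(), 1 << 30);

    // Two registered consumers split the budget evenly.
    let two = FairPool { pool_size: 1 << 30, num_consumers: 2 };
    assert_eq!(two.per_consumer_limit(), 1 << 29);
    println!("per-consumer limit with 2 consumers: {}", two.per_consumer_limit());
}
```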
When using fair_unified_task_shared, multiple execution contexts on the same thread share a single Arc<dyn MemoryPool>. The tracing code was summing pool.reserved() for each registered context, double-counting the shared pool and reporting 2x the actual memory reservation. Deduplicate pools by Arc data pointer before summing so each underlying pool is only counted once.
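The deduplication described in this commit can be sketched as follows. The trait and types here are stand-ins (the real code uses DataFusion's `MemoryPool` trait); the key idea is that contexts sharing one pool hold clones of the same `Arc`, so summing per-context `reserved()` double-counts unless we deduplicate by the `Arc`'s data pointer first:

```rust
use std::collections::HashSet;
use std::sync::Arc;

// Stand-in for DataFusion's MemoryPool trait; only `reserved` matters here.
trait MemoryPool {
    fn reserved(&self) -> usize;
}

struct FixedPool(usize);
impl MemoryPool for FixedPool {
    fn reserved(&self) -> usize {
        self.0
    }
}

/// Sum `reserved()` across registered pools, counting each underlying
/// pool only once. Clones of the same `Arc` share a data pointer, so we
/// cast the fat `*const dyn MemoryPool` to a thin pointer and use it as
/// a deduplication key.
fn total_reserved(pools: &[Arc<dyn MemoryPool>]) -> usize {
    let mut seen: HashSet<*const ()> = HashSet::new();
    pools
        .iter()
        .filter(|p| seen.insert(Arc::as_ptr(p) as *const ()))
        .map(|p| p.reserved())
        .sum()
}

fn main() {
    let shared: Arc<dyn MemoryPool> = Arc::new(FixedPool(100));
    // Two execution contexts registered with the same shared pool.
    let pools = vec![Arc::clone(&shared), Arc::clone(&shared)];
    // Naive summing would report 200; the deduplicated total is 100.
    assert_eq!(total_reserved(&pools), 100);

    // Distinct pools are still both counted.
    let other: Arc<dyn MemoryPool> = Arc::new(FixedPool(50));
    assert_eq!(total_reserved(&[Arc::clone(&shared), other]), 150);
}
```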
Make fair_unified_task_shared opt-in rather than the default to simplify review. Update docs to reflect the new default.
Add context about how Comet creates two concurrent native plans per Spark task during shuffle and why this matters for pool selection.
Which issue does this PR close?
Closes #3921
Rationale for this change
When Comet executes a shuffle, it creates two separate native plans (the child plan and the shuffle writer plan) that run concurrently in a pipelined fashion. Previously, each plan got its own memory pool at the full per-task limit, effectively allowing 2x the intended memory to be consumed.
The new `fair_unified_task_shared` pool type shares a single `CometFairMemoryPool` across all native plans within the same Spark task. This ensures the total memory stays within the per-task limit while dynamically distributing memory among operators based on how many register as memory consumers. This pool type is opt-in (not the default) to allow further testing and evaluation.
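The per-task sharing can be sketched as a global registry keyed by task attempt id with reference counting. Names here (`Pool`, `PerTaskPool`, `acquire`, `release`) are hypothetical stand-ins for the PR's actual types; the shape follows the description above, assuming the entry is dropped when the last native plan in the task finishes:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Stand-in for the shared pool; the real type is CometFairMemoryPool.
struct Pool {
    limit: usize,
}

// Per-task entry: the shared pool plus a count of native plans using it.
struct PerTaskPool {
    pool: Arc<Pool>,
    ref_count: usize,
}

fn registry() -> &'static Mutex<HashMap<i64, PerTaskPool>> {
    static REG: OnceLock<Mutex<HashMap<i64, PerTaskPool>>> = OnceLock::new();
    REG.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Called when a native plan starts: reuse the task's pool if one exists,
/// otherwise create it at the per-task limit.
fn acquire(task_attempt_id: i64, limit: usize) -> Arc<Pool> {
    let mut reg = registry().lock().unwrap();
    let entry = reg.entry(task_attempt_id).or_insert_with(|| PerTaskPool {
        pool: Arc::new(Pool { limit }),
        ref_count: 0,
    });
    entry.ref_count += 1;
    Arc::clone(&entry.pool)
}

/// Called when a native plan finishes: remove the entry once unused.
fn release(task_attempt_id: i64) {
    let mut reg = registry().lock().unwrap();
    if let Some(entry) = reg.get_mut(&task_attempt_id) {
        entry.ref_count -= 1;
        if entry.ref_count == 0 {
            reg.remove(&task_attempt_id);
        }
    }
}

fn main() {
    let a = acquire(42, 512 * 1024 * 1024); // child plan
    let b = acquire(42, 512 * 1024 * 1024); // shuffle writer plan, same task
    // Both plans share one underlying pool, at the single per-task limit.
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(a.limit, 512 * 1024 * 1024);
    release(42);
    release(42);
    assert!(registry().lock().unwrap().is_empty());
}
```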
What changes are included in this PR?
- Adds a `fair_unified_task_shared` memory pool type that shares a single pool across all native plans within the same Spark task, using a global `HashMap<task_attempt_id, PerTaskMemoryPool>` with reference counting
- Fixes a tracing bug where `total_reserved_for_thread()` and `unregister_and_total()` double-counted memory when multiple execution contexts shared the same pool `Arc`; deduplicates by `Arc` data pointer before summing `reserved()`
- Keeps `fair_unified` as the default; `fair_unified_task_shared` is opt-in via `spark.comet.exec.memoryPool`

How are these changes tested?
Tested with TPC-H queries using tracing to compare jemalloc and pool-tracked memory between `fair_unified` and `fair_unified_task_shared`. Confirmed that the tracing fix eliminates the false 2x reporting and that both pool types show equivalent actual memory usage.