Add mutable bitwise operations to BooleanArray and NullBuffer::union_many#9692

Open
mbutrovich wants to merge 6 commits into apache:main from mbutrovich:union_many

@mbutrovich (Contributor) commented Apr 10, 2026

Which issue does this PR close?

Rationale for this change

Several DataFusion PRs (#21464, #21468, #21471, #21475, #21477, #21482, #21532) optimize NULL handling in scalar functions by replacing row-by-row null buffer construction with a bulk NullBuffer::union. When three or more null buffers must be combined, these PRs chain binary union calls, each of which allocates a new BooleanBuffer.

NullBuffer::union_many reduces this to a single allocation (clone + in-place ANDs). For example, from #21482:

Before:

```rust
[array.nulls(), from_array.nulls(), to_array.nulls(), stride.and_then(|s| s.nulls())]
    .into_iter()
    .fold(None, |acc, nulls| NullBuffer::union(acc.as_ref(), nulls))
```

After:

```rust
NullBuffer::union_many(&[
    array.nulls(),
    from_array.nulls(),
    to_array.nulls(),
    stride.and_then(|s| s.nulls()),
])
```
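To make the allocation difference concrete, here is a self-contained sketch that models validity bitmaps as `Vec<u64>` words (set bit = valid) instead of using arrow-rs types. The `Bitmap` alias, the `union`/`union_many` helpers, and the `allocs` counter are hypothetical; they only mirror the clone-then-AND strategy described above and are not the arrow-rs implementation. A single present input is treated like arrow-rs's cheap reference-counted clone and is not counted.

```rust
/// Validity bitmap modeled as 64-bit words; a set bit means "valid".
/// Hypothetical stand-in for arrow-rs `NullBuffer`, for illustration only.
type Bitmap = Vec<u64>;

/// Binary union as used by the "Before" fold. Only a fresh AND result is
/// counted as an allocation; the single-input case stands in for a cheap
/// reference-counted clone.
fn union(lhs: Option<&Bitmap>, rhs: Option<&Bitmap>, allocs: &mut usize) -> Option<Bitmap> {
    match (lhs, rhs) {
        (Some(l), Some(r)) => {
            *allocs += 1; // a new buffer for every pairwise AND
            Some(l.iter().zip(r).map(|(a, b)| a & b).collect())
        }
        (Some(b), None) | (None, Some(b)) => Some(b.clone()),
        (None, None) => None,
    }
}

/// N-way union: copy the first present bitmap once, then AND the rest
/// into it in place, so at most one new buffer is created.
fn union_many(inputs: &[Option<&Bitmap>], allocs: &mut usize) -> Option<Bitmap> {
    let mut present = inputs.iter().flatten();
    let first: &Bitmap = present.next()?; // all inputs None => None
    let mut rest = present.peekable();
    if rest.peek().is_none() {
        return Some(first.clone()); // single input: treated as a cheap clone
    }
    *allocs += 1; // the one owned copy that gets mutated
    let mut out = first.clone();
    for b in rest {
        for (o, w) in out.iter_mut().zip(b.iter()) {
            *o &= *w; // in-place AND, no further allocation
        }
    }
    Some(out)
}

fn main() {
    let a: Bitmap = vec![0b1111];
    let b: Bitmap = vec![0b1110];
    let c: Bitmap = vec![0b0111];
    let inputs = [Some(&a), None, Some(&b), Some(&c)];

    // "Before": fold of pairwise unions.
    let mut before_allocs = 0;
    let before = inputs
        .iter()
        .fold(None, |acc: Option<Bitmap>, n| union(acc.as_ref(), *n, &mut before_allocs));

    // "After": one N-way union.
    let mut after_allocs = 0;
    let after = union_many(&inputs, &mut after_allocs);

    assert_eq!(before, after);
    // prints: Some([6]) before=2 after=1
    println!("{:?} before={} after={}", after, before_allocs, after_allocs);
}
```

With three present inputs the fold creates two intermediate buffers while the N-way version creates one; in this model the fold's count grows with the number of present inputs minus one.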

Per @alamb's suggestion, this PR also implements the general-purpose mutable bitwise operations on BooleanArray from #8809, following the PrimitiveArray::unary / unary_mut pattern. This builds on the BitAndAssign/BitOrAssign/BitXorAssign operators added to BooleanBuffer in #9567.

What changes are included in this PR?

NullBuffer::union_many(&[Option<&NullBuffer>]): combines multiple null buffers with a single allocation (clone + in-place &=). Intended for DataFusion's bulk null handling.

BooleanArray bitwise operations (6 new public methods):

Unary (op: FnMut(u64) -> u64):

  • bitwise_unary(&self, op) — always allocates a new array
  • bitwise_unary_mut(self, op) -> Result<Self, Self> — in-place if uniquely owned, Err(self) if shared
  • bitwise_unary_mut_or_clone(self, op) — in-place if uniquely owned, allocates if shared
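The three unary variants differ only in how they treat ownership of the values buffer. The sketch below models that with a hypothetical `Bits` type wrapping `Arc<Vec<u64>>` and uses `std::sync::Arc::get_mut` to decide whether in-place mutation is allowed. It mirrors the contract described above (`Err(self)` when shared, a fallback copy in the `_or_clone` variant) but is not the arrow-rs code, which works on `BooleanBuffer` and also tracks offset and length (so, for example, padding bits flipped by NOT would be ignored there).

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a BooleanArray's values: shared 64-bit words.
#[derive(Clone, Debug)]
struct Bits {
    words: Arc<Vec<u64>>,
}

impl Bits {
    /// Always allocates a new buffer, leaving `self` untouched.
    fn bitwise_unary(&self, mut op: impl FnMut(u64) -> u64) -> Bits {
        Bits { words: Arc::new(self.words.iter().map(|&w| op(w)).collect()) }
    }

    /// Mutates in place only if this is the sole owner; otherwise hands
    /// `self` back unchanged so the caller can decide what to do.
    fn bitwise_unary_mut(mut self, mut op: impl FnMut(u64) -> u64) -> Result<Bits, Bits> {
        match Arc::get_mut(&mut self.words) {
            Some(words) => {
                for w in words.iter_mut() {
                    *w = op(*w);
                }
                Ok(self)
            }
            None => Err(self),
        }
    }

    /// In place when uniquely owned, otherwise falls back to a copy.
    fn bitwise_unary_mut_or_clone(self, mut op: impl FnMut(u64) -> u64) -> Bits {
        match self.bitwise_unary_mut(&mut op) {
            Ok(bits) => bits,
            Err(shared) => shared.bitwise_unary(op),
        }
    }
}

fn main() {
    let a = Bits { words: Arc::new(vec![0b1010]) };
    // Uniquely owned: NOT happens in place.
    let a = a.bitwise_unary_mut(|w| !w).unwrap();
    assert_eq!(a.words[0], !0b1010u64);

    // Shared: the in-place variant refuses and returns its input.
    let shared = a.clone();
    let a = a.bitwise_unary_mut(|w| !w).unwrap_err();

    // The `_or_clone` variant copies instead, leaving `shared` intact.
    let b = a.bitwise_unary_mut_or_clone(|w| w & 0b1111);
    assert_eq!(shared.words[0], !0b1010u64);
    assert_eq!(b.words[0], 0b0101);
}
```

Returning `Result<Self, Self>` rather than panicking or silently copying lets callers choose their own fallback, which is the same trade-off the PrimitiveArray::unary / unary_mut pair makes.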

Binary (op: FnMut(u64, u64) -> u64):

  • bitwise_bin_op(&self, rhs, op) — always allocates, unions null buffers
  • bitwise_bin_op_mut(self, rhs, op) -> Result<Self, Self> — in-place if uniquely owned, Err(self) if shared, unions null buffers
  • bitwise_bin_op_mut_or_clone(self, rhs, op) — in-place if uniquely owned, allocates if shared, unions null buffers
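For the binary variants, "unions null buffers" means a result slot is valid only where both inputs are valid. The following sketch shows an in-place binary op on a hypothetical `BoolArr` model (value words plus an optional validity bitmap); it assumes both sides start at bit 0 and have equal length, and `union_validity` is an illustrative helper rather than an arrow-rs function.

```rust
use std::sync::Arc;

/// Hypothetical model of a nullable boolean array: value words plus an
/// optional validity bitmap (set bit = valid). Not the arrow-rs type.
#[derive(Clone, Debug)]
struct BoolArr {
    values: Arc<Vec<u64>>,
    validity: Option<Arc<Vec<u64>>>,
}

/// Null union: a slot is valid only if it is valid on both sides.
fn union_validity(l: Option<&Arc<Vec<u64>>>, r: Option<&Arc<Vec<u64>>>) -> Option<Arc<Vec<u64>>> {
    match (l, r) {
        (Some(l), Some(r)) => Some(Arc::new(l.iter().zip(r.iter()).map(|(a, b)| a & b).collect())),
        (Some(v), None) | (None, Some(v)) => Some(Arc::clone(v)), // refcount bump only
        (None, None) => None,
    }
}

impl BoolArr {
    /// Reuses `self.values` when uniquely owned, otherwise returns
    /// `Err(self)` unchanged. Validity is always the union of both sides.
    fn bitwise_bin_op_mut(
        mut self,
        rhs: &BoolArr,
        mut op: impl FnMut(u64, u64) -> u64,
    ) -> Result<BoolArr, BoolArr> {
        match Arc::get_mut(&mut self.values) {
            Some(words) => {
                for (l, &r) in words.iter_mut().zip(rhs.values.iter()) {
                    *l = op(*l, r);
                }
            }
            None => return Err(self),
        }
        self.validity = union_validity(self.validity.as_ref(), rhs.validity.as_ref());
        Ok(self)
    }
}

fn main() {
    let lhs = BoolArr { values: Arc::new(vec![0b1100]), validity: Some(Arc::new(vec![0b1011])) };
    let rhs = BoolArr { values: Arc::new(vec![0b1010]), validity: Some(Arc::new(vec![0b0111])) };
    let out = lhs.bitwise_bin_op_mut(&rhs, |a, b| a & b).expect("lhs is uniquely owned");
    assert_eq!(out.values[0], 0b1000); // AND of the values
    assert_eq!(out.validity.as_ref().unwrap()[0], 0b0011); // valid only where both are valid
}
```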

Note: #8809 proposed that the binary variants take a raw buffer plus a right_offset_in_bits argument. This PR takes &BooleanArray instead, which encapsulates both and matches existing patterns like BooleanArray::from_binary.
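Because each &BooleanArray carries its own bit offset, a binary kernel has to realign 64-bit words when the left and right offsets differ (the "misaligned left/right offsets" test case). This standalone sketch shows one way to do that on LSB-first `u64` bitmaps, the bit order Arrow uses. `word_at` and `bin_op_offsets` are hypothetical helpers for illustration, not arrow-rs functions; arrow-rs has its own utilities for chunked bit access.

```rust
/// Returns the 64 bits starting at `bit_offset` in an LSB-first bitmap,
/// treating bits past the end of `words` as zero.
fn word_at(words: &[u64], bit_offset: usize) -> u64 {
    let idx = bit_offset / 64;
    let shift = bit_offset % 64;
    let lo = words.get(idx).copied().unwrap_or(0);
    if shift == 0 {
        return lo; // aligned: no stitching needed
    }
    let hi = words.get(idx + 1).copied().unwrap_or(0);
    // Low part comes from the current word, high part from the next one.
    (lo >> shift) | (hi << (64 - shift))
}

/// Applies `op` to `len` bits of `lhs` (starting at `lhs_off`) and `rhs`
/// (starting at `rhs_off`), producing a new zero-offset bitmap.
fn bin_op_offsets(
    lhs: &[u64],
    lhs_off: usize,
    rhs: &[u64],
    rhs_off: usize,
    len: usize,
    mut op: impl FnMut(u64, u64) -> u64,
) -> Vec<u64> {
    let n_words = (len + 63) / 64;
    let mut out: Vec<u64> = (0..n_words)
        .map(|i| op(word_at(lhs, lhs_off + i * 64), word_at(rhs, rhs_off + i * 64)))
        .collect();
    // Clear padding bits beyond `len` in the last word.
    if len % 64 != 0 {
        if let Some(last) = out.last_mut() {
            *last &= (1u64 << (len % 64)) - 1;
        }
    }
    out
}

fn main() {
    // Left side starts 4 bits into its word; right side is aligned.
    let out = bin_op_offsets(&[0b1111_0000], 4, &[0b0110], 0, 4, |a, b| a & b);
    assert_eq!(out, vec![0b0110]);
    // A 2-bit window that straddles the boundary between two words.
    assert_eq!(word_at(&[1u64 << 63, 1], 63), 0b11);
}
```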

Are these changes tested?

Yes. 23 tests for the BooleanArray bitwise methods and 6 tests for union_many, covering:

  • Basic correctness (AND, OR, NOT)
  • Null handling (both nullable, one nullable, no nulls, null union)
  • Buffer ownership (uniquely owned → in-place, shared → Err / fallback)
  • Edge cases (empty arrays, sliced arrays with non-zero offset, misaligned left/right offsets)

Are there any user-facing changes?

Six new public methods on BooleanArray and one new public method on NullBuffer.

@github-actions bot added the arrow label (Changes to the arrow crate) Apr 10, 2026
@neilconway

Nice! I noticed this as well; it should be a nice win.

@alamb (Contributor) commented Apr 10, 2026

Thank you @mbutrovich and @neilconway.

Instead of a new kernel, would you be willing to implement this instead?

I think it would take a little more work, but it would be more general.

@mbutrovich (Contributor, Author)

I'll take a look!

@mbutrovich changed the title from "Add BooleanBuffer::bitand_many and NullBuffer::union_many" to "Add mutable bitwise operations to BooleanArray and NullBuffer::union_many" Apr 10, 2026

Labels

arrow Changes to the arrow crate

Development

Successfully merging this pull request may close these issues.

[Arrow] Add bitwise operations BooleanArray that potentially reuse the underlying allocation
