fix: bit_count function to report nullability correctly #19197
Merged
Which issue does this PR close?
Closes #19147
Part of #19144 (EPIC: fix nullability report for spark expression)
Rationale for this change
The bit_count UDF was using the default return_type implementation which does not preserve nullability information. This causes:
The bit_count function counts the number of set bits (ones) in the binary representation of a number and returns an Int32. The operation itself doesn't introduce nullability - if the input is non-nullable, the output will always be non-nullable. Therefore, the output nullability should match the input.
What changes are included in this PR?
- return_field_from_args: Creates a field with Int32 type and the same nullability as the input field
- return_type: Now returns an error directing users to use return_field_from_args instead (following DataFusion best practices)
- FieldRef and internal_err imports to support the new implementation
Are these changes tested?
Yes, this PR includes a new test test_bit_count_nullability that verifies:
Test results:
Additionally, all existing bit_count tests continue to pass, ensuring backward compatibility.
Are there any user-facing changes?
Yes - Schema metadata improvement:
Users will now see correct nullability information in the schema:
Before (Bug):
After (Fixed):
This is a bug fix that corrects schema metadata only - it does not change the actual computation or introduce any breaking changes to the API.
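To illustrate why the computation itself never introduces nulls, here is a minimal std-only sketch of the bit-counting step. It is not the code from this PR: the function name `bit_count_model` and the use of `Option` to stand in for SQL NULL are assumptions made for illustration, and only a 64-bit integer input is modelled.

```rust
/// Simplified, hypothetical model of bit_count for a 64-bit integer input.
/// `None` stands in for SQL NULL; this is not the DataFusion implementation.
fn bit_count_model(value: Option<i64>) -> Option<i32> {
    // count_ones() returns a u32 in the range 0..=64, which always fits in i32.
    // A non-null input always yields a non-null output; only a null input
    // produces a null result.
    value.map(|v| v.count_ones() as i32)
}

fn main() {
    assert_eq!(bit_count_model(Some(0)), Some(0));
    assert_eq!(bit_count_model(Some(7)), Some(3)); // 0b111 has three set bits
    assert_eq!(bit_count_model(Some(8)), Some(1)); // 0b1000 has one set bit
    assert_eq!(bit_count_model(None), None); // null in, null out
    println!("ok");
}
```

Because a `Some` input can never map to `None`, declaring the output nullable for a non-nullable input column is overly conservative, which is the schema-level mismatch this PR addresses.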
Impact:
Code Changes Summary
Modified File: datafusion/spark/src/function/bitwise/bit_count.rs
1. Added Imports
2. Updated return_type Method
3. Added return_field_from_args Implementation
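The actual diff is not reproduced here. As a rough sketch of the idea only, the following std-only model shows a field-aware hook that fixes the output type to Int32 and copies the input's nullability, alongside a type-only hook that returns an error. The `DataType` and `Field` types, the function signatures, and the error messages are all hypothetical stand-ins, not the Arrow or DataFusion APIs.

```rust
// Hypothetical stand-ins for Arrow's DataType and Field; not the real types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Models the type-only hook: it always errors so that callers are steered
/// to the field-aware hook. The message text is an assumption.
fn return_type(_arg_types: &[DataType]) -> Result<DataType, String> {
    Err("return_type should not be called, use return_field_from_args instead".to_string())
}

/// Models the field-aware hook: the output is always Int32 and its
/// nullability is copied from the single input field.
fn return_field_from_args(func_name: &str, arg_fields: &[Field]) -> Result<Field, String> {
    let input = arg_fields
        .first()
        .ok_or_else(|| format!("{} expects one argument", func_name))?;
    Ok(Field {
        name: func_name.to_string(),
        data_type: DataType::Int32,
        nullable: input.nullable,
    })
}

fn main() {
    let non_null = Field { name: "a".to_string(), data_type: DataType::Int64, nullable: false };
    let nullable = Field { name: "b".to_string(), data_type: DataType::Int64, nullable: true };

    let out = return_field_from_args("bit_count", &[non_null]).unwrap();
    assert_eq!(out.data_type, DataType::Int32);
    assert!(!out.nullable); // non-nullable input gives non-nullable output

    let out = return_field_from_args("bit_count", &[nullable]).unwrap();
    assert!(out.nullable); // nullable input gives nullable output

    assert!(return_type(&[DataType::Int64]).is_err());
    println!("ok");
}
```

The design point is that a type-only hook has no access to the input field's nullable flag, so the decision has to be made where the whole field is visible.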
4. Added Test
Verification Steps
Run the new test:
cargo test -p datafusion-spark test_bit_count_nullability --lib
Run all bit_count tests:
cargo test -p datafusion-spark bit_count --lib
Run clippy checks:
All checks pass successfully!
Related Issues
- bit_count need to have custom nullability #19147
- shuffle should report nullability correctly #19145 (shuffle function nullability)
- bitmap_count need to have custom nullability #19146 (bitmap_count function nullability)