Skip to content

Null Handling in jsonExtractScalar and jsonExtractIndex SV transforms#18569

Open
siddharthteotia wants to merge 11 commits into
apache:masterfrom
siddharthteotia:json-extract-scalar-honor-null-handling
Open

Null Handling in jsonExtractScalar and jsonExtractIndex SV transforms#18569
siddharthteotia wants to merge 11 commits into
apache:masterfrom
siddharthteotia:json-extract-scalar-honor-null-handling

Conversation

@siddharthteotia
Copy link
Copy Markdown
Contributor

@siddharthteotia siddharthteotia commented May 22, 2026

Fixes #18568.

Improvement Summary

Both JsonExtractScalarTransformFunction and JsonExtractIndexTransformFunction had _nullHandlingEnabled plumbed through (or accessible from BaseTransformFunction) but their six typed SV methods (transformTo{Int,Long,Float,Double,BigDecimal,String}ValuesSV) ignored it and threw unconditionally when the JSON path didn't resolve and no default literal was supplied. This PR makes both transforms honor _nullHandlingEnabled: when query-level null handling is on, unresolved rows surface as SQL NULL (typed null placeholder + a bit in getNullBitmap()) instead of throwing.

See #18568 for the full behavioral matrix (before/after, NH on/off, with and without default value) and the worked country/clicks example.

MV transforms are unchanged. The legacy throw is preserved when null handling is off, so callers that haven't opted in see no behavior change. JsonIndexDistinctOperator already handles this correctly and is untouched.

Testing Done

  • Added new tests to JsonExtractScalarTransformFunctionTest and JsonExtractIndexTransformFunctionTest to get 100% coverage for null handling
  • Added multiple query shapes -- PROJECT, DISTINCT and GROUP BY
  • Added multiple Json column shapes -- flat, arbitrarily nested

When query-level null handling is enabled and no default literal is set,
the typed SV paths now emit SQL NULL for unresolved JSON paths instead of
throwing IllegalArgumentException.
@siddharthteotia siddharthteotia changed the title Honor enableNullHandling in jsonExtractScalar / jsonExtractIndex SV transforms [WIP] Honor enableNullHandling in jsonExtractScalar / jsonExtractIndex SV transforms May 22, 2026
@siddharthteotia siddharthteotia changed the title [WIP] Honor enableNullHandling in jsonExtractScalar / jsonExtractIndex SV transforms [WIP] Fix inconsistent NULL Handling in JsonExtractScalar and JsonExtractIndex May 22, 2026
Mirror the scalar fix: when query-level null handling is enabled and no
default literal is set, the six typed SV paths emit SQL NULL for unresolved
JSON paths instead of throwing RuntimeException.
@siddharthteotia siddharthteotia changed the title [WIP] Fix inconsistent NULL Handling in JsonExtractScalar and JsonExtractIndex Honor enableNullHandling in jsonExtractScalar and jsonExtractIndex SV transforms May 22, 2026
Named single-row JSON constants with pretty-printed comments, named fixture
rows, helper methods that eliminate the inline FluentQueryTest boilerplate,
and per-test Javadocs with a concrete example query and expected result.
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 22, 2026

Codecov Report

❌ Patch coverage is 0% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.79%. Comparing base (ed41f5e) to head (a76fc9e).
⚠️ Report is 26 commits behind head on master.

Files with missing lines Patch % Lines
...rm/function/JsonExtractIndexTransformFunction.java 0.00% 51 Missing ⚠️
...m/function/JsonExtractScalarTransformFunction.java 0.00% 19 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (ed41f5e) and HEAD (a76fc9e). Click for more details.

HEAD has 2 uploads less than BASE
Flag BASE (ed41f5e) HEAD (a76fc9e)
unittests1 1 0
unittests 2 1
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #18569       +/-   ##
=============================================
- Coverage     64.27%   36.79%   -27.49%     
- Complexity     1130     1136        +6     
=============================================
  Files          3311     3335       +24     
  Lines        203946   206088     +2142     
  Branches      31736    32158      +422     
=============================================
- Hits         131093    75835    -55258     
- Misses        62341   123402    +61061     
+ Partials      10512     6851     -3661     
Flag Coverage Δ
integration 100.00% <ø> (?)
integration1 100.00% <ø> (?)
java-21 36.79% <0.00%> (-27.49%) ⬇️
temurin 36.79% <0.00%> (-27.49%) ⬇️
unittests 36.79% <0.00%> (-27.49%) ⬇️
unittests1 ?
unittests2 36.79% <0.00%> (+1.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Replace scattered single-row JSON constants with one 4-row x 2-column
fixture (flatJson + nestedJson) appended to the end of the test class.
Tests are grouped by query shape: projection -> DISTINCT -> GROUP BY, each
with NH-on and NH-off variants. Per-test Javadocs include a concrete query
and expected result; result rows are ordered ASC NULLS LAST.
@siddharthteotia siddharthteotia changed the title Honor enableNullHandling in jsonExtractScalar and jsonExtractIndex SV transforms [WIP] Fix Null Handling in jsonExtractScalar and jsonExtractIndex SV transforms May 22, 2026
…ionTest

Replace the minimal unit-level null-handling tests with the same
fixture + projection/DISTINCT/GROUP BY structure as the scalar test, so
both test files cover the same query shapes against the same data model.
The helper configures a JSON index on both columns via FieldConfig.
In each SV transform, by the time control reaches the null-handling
branch, the earlier `if (_defaultValue != null)` check has already
filled defaults, so the remaining decision reduces to `_nullHandlingEnabled`.
getNullBitmap inlines the equivalent short-circuit condition. No
behavior change; 209 tests unchanged.
Three tests per file (projection, DISTINCT, GROUP BY) pin the SV
transform's priority order: a user-supplied default wins over the
null-handling placeholder. Without these, a future refactor could
silently swap the order so unresolved rows surface as SQL NULL instead
of the user's default.
Default-precedence tests in both files now parameterize over flatJson +
nestedJson (mirrors the projection/DISTINCT/GROUP BY data providers).
Scalar test adds Section 5 covering 4-arg with SQL NULL literal under
NH on, pinning the _defaultIsNull code path in getNullBitmap.
Both test files now have Section 1 (Projection), 2 (DISTINCT), 3 (GROUP BY)
where each section owns its own sub-cases (.1 NH-on 3-arg, .2 NH-on with
SQL-NULL default [scalar only], .3 NH-on with non-null default, .4 NH-off).
Previously, default-precedence and SQL-NULL-default tests lived in separate
sections 4/5 even though they belonged to the same three query shapes.
- Cache the per-ValueBlock getValuesSV result so the SV transform and
  getNullBitmap don't each hit the index reader under null handling
  (perf MAJOR).
- Add null-handling note to the class Javadoc (doc MAJOR).
- Use valueBlock.getNumDocs() for the loop bound in getNullBitmap to
  match sibling SV transforms (MINOR).
- Swap @OverRide @nullable to @nullable @OverRide to match the parent's
  convention (MINOR).
@siddharthteotia siddharthteotia marked this pull request as ready for review May 27, 2026 22:16
@siddharthteotia siddharthteotia changed the title [WIP] Fix Null Handling in jsonExtractScalar and jsonExtractIndex SV transforms Fix Null Handling in jsonExtractScalar and jsonExtractIndex SV transforms May 27, 2026
@siddharthteotia siddharthteotia changed the title Fix Null Handling in jsonExtractScalar and jsonExtractIndex SV transforms Null Handling in jsonExtractScalar and jsonExtractIndex SV transforms May 27, 2026
init now detects a SQL NULL literal as the 4-arg default via literal.isNull()
and leaves _defaultValue null, so 3-arg semantics apply (NH on -> SQL NULL,
NH off -> throw). Without this, STRING/JSON silently emitted "" and numerics
failed at init -- both inconsistent with the scalar function's fix.

Adds matching Section 1.2 / 2.2 / 3.2 tests in the index test file (6 new
parameterized cases) covering flat + nested paths under projection, DISTINCT,
and GROUP BY. Also fixes a comment typo (broker -> server) flagged by review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistent Null Handling in JsonExtractScalar and JsonExtractIndex.

2 participants