
Fix Parquet/Avro list struct MV expanded as array of maps #17708

Open
rsrkpatwari1234 wants to merge 3 commits into apache:master from rsrkpatwari1234:radhika-patwari/issue-17420



@rsrkpatwari1234
Contributor

Problem

When Parquet or Avro stores a list as an array of structs with a single field "element", the record extractor leaves it as an array of single-key maps instead of unwrapping it to a plain array. For example:
Expected: "metadata": {"tags": ["abc", "xyz"]}
Actual: "metadata": {"tags": [{"element":"abc"}, {"element":"xyz"}]}
This breaks schemas and queries that expect multi-value columns to be arrays of scalars.

Solution

BaseRecordExtractor: After building a multi-value array from a Collection or Object[], pass it through a new helper, unwrapElementMapsInArray(). If every element is a Map with exactly one key, "element", the helper replaces the array with the values of that key; otherwise it leaves the array unchanged. Unwrapping is not applied to primitive arrays.
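
For illustration, the unwrapping rule described above might look roughly like the following. This is a minimal standalone sketch, not the code from the PR: the class name ElementMapUnwrapSketch, the method signature, and the choice to return the same array instance when nothing is unwrapped are all assumptions.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical standalone class; the PR places its helper in BaseRecordExtractor.
class ElementMapUnwrapSketch {
  private static final String ELEMENT_KEY = "element";

  /**
   * Returns a new array of unwrapped values if every entry is a Map with
   * exactly one key, "element"; otherwise returns the input array unchanged.
   * The signature is an assumption made for this sketch.
   */
  static Object[] unwrapElementMapsInArray(Object[] values) {
    if (values.length == 0) {
      return values; // empty arrays are left as-is
    }
    Object[] unwrapped = new Object[values.length];
    for (int i = 0; i < values.length; i++) {
      if (!(values[i] instanceof Map)) {
        return values; // mixed content: leave the array unchanged
      }
      Map<?, ?> map = (Map<?, ?>) values[i];
      if (map.size() != 1 || !map.containsKey(ELEMENT_KEY)) {
        return values; // multi-key or different-key map: leave unchanged
      }
      unwrapped[i] = map.get(ELEMENT_KEY);
    }
    return unwrapped;
  }

  public static void main(String[] args) {
    Object[] wrapped = {Map.of("element", "abc"), Map.of("element", "xyz")};
    // prints [abc, xyz]
    System.out.println(Arrays.toString(unwrapElementMapsInArray(wrapped)));

    Object[] mixed = {Map.of("element", "abc"), "xyz"};
    // prints [{element=abc}, xyz]
    System.out.println(Arrays.toString(unwrapElementMapsInArray(mixed)));
  }
}
```

The sketch is all-or-nothing: a single non-conforming entry leaves the whole array untouched, which matches the "otherwise leave the array unchanged" behavior described above.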

Tests

Added BaseRecordExtractorTest with 8 cases: unwrapping when all elements are single-key "element" maps; no unwrapping for mixed, multi-key, different-key, primitive, or empty arrays; and a full-path extract() test.

Scope

Change is in pinot-spi; any record reader that extends BaseRecordExtractor gets the fix.
Primitive arrays and non–element-map arrays are unchanged.

Fixes #17420

@codecov-commenter

codecov-commenter commented Feb 15, 2026

Codecov Report

❌ Patch coverage is 94.11765% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 63.20%. Comparing base (b0cb6cc) to head (42bd825).

Files with missing lines Patch % Lines
...he/pinot/spi/data/readers/BaseRecordExtractor.java 94.11% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17708      +/-   ##
============================================
- Coverage     63.20%   63.20%   -0.01%     
  Complexity     1499     1499              
============================================
  Files          3179     3179              
  Lines        190696   190709      +13     
  Branches      29151    29155       +4     
============================================
+ Hits         120525   120532       +7     
- Misses        60797    60802       +5     
- Partials       9374     9375       +1     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.16% <94.11%> (-0.02%) ⬇️
java-21 63.17% <94.11%> (+<0.01%) ⬆️
temurin 63.20% <94.11%> (-0.01%) ⬇️
unittests 63.19% <94.11%> (-0.01%) ⬇️
unittests1 55.63% <94.11%> (+0.01%) ⬆️
unittests2 34.04% <94.11%> (+<0.01%) ⬆️




Development

Successfully merging this pull request may close these issues.

ParquetNativeRecordReader and ParquetAvroRecordReader expand a json struct MV element as a MV element of maps

2 participants