feat(parquet-variant): add Dictionary and REE variant_to_arrow support #10014
mneetika wants to merge 1 commit into
scovich left a comment:
LGTM, but I'd love a second review from @sdf-jkl or @codephage2020 as a sanity check
Hey @scovich, the PR introduces unshred support for Variants where `typed_value` is Dict/REE, which is not permitted by the spec. The issue I created is for … Seems like AI slop to me.
🤦 I keep forgetting that, thanks for catching it.
(force-pushed from 7fcd57f to 87fbf4b)
@scovich Thanks for catching this, and apologies for the incorrect update. You were right that … I have updated the PR to target the actual issue instead: … I also updated the PR title/body and added regression tests for string dictionary, numeric dictionary, and run-end encoded outputs. Again, apologies for the incorrect PR.
Which issue does this PR close?

- `variant_to_arrow` Dictionary/REE type support #10013

Rationale for this change
`variant_get`/`variant_to_arrow` can already convert Variant values into many native Arrow array layouts, but requesting `DataType::Dictionary` or `DataType::RunEndEncoded` was not supported. This PR adds support for those output encodings without changing Variant shredding semantics.
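A dictionary-encoded result holds the same logical values as a plain string array, but stores each distinct value once and gives every row a small integer key. The std-only Rust sketch below illustrates that output shape, assuming `i32` keys. `DictEncoded` and `dictionary_encode` are hypothetical names chosen for illustration; they are neither the arrow-rs `DictionaryArray` API nor the code added by this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded string column (not the
/// arrow-rs `DictionaryArray` API). `keys[i]` indexes into `values`;
/// `None` marks a null row.
#[derive(Debug, PartialEq)]
struct DictEncoded {
    keys: Vec<Option<i32>>,
    values: Vec<String>,
}

/// Dictionary-encode optional strings: each distinct value is stored once,
/// in first-seen order, and every non-null row records that value's key.
fn dictionary_encode(column: &[Option<&str>]) -> DictEncoded {
    let mut lookup: HashMap<&str, i32> = HashMap::new();
    let mut values: Vec<String> = Vec::new();
    let mut keys = Vec::with_capacity(column.len());
    for &slot in column {
        let key = slot.map(|s| {
            *lookup.entry(s).or_insert_with(|| {
                // First occurrence: append to the dictionary and use its index.
                values.push(s.to_string());
                (values.len() - 1) as i32
            })
        });
        keys.push(key);
    }
    DictEncoded { keys, values }
}

fn main() {
    let column = [Some("red"), Some("blue"), None, Some("red"), Some("red")];
    let encoded = dictionary_encode(&column);
    println!("{:?}", encoded);
    // Logical row 3 is recovered by following its key into the dictionary.
    let row3 = encoded.keys[3].map(|k| encoded.values[k as usize].as_str());
    assert_eq!(row3, Some("red"));
}
```

For the five rows above, the sketch yields keys `[0, 1, null, 0, 0]` and values `["red", "blue"]`. In arrow-rs, the key and value types of such an output come from the requested `DataType::Dictionary(key_type, value_type)`.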
`Dictionary` and `RunEndEncoded` are produced as Arrow result arrays only; they are not introduced as valid Parquet Variant shredded `typed_value` layouts.

What changes are included in this PR?
- Support in `variant_to_arrow` for `DataType::Dictionary` and `DataType::RunEndEncoded`.
- `variant_get` regression coverage for string dictionary, numeric dictionary, and run-end encoded outputs.

Are these changes tested?
Yes:

- `cargo fmt --check`
- `cargo test -p parquet-variant-compute`
- `cargo test -p parquet-variant`
- `cargo clippy --workspace --all-targets`

Are there any user-facing changes?
Yes.
`variant_get` with `as_type` set to `DataType::Dictionary` or `DataType::RunEndEncoded` can now return those Arrow array encodings.
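A run-end encoded result still represents one logical value per row; only the physical layout changes. The std-only Rust sketch below shows both directions. `run_end_encode` builds an Arrow-style layout: a `run_ends` child holding exclusive, cumulative end offsets, plus a `values` child. `run_end_value` then recovers a row's logical value with a binary search over `run_ends`. `RunEndEncoded`, `run_end_encode`, and `run_end_value` are hypothetical illustrations, not the arrow-rs `RunArray` API or code from this PR.

```rust
/// Hypothetical stand-in for a run-end encoded column (not arrow-rs `RunArray`).
/// `run_ends[i]` is the exclusive logical end of run `i`, so the last entry
/// equals the row count; `values[i]` is the value of run `i` (`None` = null run).
#[derive(Debug, PartialEq)]
struct RunEndEncoded<T> {
    run_ends: Vec<i32>,
    values: Vec<Option<T>>,
}

/// Collapse consecutive equal rows (including consecutive nulls) into runs.
fn run_end_encode<T: PartialEq + Clone>(column: &[Option<T>]) -> RunEndEncoded<T> {
    let mut run_ends: Vec<i32> = Vec::new();
    let mut values: Vec<Option<T>> = Vec::new();
    for (i, slot) in column.iter().enumerate() {
        if values.last() == Some(slot) {
            // Same value as the current run: extend that run to cover row `i`.
            *run_ends.last_mut().expect("a run exists") = (i + 1) as i32;
        } else {
            // Different value (or first row): start a new run ending after row `i`.
            values.push(slot.clone());
            run_ends.push((i + 1) as i32);
        }
    }
    RunEndEncoded { run_ends, values }
}

/// Logical value of row `index` (must be < row count). Because `run_ends` is
/// strictly increasing, the containing run is the first whose end exceeds
/// `index`, found by binary search in O(log runs).
fn run_end_value<T: Clone>(encoded: &RunEndEncoded<T>, index: usize) -> Option<T> {
    let run = encoded.run_ends.partition_point(|&end| (end as usize) <= index);
    encoded.values[run].clone()
}

fn main() {
    let column = [Some(1), Some(1), Some(1), None, None, Some(2)];
    let encoded = run_end_encode(&column);
    println!("{:?}", encoded);
    // Every row decodes back to its original logical value.
    for (i, expected) in column.iter().enumerate() {
        assert_eq!(run_end_value(&encoded, i), *expected);
    }
}
```

For the six rows above, the sketch stores three runs: `run_ends` `[3, 5, 6]` and values `[1, null, 2]`. The Arrow format also allows 16-bit and 64-bit run ends; `i32` is simply the choice made in this sketch.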