Skip to content

Flink: Add decimal write/read roundtrip test for FlinkParquetReaders#16346

Open
wombatu-kun wants to merge 2 commits into
apache:mainfrom
wombatu-kun:flink-parquet-decimal-roundtrip-test
Open

Flink: Add decimal write/read roundtrip test for FlinkParquetReaders#16346
wombatu-kun wants to merge 2 commits into
apache:mainfrom
wombatu-kun:flink-parquet-decimal-roundtrip-test

Conversation

@wombatu-kun
Copy link
Copy Markdown
Contributor

@wombatu-kun wombatu-kun commented May 15, 2026

Summary

  • Address the longstanding TODO in FlinkParquetReaders.BinaryDecimalReader by adding a Flink-writer → Flink-reader roundtrip test for decimals across precisions 9/2, 15/3 and 38/10 (covering INT32/INT64/FIXED_LEN_BYTE_ARRAY physical encodings).
  • Test added to existing TestFlinkParquetReader in v1.20, v2.0 and v2.1; the TODO comment is removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the flink label May 15, 2026
@pvary
Copy link
Copy Markdown
Contributor

pvary commented May 15, 2026

Should we test this in the BaseFormatModelTests?

…odelTests

Per review feedback on apache#16346, instead of a Flink-only decimal Parquet test, add a StructWithDecimals generator to DataGenerators.ALL and a testDataWriterEngineWriteEngineRead case to BaseFormatModelTests. This exercises decimal(9,2)/(15,3)/(38,10) write-read roundtrips (Parquet INT32/INT64/FIXED_LEN_BYTE_ARRAY encodings) for Flink and Spark across Avro, Parquet and ORC, and closes the missing engine-write -> engine-read symmetry in the shared harness. The duplicated TestFlinkParquetReader decimal test is removed from Flink v1.20/v2.0/v2.1; the FlinkParquetReaders TODO stays resolved by the broader shared coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the data label May 15, 2026
@wombatu-kun
Copy link
Copy Markdown
Contributor Author

Good call, thanks @pvary — done in 6d2f777.

Rather than a Flink-only test I added a StructWithDecimals generator (decimal(9,2) / (15,3) / (38,10), covering the Parquet INT32 / INT64 / FIXED_LEN_BYTE_ARRAY encodings) to DataGenerators.ALL, so decimals now flow through the whole BaseFormatModelTests matrix for Flink and Spark across Avro/Parquet/ORC.

One gap I hit while doing this: the generator-parameterized cases were only engineWrite→genericRead and genericWrite→engineRead, so the engine-writer→engine-reader roundtrip the original TODO asked for ("write-read-validate decimal via FlinkParquetWrite/Reader") wasn't actually covered by any single case. I added a testDataWriterEngineWriteEngineRead case to BaseFormatModelTests to close that symmetry — for all types/engines/formats, not just decimals.

The duplicated TestFlinkParquetReader decimal test is removed from v1.20/v2.0/v2.1; the FlinkParquetReaders TODO stays resolved by the broader shared coverage. Verified green: TestFlinkFormatModel (Flink 1.20/2.0/2.1) and TestSparkFormatModel (Spark 4.1).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants