Labels: enhancement, VELOX
Description
Enable the `GlutenParquetTypeWideningSuite` test suite for Spark 4.0 and 4.1. The suite validates Parquet type widening support (SPARK-40876).
Background
`GlutenParquetTypeWideningSuite` has 84 tests covering two kinds of Parquet type conversion:
- Physical→Logical type restoration: reading `int32 + INT(8)` as `TINYINT` (safe, because the writer guarantees the value range)
- Schema evolution widening: reading old `IntegerType` data as `LongType`, `DoubleType`, or `DecimalType` (Spark 4.0 feature)
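The two kinds of conversion differ in whether a value can change. A minimal C++ sketch, assuming plain integer inputs rather than Velox vectors; `restoreInt8`, `widenToBigint`, and `widenToDouble` are hypothetical helpers, not Velox or Gluten APIs:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: restore an INT(8)-annotated INT32 value as TINYINT.
// Parquet stores INT(8) values in the INT32 physical type; a compliant writer
// only emits values in [-128, 127], so the narrowing cast is safe. The range
// check is a defensive assumption, not behavior taken from Velox.
int8_t restoreInt8(int32_t physical) {
  if (physical < std::numeric_limits<int8_t>::min() ||
      physical > std::numeric_limits<int8_t>::max()) {
    throw std::out_of_range("INT(8) value outside TINYINT range");
  }
  return static_cast<int8_t>(physical);
}

// Hypothetical helpers for schema evolution widening: every int32 value is
// exactly representable as int64 and as double (53-bit mantissa), so these
// conversions never lose information.
int64_t widenToBigint(int32_t v) { return static_cast<int64_t>(v); }
double widenToDouble(int32_t v) { return static_cast<double>(v); }
```

Restoration relies on a guarantee made by the writer, while widening is lossless by construction; only the decimal case (not shown here) needs an extra precision check.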
Currently the suite is disabled with 74 out of 84 tests failing. The failures fall into four categories:
| Category | Count | Issue | Fix |
|---|---|---|---|
| A | 13 | Velox doesn't support INT→DOUBLE/REAL/DECIMAL widening | Extend Velox C++ `convertType()` |
| B | 29 | Exception type mismatch; no Decimal precision check | Exception translation + C++ precision check |
| C | 31 | Parquet V2 encoding assertions + Decimal conversion limits | Disable native writer + test overrides + Velox C++ changes |
| D | 1 | Decimal narrowing overflow→null (parquet-mr-only behavior) | Exclude (cannot be reproduced with the native reader) |
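Categories A and B both depend on when an integer or decimal column can be read as a decimal without losing digits. The sketch below encodes that rule as we understand Spark's vectorized Parquet reader to apply it (an assumption to verify against the Spark 4.0 sources); `DecimalSpec`, `PhysicalInt`, `canReadIntAsDecimal`, and `canWidenDecimal` are hypothetical names, not Velox's `convertType()` API:

```cpp
// Hypothetical types for illustration; not Velox's type classes.
struct DecimalSpec {
  int precision;
  int scale;
};

enum class PhysicalInt { kInt32, kInt64 };

// An integer column can be read as DECIMAL(p, s) only if the integral digits
// (p - s) can hold every value of the physical type. Assumption: 10 digits
// for int32 and 20 for int64, mirroring Spark's IntDecimal / LongDecimal.
bool canReadIntAsDecimal(PhysicalInt from, DecimalSpec to) {
  const int integralDigits = to.precision - to.scale;
  const int required = (from == PhysicalInt::kInt32) ? 10 : 20;
  return integralDigits >= required;
}

// DECIMAL(p1, s1) -> DECIMAL(p2, s2) is lossless only if the scale does not
// shrink and the precision grows by at least as much as the scale does
// (equivalently, p2 - s2 >= p1 - s1).
bool canWidenDecimal(DecimalSpec from, DecimalSpec to) {
  const int scaleIncrease = to.scale - from.scale;
  const int precisionIncrease = to.precision - from.precision;
  return scaleIncrease >= 0 && precisionIncrease >= scaleIncrease;
}
```

For example, `DECIMAL(5, 2)` can widen to `DECIMAL(7, 4)` but not to `DECIMAL(6, 4)`, which would drop an integral digit. When such a check fails, the expectation in category B is that the error surfaces as `SchemaColumnConvertNotSupportedException` instead of silently producing wrong values.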
Plan
This will be addressed in 3 PRs:
- PR 1 (exception translation): Add `translateException()` to convert Velox type errors to `SchemaColumnConvertNotSupportedException`. Enable the suite for the tests that pass without C++ changes, with appropriate excludes/overrides for the rest.
- PR 2 (SPARK-18108 + Revert OAP): Fix partition column type conflicts and import upstream Velox PR #15173.
- PR 3 (type widening implementation): Velox C++ changes for INT→DOUBLE/REAL/DECIMAL and Decimal→Decimal widening. This requires an upstream Velox PR to land first; the remaining tests are enabled afterwards.
Test Results (Target)
| Result | Spark 4.0 | Spark 4.1 |
|---|---|---|
| ✅ Passed | 46 | 46 |
| 🟢 Override (passed) | 35 | 35 |
| ❌ Excluded | 3 | 3 |
| Total | 84 | 84 |
Sub-issue of #11550.
This issue was written with the assistance of AI.