duckdbtest.zip
What happens?
Title
Regression in 1.3.0+: union_by_name fails with "Can't change source type (NULL) to target type (VARCHAR[])" when reading parquet files with mixed NULL/LIST types
DuckDB Version
- Working version: 1.2.2
- Broken versions: 1.3.0, 1.3.1 (and later)
Environment
- OS: Linux
- Python: 3.12.9
- pandas: (latest)
Description
Starting with DuckDB 1.3.0, reading multiple parquet files with union_by_name=True fails when:
- Some parquet files have a column stored as NULL type (because all values are null in that file)
- Other parquet files have the same column properly typed as VARCHAR[] (array/list of strings)
This worked correctly in DuckDB 1.2.2 but now throws:
BinderException: Binder Error: Can't change source type ("NULL") to target type (VARCHAR[]), type conversion not allowed
Expected Behavior
When union_by_name=True is set, DuckDB should merge schemas gracefully, treating NULL-typed columns as compatible with any target type (similar to how pandas handles this).
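For comparison, this is how pandas handles the same situation: an all-null column unions cleanly with a list-valued one (a minimal illustration of the expected semantics, not the DuckDB code path):

```python
import pandas as pd

# One frame where the column is entirely null, one where it holds lists
df_null = pd.DataFrame({"tags": [None, None]})
df_list = pd.DataFrame({"tags": [["a"], ["b", "c"]]})

# pandas unions the two without complaint; the null column poses no type conflict
merged = pd.concat([df_null, df_list], ignore_index=True)
print(len(merged))  # 4
```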
Actual Behavior
DuckDB 1.3.0+ throws a BinderException and refuses to read the files, even though union_by_name=True is explicitly designed to handle schema variations across multiple files.
Root Cause Analysis
Investigation shows:
- When a parquet file has ALL NULL values for a column, it is stored with the NULL type (e.g., INT32 physical type with a NullType() logical type)
- Other files with actual data store the same column as BYTE_ARRAY with StringType(), or as complex types like ListType()
- The error specifically mentions VARCHAR[] (a list type), suggesting it happens with nested/complex types
- This regression appeared between versions 1.2.2 and 1.3.0
How to Reproduce
Test files are attached; see duckdbtest.zip.
import duckdb

print(f"DuckDB version: {duckdb.__version__}")

# Fails with DuckDB 1.3.0+
try:
    result = duckdb.read_parquet(
        "duckdb_bug_test_files/*.parquet",
        union_by_name=True,
    ).df()
    print(f"SUCCESS: Read {len(result)} rows")
except Exception as e:
    print(f"FAILED: {type(e).__name__}: {e}")
To Reproduce
This occurs only in the Python client.
OS:
Linux x86
DuckDB Version:
v1.2.2 (works), v1.3.0 and later (broken)
DuckDB Client:
Python
Hardware:
No response
Full Name:
Zack Dai
Affiliation:
Zack Dai
Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?
Did you include all code required to reproduce the issue?
Did you include all relevant data sets for reproducing the issue?
Yes