problem
when java writes manifests with partition field names that violate avro naming rules, it sanitizes them (e.g. 815d3b... → _815d3b...) and stores the original in an iceberg-field-name avro field property. iceberg-rust's avro reader uses the avro field name directly without checking this property, so it returns the sanitized name instead of the original.
this causes field name mismatches when reading java-written manifests — the partition field name in the manifest entry won't match the partition field name in the table's partition spec.
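for context, the avro specification requires names to start with [A-Za-z_] and continue with [A-Za-z0-9_], which is why a name starting with a digit has to be sanitized. a minimal sketch of that check follows; the helper name is_valid_avro_name is hypothetical and does not come from iceberg-rust or the java code:

```rust
/// Returns true if `name` matches the Avro name pattern
/// [A-Za-z_][A-Za-z0-9_]* (hypothetical helper, for illustration only).
fn is_valid_avro_name(name: &str) -> bool {
    let mut chars = name.chars();
    // The first character must be an ASCII letter or underscore.
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // Every remaining character must be an ASCII letter, digit, or underscore.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

under this check a name that starts with a digit is rejected, while the same name prefixed with an underscore is accepted.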
relevant code
crates/iceberg/src/avro/schema.rs: avro_schema_to_schema reads &avro_field.name without checking for the iceberg-field-name custom property.
java's read path resolves fields by field-id (integer), but also provides the original name via ICEBERG_FIELD_NAME_PROP:
public static final String ICEBERG_FIELD_NAME_PROP = "iceberg-field-name";
reproduction
- create a table in java with a partition field name starting with a digit
- write data (java produces manifests with sanitized avro field names + iceberg-field-name property)
- read the manifest in iceberg-rust — field names will be the sanitized form, not the original
expected behavior
when reading avro schemas from manifests, check each field for the iceberg-field-name property. if present, use that as the iceberg field name instead of the avro field name.
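the lookup described above could be sketched roughly as follows. this is not the actual iceberg-rust code: AvroField is a hypothetical stand-in for the avro record field type (the real one stores custom attributes as JSON values, not plain strings), and iceberg_field_name is an assumed helper name. only the property key "iceberg-field-name" comes from the java implementation.

```rust
use std::collections::BTreeMap;

/// Property key the java writer uses to store the original field name.
const ICEBERG_FIELD_NAME_PROP: &str = "iceberg-field-name";

/// Hypothetical, simplified stand-in for an avro record field.
/// Assumption: custom attributes are modelled as plain strings here.
struct AvroField {
    name: String,
    custom_attributes: BTreeMap<String, String>,
}

/// Prefer the original name stored in the custom property and fall back
/// to the (possibly sanitized) avro field name when it is absent.
fn iceberg_field_name(field: &AvroField) -> &str {
    field
        .custom_attributes
        .get(ICEBERG_FIELD_NAME_PROP)
        .map(String::as_str)
        .unwrap_or(field.name.as_str())
}
```

with this fallback, manifests that carry no iceberg-field-name property keep their current behavior, so the change only affects fields whose names were sanitized on write.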
notes
- iceberg resolves fields by field-id in most paths, so the impact is limited to code that uses field names from manifest entries
- this is a well-established convention in the java implementation (since 2019)
- pyiceberg implements its own avro reader that bypasses name validation entirely