avro schema reader ignores iceberg-field-name property — returns sanitized names from java manifests #2536

@SreeramGarlapati

problem

when java writes manifests with partition field names that violate avro naming rules, it sanitizes them (e.g. 815d3b..._815d3b...) and stores the original in an iceberg-field-name avro field property. iceberg-rust's avro reader uses the avro field name directly without checking this property, so it returns the sanitized name instead of the original.

this causes field name mismatches when reading java-written manifests — the partition field name in the manifest entry won't match the partition field name in the table's partition spec.

relevant code

crates/iceberg/src/avro/schema.rs: avro_schema_to_schema reads &avro_field.name without checking for the iceberg-field-name custom property.

java's read path resolves fields by field-id (integer), but also provides the original name via ICEBERG_FIELD_NAME_PROP:

public static final String ICEBERG_FIELD_NAME_PROP = "iceberg-field-name";

reproduction

  1. create a table in java with a partition field name starting with a digit
  2. write data (java produces manifests with sanitized avro field names + iceberg-field-name property)
  3. read the manifest in iceberg-rust — field names will be the sanitized form, not the original
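
to see why a name starting with a digit triggers sanitization in step 1, here is a minimal sketch of the avro naming rule (first character in [A-Za-z_], remaining characters in [A-Za-z0-9_]). the function name is_valid_avro_name is hypothetical and not part of iceberg-rust or any avro crate; it only illustrates which partition field names would need a sanitized form.

```rust
/// Hypothetical helper (not an iceberg-rust or apache-avro API).
/// Returns true if `name` satisfies the Avro naming rule:
/// the first character is an ASCII letter or '_', and every
/// following character is an ASCII letter, digit, or '_'.
pub fn is_valid_avro_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // A valid first character: keep checking the rest.
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        // Empty name or invalid first character (e.g. a digit).
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

any partition field name for which this check fails is a candidate for the reproduction above.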

expected behavior

when reading avro schemas from manifests, check each field for the iceberg-field-name property. if present, use that as the iceberg field name instead of the avro field name.
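
a rough sketch of that lookup, under stated assumptions: AvroField below is a hypothetical stand-in for the avro record field type (it is not the type used in crates/iceberg/src/avro/schema.rs), and its property values are simplified to plain strings. only the property key comes from the java constant quoted above. an actual fix would need to adapt this to the real field type and its attribute representation.

```rust
use std::collections::BTreeMap;

/// Property key, matching the Java constant ICEBERG_FIELD_NAME_PROP.
pub const ICEBERG_FIELD_NAME_PROP: &str = "iceberg-field-name";

/// Hypothetical stand-in for an Avro record field.
/// Assumption: custom properties are modeled as string values here.
pub struct AvroField {
    /// The (possibly sanitized) Avro field name.
    pub name: String,
    /// Custom field properties attached to the Avro field.
    pub custom_attributes: BTreeMap<String, String>,
}

/// Returns the original Iceberg field name if the
/// `iceberg-field-name` property is present, otherwise
/// falls back to the Avro field name.
pub fn iceberg_field_name(field: &AvroField) -> &str {
    field
        .custom_attributes
        .get(ICEBERG_FIELD_NAME_PROP)
        .map(String::as_str)
        .unwrap_or(field.name.as_str())
}
```

falling back to the avro name keeps behavior unchanged for manifests whose field names never needed sanitizing.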

notes

  • iceberg resolves fields by field-id in most paths, so the impact is limited to code that uses field names from manifest entries
  • this is a well-established convention in the java implementation (since 2019)
  • pyiceberg implements its own avro reader that bypasses name validation entirely
