[rust] Port Java's typed Arrow column-vector pattern to scan path #543

@fresh-borzoni

Search before asking

  • I searched in the issues and found nothing similar.

Description

The Rust scan path skips Java's typed-vector layer (VectorizedColumnBatch / ArrowRowColumnVector). Two consequences:

  • Top-level accessors do column.data_type() + downcast on every call.
  • Nested get_row eagerly materializes the whole struct and copies every leaf string and blob.
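The two costs above can be illustrated with a minimal, self-contained sketch. It does not use the arrow crate: `Int32Col`, `Utf8Col`, and `DynColumn` are hypothetical stand-ins for Arrow arrays and the untyped column handle, and `std::any::Any` stands in for the per-call `data_type()` check plus downcast. It is only meant to show the shape of the problem, not the actual scan-path code.

```rust
use std::any::Any;

// Hypothetical stand-ins for Arrow arrays (the real code would hold ArrayRef).
pub struct Int32Col(pub Vec<i32>);
pub struct Utf8Col(pub Vec<String>);

/// Hypothetical untyped column: the concrete array type is only known at runtime.
pub struct DynColumn {
    pub array: Box<dyn Any>,
}

impl DynColumn {
    /// Every call repeats the runtime type check before reading a single value.
    pub fn get_int(&self, row: usize) -> Option<i32> {
        self.array.downcast_ref::<Int32Col>().map(|c| c.0[row])
    }

    /// Eager copy: a fresh String is allocated on each access,
    /// which is what happens to every leaf when a nested row is materialized.
    pub fn get_string(&self, row: usize) -> Option<String> {
        self.array
            .downcast_ref::<Utf8Col>()
            .map(|c| c.0[row].clone())
    }
}
```

In this sketch the type check sits inside every accessor, so reading N values from one column pays for N downcasts, and every string read pays for an allocation.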

Java does the type dispatch once at construction, and its accessors are zero-copy. Proposed changes:

  • Typed dispatch: add a TypedColumn enum and a TypedBatch built once per ArrowReader from (RecordBatch, RowType), so that ColumnarRow accessors become single-arm matches.
  • Lazy nested ROW: with a recursive TypedColumn::Row, the materialized nested GenericRow can hold borrowed Cows over Arrow buffers, so no leaf is copied. We would most probably keep the current trait and parameterize ColumnarRow with a lifetime.
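A rough sketch of how the two proposed changes could fit together is shown below. It is an assumption-laden illustration, not the intended implementation: the columns are `Vec`-backed stand-ins rather than Arrow arrays, `Datum` is a hypothetical value type playing the role of a GenericRow field, and the accessor names and signatures are invented for the example. Only the names `TypedColumn`, `TypedBatch`, `ColumnarRow`, and `TypedColumn::Row` come from the description above.

```rust
use std::borrow::Cow;

/// Typed column, resolved once when the batch is built.
/// Vec-backed stand-ins replace the Arrow arrays of the real scan path.
pub enum TypedColumn {
    Int(Vec<i32>),
    Str(Vec<String>),
    /// Nested ROW: child columns are themselves typed, recursively.
    Row(Vec<TypedColumn>),
}

/// Hypothetical field value that borrows from column buffers where it can.
#[derive(Debug, PartialEq)]
pub enum Datum<'a> {
    Int(i32),
    Str(Cow<'a, str>),
    Row(Vec<Datum<'a>>),
}

pub struct TypedBatch {
    pub columns: Vec<TypedColumn>,
}

/// Row view over a batch; the lifetime ties returned values to the batch.
pub struct ColumnarRow<'a> {
    pub batch: &'a TypedBatch,
    pub row: usize,
}

impl TypedColumn {
    /// Builds a value for one row; strings are borrowed, not copied.
    pub fn datum(&self, row: usize) -> Datum<'_> {
        match self {
            TypedColumn::Int(v) => Datum::Int(v[row]),
            TypedColumn::Str(v) => Datum::Str(Cow::Borrowed(v[row].as_str())),
            TypedColumn::Row(children) => {
                Datum::Row(children.iter().map(|c| c.datum(row)).collect())
            }
        }
    }
}

impl<'a> ColumnarRow<'a> {
    /// One match on the pre-resolved variant; no per-call downcast.
    pub fn get_int(&self, idx: usize) -> Option<i32> {
        let batch: &'a TypedBatch = self.batch;
        match &batch.columns[idx] {
            TypedColumn::Int(v) => Some(v[self.row]),
            _ => None,
        }
    }

    /// Zero-copy: the returned slice points into the column buffer.
    pub fn get_str(&self, idx: usize) -> Option<&'a str> {
        let batch: &'a TypedBatch = self.batch;
        match &batch.columns[idx] {
            TypedColumn::Str(v) => Some(v[self.row].as_str()),
            _ => None,
        }
    }

    /// The nested row is only built when requested, and its leaves are borrowed.
    pub fn get_row(&self, idx: usize) -> Option<Vec<Datum<'a>>> {
        let batch: &'a TypedBatch = self.batch;
        match &batch.columns[idx] {
            TypedColumn::Row(children) => {
                Some(children.iter().map(|c| c.datum(self.row)).collect())
            }
            _ => None,
        }
    }
}
```

The lifetime parameter on `ColumnarRow` is what lets `get_str` and `get_row` hand out data borrowed from the batch instead of owned copies. Whether the existing row trait can express this without changes is left open here.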

Willingness to contribute

  • I'm willing to submit a PR!
