Async avro reader optimization #9668

@ariel-miculas

Description

Note we have also been trying to separate the IO from the decoding in Parquet -- see https://docs.rs/parquet/58.0.0/parquet/arrow/push_decoder/struct.ParquetPushDecoder.html

Perhaps we could move Avro to that model too rather than implementing the async stuff first

Originally posted by @alamb in #9632 (comment)

Issue:
The async Avro reader fetches all the data upfront before decoding. Because an Avro file is laid out serially, decoding could instead run in parallel with data fetching (like DataFusion's JSON scan, for example).

One potential solution: use an async stream, as presented in #9632
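
To make the idea concrete, here is a minimal, dependency-free sketch of overlapping fetching with decoding. It is not the arrow-avro API and not the approach from #9632: `ToyPushDecoder`, `fetch_and_decode`, and the 1-byte length-prefix framing are all hypothetical stand-ins, and a thread plus a bounded channel takes the place of an async stream so the example compiles with the standard library alone.

```rust
use std::sync::mpsc;
use std::thread;

/// Toy push-style decoder: the caller pushes bytes in, complete records come out.
/// Framing is a 1-byte length prefix followed by the payload. This is NOT the
/// Avro wire format, only a stand-in that keeps the sketch short.
#[derive(Default)]
struct ToyPushDecoder {
    buf: Vec<u8>,
}

impl ToyPushDecoder {
    /// Append newly fetched bytes and return every record that is now complete.
    fn push(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.buf.extend_from_slice(chunk);
        let mut out = Vec::new();
        let mut pos = 0;
        // Decode for as long as a full length-prefixed record is buffered.
        while pos < self.buf.len() {
            let len = self.buf[pos] as usize;
            if pos + 1 + len > self.buf.len() {
                break; // partial record: wait for more bytes
            }
            out.push(self.buf[pos + 1..pos + 1 + len].to_vec());
            pos += 1 + len;
        }
        // Keep only the undecoded tail for the next call.
        self.buf.drain(..pos);
        out
    }

    /// True when no partial record is left in the buffer.
    fn is_drained(&self) -> bool {
        self.buf.is_empty()
    }
}

/// Fetch `data` in pieces of `chunk_size` bytes (must be non-zero) on one
/// thread while the calling thread decodes the pieces as they arrive.
fn fetch_and_decode(data: Vec<u8>, chunk_size: usize) -> Vec<Vec<u8>> {
    // Bounded channel: the fetcher can run at most 2 chunks ahead of the decoder.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(2);
    let fetcher = thread::spawn(move || {
        for chunk in data.chunks(chunk_size) {
            // Stand-in for a ranged read against remote storage.
            if tx.send(chunk.to_vec()).is_err() {
                break; // the decoder side was dropped
            }
        }
    });

    let mut decoder = ToyPushDecoder::default();
    let mut records = Vec::new();
    // The loop ends once the fetcher finishes and drops its sender.
    for chunk in rx {
        records.extend(decoder.push(&chunk));
    }
    fetcher.join().expect("fetch thread panicked");
    assert!(decoder.is_drained(), "input ended mid-record");
    records
}
```

The decoder never performs IO itself, which is the separation described in the quoted comment above; whether the bytes come from a thread, an async stream, or a blocking reader is decided by the caller. The bounded channel also caps how much undecoded data is held in memory at once, in contrast to reading the whole file upfront.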
