Note we have also been trying to separate the IO from the decoding in Parquet -- see https://docs.rs/parquet/58.0.0/parquet/arrow/push_decoder/struct.ParquetPushDecoder.html
Perhaps we could move Avro to that model too rather than implementing the async stuff first
Originally posted by @alamb in #9632 (comment)
Issue:
The async Avro reader currently reads all of the data upfront. Because the Avro file format is serial, decoding and data fetching could instead overlap, so that one block is decoded while the next is being fetched (like DataFusion's JSON scan, for example).
One potential solution: use an async stream, as presented in #9632
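To make the idea of separating IO from decoding more concrete, here is a minimal sketch using only the Rust standard library. Everything in it is an assumption for illustration: `ToyPushDecoder` and `decode_overlapped` are hypothetical names, not part of arrow-avro or parquet, and the 1-byte length-prefixed framing is a toy stand-in rather than the Avro block format. It uses a thread and a bounded channel in place of an async stream, only to show that a decoder which never does IO can consume chunks while the next ones are still being fetched.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical push-style decoder: it never performs IO itself.
/// Toy framing (NOT the Avro format): each record is a 1-byte length
/// followed by that many payload bytes.
#[derive(Default)]
pub struct ToyPushDecoder {
    buf: Vec<u8>,
}

impl ToyPushDecoder {
    /// Feed newly fetched bytes; they may end in the middle of a record.
    pub fn push(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    /// Decode every complete record currently buffered, keeping any
    /// trailing partial record for the next `push`.
    pub fn drain_records(&mut self) -> Vec<Vec<u8>> {
        let mut out = Vec::new();
        let mut pos = 0;
        while pos < self.buf.len() {
            let len = self.buf[pos] as usize;
            let end = pos + 1 + len;
            if end > self.buf.len() {
                break; // incomplete record: wait for more data
            }
            out.push(self.buf[pos + 1..end].to_vec());
            pos = end;
        }
        self.buf.drain(..pos);
        out
    }
}

/// Fetch chunks on one thread and decode on the caller's thread, so
/// decoding of one chunk can overlap with fetching of the next.
pub fn decode_overlapped(chunks: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    // Bounded channel: the fetcher can queue at most 2 chunks ahead.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(2);
    let fetcher = thread::spawn(move || {
        for chunk in chunks {
            // Stand-in for a real ranged read from a file or object store.
            if tx.send(chunk).is_err() {
                break; // receiver went away; stop fetching
            }
        }
    });

    let mut decoder = ToyPushDecoder::default();
    let mut records = Vec::new();
    // The loop ends once the fetcher finishes and drops the sender.
    for chunk in rx {
        decoder.push(&chunk);
        records.extend(decoder.drain_records());
    }
    fetcher.join().expect("fetcher thread panicked");
    records
}
```

Calling `decode_overlapped` with chunks whose boundaries fall in the middle of a record still yields whole records, because the decoder buffers the partial tail until the next `push`. The bounded channel is one possible way to limit how far the fetcher runs ahead of the decoder; an async stream would fill the same role.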