Conversation
AndreaBozzo
left a comment
While I'm very much in favor of the idea behind this, there are a few concerns about the implementation.
```rust
    mut state: AvroReaderState<R>,
) -> impl Stream<Item = Result<RecordBatch, ArrowError>> + Send + 'static {
    async_stream::try_stream! {
        if state.meta.range.start >= state.meta.range.end {
```
`try_stream!` is sequential? There's no actual prefetching or concurrent work happening; did I miss something?
This implementation is entirely pull-based. The reader API exposes the stream, which can be driven by a dedicated async task (with the granularity of a record batch at a time) if beneficial to the application.
There is a comment about not being able to use the inner spawn helper because of other constraints on the trait design.
(Responding because I had an input on this change.)
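The pull-based pattern described above can be sketched as follows. This is illustrative only: it uses std threads and a bounded channel in place of an async task and stream, and `drive_batches` plus the string "batches" are hypothetical stand-ins for the real reader and `RecordBatch` stream.

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative stand-in for driving a pull-based reader from a dedicated
// worker: the worker pulls one "batch" at a time and forwards it to the
// consumer over a bounded channel (at most one batch in flight).
fn drive_batches(batches: Vec<&'static str>) -> Vec<&'static str> {
    let (tx, rx) = mpsc::sync_channel(1);
    let driver = thread::spawn(move || {
        for batch in batches {
            if tx.send(batch).is_err() {
                break; // consumer went away; stop driving the source
            }
        }
    });
    // The consumer only receives decoded batches; it never drives the
    // reader itself, mirroring the pull-based stream described above.
    let received: Vec<_> = rx.iter().collect();
    driver.join().unwrap();
    received
}

fn main() {
    let out = drive_batches(vec!["batch-0", "batch-1", "batch-2"]);
    assert_eq!(out, vec!["batch-0", "batch-1", "batch-2"]);
}
```

In an async setting the same shape would use a spawned task and a bounded async channel, which is what gives the "granularity of a record batch at a time" mentioned above.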
In my end-to-end benchmark, prefetching is done by the object-store implementation, specifically in the call to `get_opts`.
@jecsand838 can you please take a look at this?

Can we also please file a ticket explaining this need / let us discuss this feature a bit? Note we have also been trying to separate the IO from the decoding in Parquet -- see https://docs.rs/parquet/58.0.0/parquet/arrow/push_decoder/struct.ParquetPushDecoder.html Perhaps we could move Avro to that model too rather than implementing the async stuff first.
Which issue does this PR close?
Rationale for this change
Implement an async streaming reader for Avro, similar to how DataFusion handles JSON and CSV scanning.
There are two main advantages:
What changes are included in this PR?
Change the Avro reader implementation to use async streams.
Are these changes tested?
Yes, added tests.
Are there any user-facing changes?
Users can now implement `get_stream` as part of the `AsyncFileReader` trait.
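A rough sketch of what such a trait method might look like. This is a simplified, synchronous analogue: the real `AsyncFileReader` trait is async and would return a `Stream` of bytes or batches, and the `FileReader`/`InMemory` names and the `Iterator<Item = u8>` return type here are purely illustrative assumptions, not the actual API.

```rust
use std::ops::Range;

// Illustrative, synchronous analogue of a stream-returning reader method.
// The real trait would be async and return a Stream rather than an Iterator.
trait FileReader {
    // Hypothetical: yield the bytes of the requested range one at a time.
    fn get_stream(&self, range: Range<usize>) -> Box<dyn Iterator<Item = u8> + '_>;
}

// A trivial in-memory implementation, useful for tests.
struct InMemory(Vec<u8>);

impl FileReader for InMemory {
    fn get_stream(&self, range: Range<usize>) -> Box<dyn Iterator<Item = u8> + '_> {
        Box::new(self.0[range].iter().copied())
    }
}

fn main() {
    let reader = InMemory(vec![1, 2, 3, 4]);
    let got: Vec<u8> = reader.get_stream(1..3).collect();
    assert_eq!(got, vec![2, 3]);
}
```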