[repo-monitor] Medium: Unbounded memory accumulation in fetch_dataset_items — OOM on large datasets #6

@Liohtml

Summary

fetch_dataset_items appends every page to a single Vec<T> with no upper bound, so fetching a large Apify dataset (millions of items) can exhaust available memory and cause an out-of-memory (OOM) failure.

Location

  • File: src/lib.rs
  • Line(s): 274–301

Severity

Medium

Details

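// Abridged excerpt: the per-page request and the definitions of `resp`, `n`, and `limit` are not shown.
let mut out: Vec<T> = Vec::new();
loop {
    let chunk: Vec<T> = resp.json().await?; // deserialize one full page
    out.extend(chunk); // unbounded growth: every page stays in memory
    if n < limit { break; } // the only exit: stop once a page comes back short
}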

For large datasets this can consume all available memory, because the loop only ends once the whole dataset has been read. The caller has no way to cap the item count or to process items as a stream.

Suggested Fix

Add a max_items parameter:

pub async fn fetch_dataset_items_limited<T: DeserializeOwned>(
    &self, max_items: usize
) -> Result<Vec<T>> {
    // break when out.len() >= max_items
}
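
A minimal, self-contained sketch of the capped loop follows. It is not the crate's code: fetch_items_limited, page_size, and the fetch_page(offset, limit) closure are hypothetical names, and the closure stands in for the real HTTP request so that the stopping logic can be shown synchronously without an async runtime.

```rust
/// Hypothetical helper (not part of the crate): pages through a dataset and
/// stops as soon as `max_items` items have been collected.
///
/// `fetch_page(offset, limit)` is an assumed stand-in for the real request.
/// It should return at most `limit` items starting at `offset`.
pub fn fetch_items_limited<T, E, F>(
    mut fetch_page: F,
    page_size: usize,
    max_items: usize,
) -> Result<Vec<T>, E>
where
    F: FnMut(usize, usize) -> Result<Vec<T>, E>,
{
    let mut out: Vec<T> = Vec::new();
    if page_size == 0 {
        // A zero page size could never make progress.
        return Ok(out);
    }
    while out.len() < max_items {
        // Never request more than the remaining budget.
        let limit = page_size.min(max_items - out.len());
        let chunk = fetch_page(out.len(), limit)?;
        let n = chunk.len();
        // `take` guards against a source that returns more than requested.
        out.extend(chunk.into_iter().take(limit));
        if n < limit {
            break; // short page: the dataset is exhausted
        }
    }
    Ok(out)
}
```

With this shape, memory use is bounded by max_items regardless of dataset size, and the last request shrinks to the remaining budget so no items are fetched only to be discarded.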

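Alternatively, provide a streaming/async-iterator interface so that callers can process items as they arrive instead of holding the whole dataset in memory.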
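
One possible shape for the streaming option is sketched below as a synchronous Iterator that keeps at most one page in memory. This is an illustration under assumptions, not the crate's API: DatasetItems and the fetch_page(offset, limit) closure are hypothetical, and a real implementation would more likely expose an async stream over the HTTP client.

```rust
/// Hypothetical lazy pager: buffers a single page and fetches the next one
/// only when the current page has been fully consumed.
pub struct DatasetItems<T, F> {
    fetch_page: F,
    page_size: usize,
    offset: usize,
    buffer: std::vec::IntoIter<T>,
    done: bool,
}

impl<T, F> DatasetItems<T, F> {
    /// `fetch_page(offset, limit)` is an assumed stand-in for the real request.
    pub fn new<E>(fetch_page: F, page_size: usize) -> Self
    where
        F: FnMut(usize, usize) -> Result<Vec<T>, E>,
    {
        Self {
            fetch_page,
            page_size,
            offset: 0,
            buffer: Vec::new().into_iter(),
            // A zero page size could never make progress, so yield nothing.
            done: page_size == 0,
        }
    }
}

impl<T, E, F> Iterator for DatasetItems<T, F>
where
    F: FnMut(usize, usize) -> Result<Vec<T>, E>,
{
    type Item = Result<T, E>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Serve from the current page first.
            if let Some(item) = self.buffer.next() {
                return Some(Ok(item));
            }
            if self.done {
                return None;
            }
            // Buffer is empty: fetch the next page, replacing the old one.
            match (self.fetch_page)(self.offset, self.page_size) {
                Ok(chunk) => {
                    self.offset += chunk.len();
                    // A short or empty page marks the end of the dataset.
                    self.done = chunk.len() < self.page_size;
                    self.buffer = chunk.into_iter();
                }
                Err(e) => {
                    self.done = true; // report the error once, then stop
                    return Some(Err(e));
                }
            }
        }
    }
}
```

Because pages are fetched on demand, a caller that stops early (for example with take) never triggers the remaining requests, and peak memory stays near one page rather than the full dataset.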


Automated finding by repo-monitor
