Use streaming ZIP extraction #38

Draft
hagenw wants to merge 1 commit into main from get-archive
hagenw commented Jan 13, 2026

Use streaming ZIP extraction from audeering/audbackend#279 to speed up downloading models (with a single worker).

Execution time to load 7289b57d-1.0.0 (4.2 GB).

| Before | After |
| --- | --- |
| 0:02:21.247691 | 0:01:20.056608 |
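
To put the two timings into perspective, the following sketch converts them to seconds and derives the speedup. It assumes the values are printed in Python's `str(timedelta)` format (`H:MM:SS.ffffff`); the helper `parse_duration` is a hypothetical name introduced only for this illustration.

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse a duration written as H:MM:SS.ffffff (assumed format)."""
    hours, minutes, seconds = text.split(":")
    return timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=float(seconds),
    )


# Timings copied from the table above
before = parse_duration("0:02:21.247691")
after = parse_duration("0:01:20.056608")

saved = before - after
speedup = before / after  # dividing two timedeltas yields a float

print(f"before:  {before.total_seconds():.1f} s")
print(f"after:   {after.total_seconds():.1f} s")
print(f"saved:   {saved.total_seconds():.1f} s ({saved / before:.0%} less time)")
print(f"speedup: {speedup:.2f}x")
```

For this single measurement the streaming variant is about 1.76 times faster, saving roughly 61 seconds. One run is not a benchmark, so the figure is only indicative.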

Summary by Sourcery

Streamline model archive retrieval by leveraging backend-level streaming extraction instead of downloading and unpacking ZIP files locally.

Enhancements:

  • Replace local ZIP download and extraction with backend_interface.get_archive to support streaming extraction of model archives.

Build:

  • Update audbackend dependency to a git-based stream-extract branch to enable streaming archive support.


sourcery-ai Bot commented Jan 13, 2026

Reviewer's Guide

Switches model archive handling to use audbackend’s streaming ZIP extraction API for performance, and temporarily pins audbackend to a Git branch that provides this functionality.

Sequence diagram for streaming ZIP extraction in get_archive

sequenceDiagram
    actor User
    participant AudmodelBackend as Audmodel_backend.get_archive
    participant BackendInterface as BackendInterface
    participant Storage as Remote_storage
    participant FS as Local_filesystem

    User->>AudmodelBackend: load_model(version, path)
    AudmodelBackend->>BackendInterface: enter backend context
    activate BackendInterface
    BackendInterface->>Storage: open_archive_stream(path, version)
    Storage-->>BackendInterface: ZIP_stream
    BackendInterface->>FS: get_archive(path, tmp_root, version, verbose)
    FS-->>BackendInterface: files_extracted_to_tmp_root
    deactivate BackendInterface
    AudmodelBackend->>FS: move tmp_root to final root
    FS-->>AudmodelBackend: root_ready
    AudmodelBackend-->>User: model_loaded
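
The last steps of the diagram (extract into a temporary root, then move it to the final root) can be sketched with the standard library alone. This is a minimal illustration of the pattern, not the code in audmodel/core/backend.py: `install_archive`, the `demo-model/1.0.0` folder layout, and the use of a local ZIP file instead of a remote stream are all assumptions made for the example.

```python
import os
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_archive(archive: Path, root: Path) -> Path:
    """Extract archive next to root, then move the result into place.

    Hypothetical helper: a half-extracted model never appears under
    root, because root is only created once extraction has finished.
    """
    root = Path(root)
    root.parent.mkdir(parents=True, exist_ok=True)
    # Create the temporary folder beside root so the final rename
    # stays on one filesystem.
    tmp_root = Path(tempfile.mkdtemp(prefix=root.name + "~", dir=root.parent))
    try:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp_root)
        if root.exists():
            shutil.rmtree(root)  # replace a stale copy
        os.rename(tmp_root, root)
    except BaseException:
        # Clean up the partial extraction before re-raising
        shutil.rmtree(tmp_root, ignore_errors=True)
        raise
    return root


with tempfile.TemporaryDirectory() as work:
    work = Path(work)
    archive = work / "model.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("model.yaml", "name: demo\n")

    root = install_archive(archive, work / "cache" / "demo-model" / "1.0.0")
    installed = sorted(p.name for p in root.iterdir())
    leftovers = [p.name for p in root.parent.iterdir() if p.name != root.name]
    print(installed, leftovers)
```

Extracting beside the target rather than into the system temp directory keeps the rename cheap, since a move across filesystems would fall back to copying the whole model.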

File-Level Changes

Change: Replace local download-then-extract flow with backend-provided streaming archive extraction to a temporary directory.
Details:
  • Remove explicit construction of a temporary ZIP path and separate get_file call.
  • Call backend_interface.get_archive with the remote path, temporary root directory, version, and verbosity to both download and extract in one step.
  • Drop the subsequent audeer.extract_archive call and its keep_archive handling, as extraction is now handled by the backend.
Files: audmodel/core/backend.py

Change: Update audbackend dependency to a Git-based reference that includes streaming extraction support.
Details:
  • Change audbackend dependency specifier from a version constraint (>=2.2.3) to an editable Git URL pointing to the stream-extract branch so the new get_archive behavior is available.
Files: pyproject.toml
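
The change in audmodel/core/backend.py can be pictured as collapsing two steps into one call. The sketch below contrasts both flows using `FakeBackend`, a hypothetical stand-in that reads from a local folder; it is not the audbackend API, its method signatures are simplified assumptions, and it does not actually stream. It only shows that both flows leave the same files in the temporary root.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


class FakeBackend:
    """Hypothetical stand-in for a backend interface (local folder)."""

    def __init__(self, storage: Path):
        self.storage = storage

    def _remote(self, path: str, version: str) -> Path:
        return self.storage / version / path.lstrip("/")

    def get_file(self, src_path, dst_path, version):
        # Copy the archive unchanged to dst_path
        shutil.copyfile(self._remote(src_path, version), dst_path)

    def get_archive(self, src_path, dst_root, version):
        # Fetch and unpack in one call (no real streaming here)
        with zipfile.ZipFile(self._remote(src_path, version)) as zf:
            zf.extractall(dst_root)


def fetch_old(backend, src_path, tmp_root, version):
    """Previous flow: download the ZIP, extract it, drop the archive."""
    archive = Path(tmp_root) / "model.zip"
    backend.get_file(src_path, archive, version)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(tmp_root)
    archive.unlink()  # mirrors extracting without keeping the archive


def fetch_new(backend, src_path, tmp_root, version):
    """New flow: one get_archive call downloads and extracts."""
    backend.get_archive(src_path, tmp_root, version)


def list_files(root):
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as work:
    work = Path(work)
    remote = work / "storage" / "1.0.0" / "models"
    remote.mkdir(parents=True)
    with zipfile.ZipFile(remote / "model.zip", "w") as zf:
        zf.writestr("model.yaml", "name: demo\n")
        zf.writestr("weights/model.onnx", "not a real model")

    backend = FakeBackend(work / "storage")
    old_root, new_root = work / "old", work / "new"
    old_root.mkdir()
    new_root.mkdir()
    fetch_old(backend, "/models/model.zip", old_root, "1.0.0")
    fetch_new(backend, "/models/model.zip", new_root, "1.0.0")

    old_files = list_files(old_root)
    new_files = list_files(new_root)
    print(old_files == new_files, new_files)
```

In the old flow the full archive has to land on disk before extraction can start; a backend that extracts while the bytes arrive can overlap both phases, which is the likely source of the speedup reported in the description.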

codecov Bot commented Jan 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (bf39284) to head (cdbf00c).

Additional details and impacted files
| Files with missing lines | Coverage Δ |
| --- | --- |
| audmodel/core/backend.py | 100.0% <100.0%> (ø) |
