Skip to content

correct download labels — show output dir, rename Skipped → Cached#1038

Open
nl917 wants to merge 2 commits into
mainfrom
download-fix
Open

correct download labels — show output dir, rename Skipped → Cached#1038
nl917 wants to merge 2 commits into
mainfrom
download-fix

Conversation

@nl917
Copy link
Copy Markdown
Contributor

@nl917 nl917 commented May 29, 2026

Summary

Fixes two cosmetic-but-misleading labels in kaggle b t download output that
described the function's internal control flow rather than the on-disk state
the user actually sees.

Bug A: File column shows a .zip path that doesn't exist on disk

Before:
Model File Size
Progress
────────────────────── ──────────────────────────────────────── ──────────
──────────
gemini-3-flash-preview gemini-3-flash-preview/271385/271385.zip 1.06KB
Done

After:
Model File Size
Progress
────────────────────── ──────────────────────────────────────── ──────────
──────────
gemini-3-flash-preview gemini-3-flash-preview/271385/ 1.06KB
Done

The .zip is the intermediate download archive that gets extracted and
then
removed (os.remove(zipfile_path) after zf.extractall). Showing it in the
output table sends users ls'ing for a file that isn't there. Now showing the
extracted directory that actually survives.

Bug B: re-runs show Skipped, suggesting the data isn't available

Before:
gemini-3-flash-preview gemini-3-flash-preview/271385/ 1.06KB
Skipped
Done: 1 run(s) skipped.

After:
gemini-3-flash-preview gemini-3-flash-preview/271385/ 1.06KB
Cached
Done: 1 run(s) cached.

When a run's output directory already exists and --force isn't passed, we
short-circuit the download (the right behavior). But Skipped reads as "the
download didn't happen, so I have nothing" — when in reality the data is
on disk from a previous run, ready to use. Cached describes the user-visible
outcome (the data is locally cached, no fetch needed) instead of leaking the
function's internal control-flow language.

Files changed

  • src/kaggle/api/kaggle_api_extended.py — 4 small edits in
    benchmarks_tasks_download_cli: display_files list, the skip branch,
    counter init, summary line. Added 2 short comments explaining the rationale.
  • src/kaggle/test/test_benchmarks_cli.py
    test_download_skips_existing_output and test_download_summary_counts
    updated to assert the new wording and pin the new behavior (asserts the .zip
    path is not in the output, and the directory path is).

Test plan

  • pytest src/kaggle/test/test_benchmarks_cli.py::TestDownload → 21/21
    pass
  • Live: kaggle b t download <task> shows directory path in File
    column, no .zip
  • Live: kaggle b t download <task> (re-run without -f) shows Cached
    row and Done: 1 run(s) cached. summary
  • Live: kaggle b t download <task> -f (force re-download) shows Done
    row and Done: 1 run(s) downloaded. summary
  • Other pre-existing test failures on this branch are unrelated to this PR
    (verified by running the suite on the unmodified branch base)

@nl917
Copy link
Copy Markdown
Contributor Author

nl917 commented May 29, 2026

Screenshot 2026-05-29 at 1 40 24 PM

@nl917 nl917 requested a review from dolaameng May 29, 2026 20:41
parts = [f"{n} run(s) {label}" for n, label in ((downloaded, "downloaded"), (skipped, "skipped")) if n]
# Summary — counts are runs (one run may include multiple files, e.g. with --include-source).
# "Cached" = already on disk from a previous download; the data is available without a fetch.
parts = [f"{n} run(s) {label}" for n, label in ((downloaded, "downloaded"), (cached, "cached")) if n]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipped -> cached. leave to @simsryan-google for a decision

Comment thread src/kaggle/api/kaggle_api_extended.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line 7586 we use _dir_size and here we use zip file size, any reason why?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This is an inconsistency and will be fixed!

Comment thread src/kaggle/test/test_benchmarks_cli.py
Comment thread src/kaggle/api/kaggle_api_extended.py Outdated
display_files = [
f"{self._normalize_model_slug(r.model_version_slug)}/{r.id}/{r.id}.zip" for r in downloadable
]
# Show the extracted output directory (what is actually left on disk after
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to comment on "why" here, and the "what" part is also obvious. Suggested removed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread src/kaggle/api/kaggle_api_extended.py Outdated
@nl917 nl917 requested a review from dolaameng May 29, 2026 22:32
Copy link
Copy Markdown
Contributor

@dolaameng dolaameng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants