correct download labels — show output dir, rename Skipped → Cached by nl917 · Pull Request #1038 · Kaggle/kaggle-cli

nl917 · 2026-05-29T20:39:22Z

Summary

Fixes two cosmetic-but-misleading labels in kaggle b t download output that
described the function's internal control flow rather than the on-disk state
the user actually sees.

Bug A: `File` column shows a `.zip` path that doesn't exist on disk

Before:
Model File Size
Progress
────────────────────── ──────────────────────────────────────── ──────────
──────────
gemini-3-flash-preview gemini-3-flash-preview/271385/271385.zip 1.06KB
Done

After:
Model File Size
Progress
────────────────────── ──────────────────────────────────────── ──────────
──────────
gemini-3-flash-preview gemini-3-flash-preview/271385/ 1.06KB
Done

The .zip is the intermediate download archive that gets extracted and
then
removed (os.remove(zipfile_path) after zf.extractall). Showing it in the
output table sends users ls'ing for a file that isn't there. Now showing the
extracted directory that actually survives.

Bug B: re-runs show `Skipped`, suggesting the data isn't available

Before:
gemini-3-flash-preview gemini-3-flash-preview/271385/ 1.06KB
Skipped
Done: 1 run(s) skipped.

After:
gemini-3-flash-preview gemini-3-flash-preview/271385/ 1.06KB
Cached
Done: 1 run(s) cached.

When a run's output directory already exists and --force isn't passed, we
short-circuit the download (the right behavior). But Skipped reads as "the
download didn't happen, so I have nothing" — when in reality the data is
on disk from a previous run, ready to use. Cached describes the user-visible
outcome (the data is locally cached, no fetch needed) instead of leaking the
function's internal control-flow language.

Files changed

src/kaggle/api/kaggle_api_extended.py — 4 small edits in
benchmarks_tasks_download_cli: display_files list, the skip branch,
counter init, summary line. Added 2 short comments explaining the rationale.
src/kaggle/test/test_benchmarks_cli.py —
test_download_skips_existing_output and test_download_summary_counts
updated to assert the new wording and pin the new behavior (asserts the .zip
path is not in the output, and the directory path is).

Test plan

pytest src/kaggle/test/test_benchmarks_cli.py::TestDownload → 21/21
pass
Live: kaggle b t download <task> shows directory path in File
column, no .zip
Live: kaggle b t download <task> (re-run without -f) shows Cached
row and Done: 1 run(s) cached. summary
Live: kaggle b t download <task> -f (force re-download) shows Done
row and Done: 1 run(s) downloaded. summary
Other pre-existing test failures on this branch are unrelated to this PR
(verified by running the suite on the unmodified branch base)

nl917 · 2026-05-29T20:40:58Z

dolaameng · 2026-05-29T21:36:50Z

-            parts = [f"{n} run(s) {label}" for n, label in ((downloaded, "downloaded"), (skipped, "skipped")) if n]
+            # Summary — counts are runs (one run may include multiple files, e.g. with --include-source).
+            # "Cached" = already on disk from a previous download; the data is available without a fetch.
+            parts = [f"{n} run(s) {label}" for n, label in ((downloaded, "downloaded"), (cached, "cached")) if n]


skipped -> cached. leave to @simsryan-google for a decision

dolaameng · 2026-05-29T21:39:40Z

In line 7586 we use _dir_size and here we use zip file size, any reason why?

Good catch! This is an inconsistency and will be fixed!

dolaameng · 2026-05-29T21:41:59Z

-            display_files = [
-                f"{self._normalize_model_slug(r.model_version_slug)}/{r.id}/{r.id}.zip" for r in downloadable
-            ]
+            # Show the extracted output directory (what is actually left on disk after


I don't think we need to comment on "why" here, and the "what" part is also obvious. Suggested removed

dolaameng

LGTM. Thanks for the fix!

correct download labels — show output dir, rename Skipped → Cached

9d1d059

nl917 requested a review from dolaameng May 29, 2026 20:41

dolaameng reviewed May 29, 2026

View reviewed changes

dolaameng requested review from rosbo and stevemessick May 29, 2026 21:42

dolaameng mentioned this pull request May 29, 2026

Fix(benchmarks): hint when cached download lacks source notebooks #1039

Open

address comments

0401e09

nl917 requested a review from dolaameng May 29, 2026 22:32

dolaameng approved these changes May 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

correct download labels — show output dir, rename Skipped → Cached#1038

correct download labels — show output dir, rename Skipped → Cached#1038
nl917 wants to merge 2 commits into
mainfrom
download-fix

nl917 commented May 29, 2026 •

edited

Loading

Uh oh!

nl917 commented May 29, 2026

Uh oh!

dolaameng May 29, 2026

Uh oh!

dolaameng May 29, 2026

Uh oh!

nl917 May 29, 2026

Uh oh!

Uh oh!

dolaameng May 29, 2026

Uh oh!

nl917 May 29, 2026

Uh oh!

Uh oh!

dolaameng left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nl917 commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bug A: File column shows a .zip path that doesn't exist on disk

Bug B: re-runs show Skipped, suggesting the data isn't available

Files changed

Test plan

Uh oh!

nl917 commented May 29, 2026

Uh oh!

dolaameng May 29, 2026

Choose a reason for hiding this comment

Uh oh!

dolaameng May 29, 2026

Choose a reason for hiding this comment

Uh oh!

nl917 May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dolaameng May 29, 2026

Choose a reason for hiding this comment

Uh oh!

nl917 May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dolaameng left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nl917 commented May 29, 2026 •

edited

Loading

Bug A: `File` column shows a `.zip` path that doesn't exist on disk

Bug B: re-runs show `Skipped`, suggesting the data isn't available