Merged
4 changes: 3 additions & 1 deletion Dockerfile
@@ -2,7 +2,9 @@ ARG PYTHON_VERSION

FROM python:${PYTHON_VERSION}

-RUN pip install hatch==1.14.0
+# virtualenv 21 moves some functions that hatch 1.14 depended on. For now, just
+# pin to 20.X so hatch env continues working.
+RUN pip install hatch==1.14.0 "virtualenv<21"

# Add only the minimal files required to be able to pre-create the hatch environments.
# If any of these files changes, a new Docker build is necessary. This is why we need
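The `"virtualenv<21"` pin relies on pip's exclusive upper-bound specifier, which rejects every 21.x release while still accepting the latest 20.x. A minimal sketch of the comparison (simplified to the major component; the helper name is ours, not pip's):

```python
def satisfies_upper_bound(version: str, bound_major: int) -> bool:
    """Return True if the release's major component is below the bound,
    mirroring how a "<21" specifier rejects 21.0 and later."""
    major = int(version.split(".")[0])
    return major < bound_major

print(satisfies_upper_bound("20.31.2", 21))  # 20.x is accepted
print(satisfies_upper_bound("21.0.0", 21))   # 21.x is rejected
```

Real specifiers also compare minor/patch components and pre-release tags (see PEP 440), but a major-version pin like this one only needs the leading component.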
4 changes: 2 additions & 2 deletions docs/datasets.md
@@ -223,7 +223,7 @@ Use this command to update an existing dataset with new files or metadata change

## `kaggle datasets metadata`

-Downloads metadata for a dataset or updates existing local metadata.
+Downloads metadata for a dataset, or updates an existing dataset from local metadata.

**Usage:**

@@ -238,7 +238,7 @@ kaggle datasets metadata <DATASET> [options]
**Options:**

* `-p, --path <PATH>`: Directory to download/update metadata file (`dataset-metadata.json`). Defaults to current working directory.
-* `--update`: Update existing local metadata file instead of downloading anew.
+* `--update`: Update the existing dataset version's metadata using the contents of the local metadata JSON file (i.e., "push" from local).

**Example:**

18 changes: 17 additions & 1 deletion docs/datasets_metadata.md
@@ -51,7 +51,9 @@ Here's an example containing file metadata:
"keywords": [
"beginner",
"tutorial"
-]
+],
+"expectedUpdateFrequency": "monthly",
+"userSpecifiedSources": "World Bank and OECD ([link](http://data.worldbank.org/indicator/NY.GDP.MKTP.CD))",
Contributor Author
@jmasukawa jmasukawa Apr 7, 2026
This is the internal field name. Open to better alternatives.

I had wanted to just call it sources, but given we follow the Data Package format as much as possible, it conflicts with their "sources", which are objects that have a Title (required) and Path (optional): https://specs.frictionlessdata.io//data-package/#sources

However, DatasetVersion.UserSpecifiedSources is a markdown string, so it's not feasible to turn it into an array of objects and back without some heavy assumptions that could be wrong (comma separation, etc.).

It seemed like too much work at the moment to refactor how we store UserSpecifiedSources to use a collection of objects just for this, so I went with a name that deviates significantly from the Data Package spec to make it clear it's something else.

Open to other suggestions.

Contributor Author
@goeffthomas for your thoughts here, given you mentioned Data Package adherence

Thanks for laying all of that out. +1 to your assessment and naming. I think my earlier comment about adhering and using this property was before knowing how they recommend extending. So yeah, if we want something that more matches how we let users specify the sources, this all makes sense to me.

Contributor Author
Thanks for the feedback! SG, i'll continue on this way to match our UI

}
```

@@ -156,3 +158,17 @@ You can specify the following data types
* `integer`
* `decimal`
* `city`

## Expected update frequencies
You can specify the following values for `expectedUpdateFrequency`:
* `not specified`
* `never`
* `annually`
* `quarterly`
* `monthly`
* `weekly`
* `daily`
* `hourly`

## Sources
You can report your dataset sources in a markdown string for `userSpecifiedSources`. Most basic markdown features are supported.
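Combining the two new fields, a minimal `dataset-metadata.json` that uses them might look like the following sketch (the title, slug, and values here are illustrative, not from the PR):

```json
{
  "title": "My Dataset",
  "id": "username/my-dataset",
  "licenses": [{"name": "CC0-1.0"}],
  "expectedUpdateFrequency": "monthly",
  "userSpecifiedSources": "World Bank ([link](http://data.worldbank.org/indicator/NY.GDP.MKTP.CD))"
}
```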
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -24,7 +24,7 @@ keywords = ["Kaggle", "API"]
requires-python = ">= 3.11"
dependencies = [
"bleach",
-"kagglesdk >= 0.1.16, < 1.0", # sync with kagglehub
+"kagglesdk >= 0.1.17, < 1.0", # sync with kagglehub
"python-slugify",
"requests",
"python-dateutil",
Expand Down
2 changes: 1 addition & 1 deletion src/kaggle/api/kaggle_api.py
@@ -338,7 +338,7 @@ def kernel_push(self, kernel_push_request): # noqa: E501
"""
with tempfile.TemporaryDirectory() as tmpdir:
meta_file = os.path.join(tmpdir, "kernel-metadata.json")
-(fd, code_file) = tempfile.mkstemp("code", "py", tmpdir, text=True)
+fd, code_file = tempfile.mkstemp("code", "py", tmpdir, text=True)
Contributor
j/w - did you do these formatting changes manually? I know that @stevemessick has some autoformatting set up in this repo, so just want to make sure we're not messing with that.

Contributor Author
@jmasukawa jmasukawa Apr 7, 2026
yep, this is from the tooling. it came from ./docker-hatch run lint:fmt which should be the correct tool path.

something i did notice is that if someone else uses hatch run lint:fmt then it will use their machine's python version, which might format differently.

i used the docker one because i assumed it was the way we keep it consistent. but, the docker python version is a bit outdated. so i'm guessing what happened is someone used the linter w/local python instead of docker one time, and it slipped in.

fd.write(json.dumps(kernel_push_request.code))
os.close(fd)
with open(meta_file, "w") as f:
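One thing worth noting about the `tempfile.mkstemp` call in this hunk: it returns an integer OS-level file descriptor, not a file object, so writes have to go through `os.write` (the `fd.write(...)` call in the surrounding context would raise `AttributeError` on an `int`). A self-contained sketch of the descriptor-based pattern (the payload dict is invented for illustration):

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # mkstemp returns (fd, path); fd is a raw integer descriptor.
    fd, code_file = tempfile.mkstemp("code", "py", tmpdir, text=True)
    # os.write takes bytes, so the JSON string must be encoded first.
    os.write(fd, json.dumps({"source": "print('hi')"}).encode("utf-8"))
    os.close(fd)

    # The path can then be reopened as a normal file object.
    with open(code_file) as f:
        payload = json.load(f)
    print(payload["source"])  # print('hi')
```

Wrapping the descriptor with `os.fdopen(fd, "w")` is an equally valid alternative that yields a regular file object.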
21 changes: 14 additions & 7 deletions src/kaggle/api/kaggle_api_extended.py
@@ -721,7 +721,7 @@ def _authenticate_with_legacy_apikey(self) -> bool:
return True

def _authenticate_with_access_token(self):
-(access_token, source) = get_access_token_from_env()
+access_token, source = get_access_token_from_env()
if not access_token:
return False

@@ -1901,7 +1901,7 @@ def dataset_metadata_update(self, dataset, path):
dataset: The dataset to update.
path: The path to the metadata file.
"""
-(owner_slug, dataset_slug, effective_path) = self.dataset_metadata_prep(dataset, path)
+owner_slug, dataset_slug, effective_path = self.dataset_metadata_prep(dataset, path)
meta_file = self.get_dataset_metadata_file(effective_path)
with open(meta_file, "r") as f:
metadata = json.load(f)
@@ -1921,14 +1921,21 @@
else []
)
update_settings.data = metadata.get("data")
# This *should* be a list of sources, but we store them as a single string in dataset version metadata,
# so we treat it as a different / special property than Data Package's "sources" for now:
# https://specs.frictionlessdata.io//data-package/#sources
update_settings.user_specified_sources = metadata.get("userSpecifiedSources") or ""
expected_update_frequency = metadata.get("expectedUpdateFrequency")
if expected_update_frequency:
update_settings.expected_update_frequency = expected_update_frequency
request = ApiUpdateDatasetMetadataRequest()
request.owner_slug = owner_slug
request.dataset_slug = dataset_slug
request.settings = update_settings
with self.build_kaggle_client() as kaggle:
response = kaggle.datasets.dataset_api_client.update_dataset_metadata(request)
if len(response.errors) > 0:
-[print(e["message"]) for e in response.errors]
+[print(error_message) for error_message in response.errors]
exit(1)
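The pattern this hunk adds, copying a metadata key into the settings object only when it is present and falling back to a safe default otherwise, can be isolated as a small sketch (the `Settings` holder and field handling below are illustrative stand-ins, not the SDK's actual types):

```python
class Settings:
    """Stand-in for the update-settings object; attributes start at safe defaults."""
    user_specified_sources = ""
    expected_update_frequency = None

def apply_metadata(settings: Settings, metadata: dict) -> Settings:
    # Always set the sources string, coercing a missing or null value to "".
    settings.user_specified_sources = metadata.get("userSpecifiedSources") or ""
    # Only set the frequency when the key holds a truthy value, so an absent
    # key leaves the existing server-side value untouched.
    frequency = metadata.get("expectedUpdateFrequency")
    if frequency:
        settings.expected_update_frequency = frequency
    return settings

s = apply_metadata(Settings(), {"expectedUpdateFrequency": "weekly"})
```

Note that the `or ""` coercion also turns an explicit `"userSpecifiedSources": null` into an empty string rather than `None`.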

@staticmethod
@@ -1954,7 +1961,7 @@ def dataset_metadata(self, dataset, path):
Returns:
The path to the downloaded metadata file.
"""
-(owner_slug, dataset_slug, effective_path) = self.dataset_metadata_prep(dataset, path)
+owner_slug, dataset_slug, effective_path = self.dataset_metadata_prep(dataset, path)

if not os.path.exists(effective_path):
os.makedirs(effective_path)
@@ -3433,7 +3440,7 @@ def kernels_output(
token = response.next_page_token

outfiles = []
-for item in (response.files or []):
+for item in response.files or []:
if compiled_pattern and not compiled_pattern.search(item.file_name):
continue
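The `compiled_pattern` check above skips any file whose name fails the user-supplied regex. Since `re.search` matches anywhere in the string, an anchor like `$` is needed to target only extensions. A small sketch of the filtering (the file names are invented):

```python
import re

files = ["results.csv", "notebook.ipynb", "archive.csv.bak"]
pattern = re.compile(r"\.csv$")  # only names *ending* in .csv

# Mirrors the loop above: skip names the pattern does not match.
kept = [name for name in files if pattern.search(name)]
print(kept)  # ['results.csv']
```

Without the `$` anchor, `archive.csv.bak` would also match, which is why pattern choice matters when using `--file-pattern` style options.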

@@ -3473,7 +3480,7 @@ def kernels_output_cli(self, kernel, kernel_opt=None, path=None, force=False, qu
file_pattern: Regex pattern to match against filenames. Only files matching the pattern will be downloaded.
"""
kernel = kernel or kernel_opt
-(_, token) = self.kernels_output(kernel, path, file_pattern, force, quiet)
+_, token = self.kernels_output(kernel, path, file_pattern, force, quiet)
if token:
print(f"Next page token: {token}")

@@ -4609,7 +4616,7 @@ def files_upload_cli(self, local_paths, inbox_path, no_resume, no_compress):
files_to_create = []
with ResumableUploadContext(no_resume) as upload_context:
for local_path in local_paths:
-(upload_file, file_name) = self.file_upload_cli(local_path, inbox_path, no_compress, upload_context)
+upload_file, file_name = self.file_upload_cli(local_path, inbox_path, no_compress, upload_context)
if upload_file is None:
continue

4 changes: 1 addition & 3 deletions src/kaggle/cli.py
@@ -1344,9 +1344,7 @@ class Help(object):
command_model_instances_update = "Update a model variation"

# Model Instance Versions params
-param_model_instance_version = (
-    "Model variation version URL suffix in format <owner>/<model-name>/<framework>/<variation-slug>/<version-number>"
-)
+param_model_instance_version = "Model variation version URL suffix in format <owner>/<model-name>/<framework>/<variation-slug>/<version-number>"

# Model Instance Versions params
command_model_instance_versions_new = "Create a new model variation version"