-
Notifications
You must be signed in to change notification settings - Fork 10
Finalize 256 #36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finalize 256 #36
Conversation
📝 WalkthroughWalkthroughAdds API-key support ( Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI
participant Client
participant DatabusAPI
User->>CLI: download --vault-token=file --databus-key=KEY ...
CLI->>Client: download(vault_token=file, databus_key=KEY, ...)
Client->>DatabusAPI: GET resource
alt 401 with WWW-Authenticate: Bearer
DatabusAPI-->>Client: 401 + WWW-Authenticate: Bearer
Client->>Client: read Vault token from file
Client->>DatabusAPI: GET resource (Authorization: Bearer <token>)
DatabusAPI-->>Client: 200 OK
else 401 without Bearer
DatabusAPI-->>Client: 401
Client->>DatabusAPI: GET resource (X-API-KEY: KEY)
DatabusAPI-->>Client: 200 OK
end
Client-->>CLI: files saved / download complete
CLI-->>User: exit
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
databusclient/client.py (1)
791-823: Critical: databus_key not propagated to metadata and collection retrieval functions.The
databus_keyparameter is not being passed to__handle_databus_collection__and__get_json_ld_from_databus__calls, which means API-key-protected collections and artifact metadata cannot be accessed even when the key is provided.Apply this diff to fix all missing databus_key parameters:
# databus collection if "/collections/" in databusURI: # TODO "in" is not safe! there could be an artifact named collections, need to check for the correct part position in the URI - query = __handle_databus_collection__(databusURI) + query = __handle_databus_collection__(databusURI, databus_key) res = __handle_databus_file_query__(endpoint, query) __download_list__(res, localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id) # databus file elif file is not None: __download_list__([databusURI], localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id) # databus artifact version elif version is not None: - json_str = __get_json_ld_from_databus__(databusURI) + json_str = __get_json_ld_from_databus__(databusURI, databus_key) res = __handle_databus_artifact_version__(json_str) __download_list__(res, localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id) # databus artifact elif artifact is not None: - json_str = __get_json_ld_from_databus__(databusURI) + json_str = __get_json_ld_from_databus__(databusURI, databus_key) latest = __get_databus_latest_version_of_artifact__(json_str) print(f"No version given, using latest version: {latest}") - json_str = __get_json_ld_from_databus__(latest) + json_str = __get_json_ld_from_databus__(latest, databus_key) res = __handle_databus_artifact_version__(json_str) __download_list__(res, localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id) # databus group elif group is not None: - json_str = __get_json_ld_from_databus__(databusURI) + json_str = __get_json_ld_from_databus__(databusURI, databus_key) artifacts = __get_databus_artifacts_of_group__(json_str) for artifact_uri in artifacts: print(f"Processing artifact {artifact_uri}") - json_str = __get_json_ld_from_databus__(artifact_uri) + json_str = __get_json_ld_from_databus__(artifact_uri, databus_key) latest = __get_databus_latest_version_of_artifact__(json_str) print(f"No version given, using latest version: {latest}") - json_str = __get_json_ld_from_databus__(latest) + json_str = __get_json_ld_from_databus__(latest, databus_key) res = __handle_databus_artifact_version__(json_str) __download_list__(res, localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id)
🧹 Nitpick comments (1)
databusclient/client.py (1)
717-748: Function signatures properly extended with databus_key parameter.The propagation of
databus_keythrough__handle_databus_collection__,__get_json_ld_from_databus__, and__download_list__is correctly implemented, with API key headers properly set when provided.For improved type safety, consider using explicit
OptionalorT | Nonesyntax for parameters that can be None, as flagged by static analysis (RUF013).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
README.md(7 hunks)databusclient/cli.py(1 hunks)databusclient/client.py(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/cli.py (1)
databusclient/client.py (1)
download(759-839)
🪛 LanguageTool
README.md
[style] ~157-~157: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...LOADTARGET ``` - $DOWNLOADTARGET - Can be any Databus URI including collection...
(MISSING_IT_THERE)
🪛 markdownlint-cli2 (0.18.1)
README.md
69-69: Bare URL used
(MD034, no-bare-urls)
70-70: Bare URL used
(MD034, no-bare-urls)
86-86: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.5)
databusclient/client.py
543-543: Avoid specifying long messages outside the exception class
(TRY003)
546-546: Probable use of requests call without timeout
(S113)
717-717: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
722-722: Probable use of requests call without timeout
(S113)
725-725: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
729-729: Probable use of requests call without timeout
(S113)
734-734: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
735-735: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
736-736: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
737-737: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
741-741: Unpacked variable host is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
README.md (1)
1-476: Comprehensive documentation expansion looks great!The documentation is well-structured and covers all three deployment modes, CLI usage, and module usage effectively. The examples are clear and align with the code changes introducing API key authentication.
databusclient/cli.py (1)
97-113: Authentication parameter updates look correct.The renaming of
--tokento--vault-tokenand addition of--databus-keyproperly supports both Vault-based and API-key-based authentication workflows. The parameters are correctly passed toclient.download.databusclient/client.py (1)
526-546: Authentication logic correctly handles both Vault and API key flows.The implementation properly distinguishes between:
- 401 with Bearer challenge → Vault token authentication
- 401 without Bearer challenge → API key authentication
This dual-path approach aligns with the PR objectives.
However, consider adding a timeout to the requests call on line 546 to prevent hanging on slow or unresponsive servers:
- response = requests.get(url, headers=headers, stream=True) + response = requests.get(url, headers=headers, stream=True, timeout=30)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
databusclient/client.py (1)
759-839: Adddatabus_keyto function docstring and pass it to metadata endpoint calls.The
databus_keyparameter is present in the function signature and correctly passed to__download_list__calls. However:
Docstring:
databus_keyis missing from the parameter documentation. Add:databus_key: API key for accessing protected databus endpointsMetadata endpoints: Both
__handle_databus_collection__and__get_json_ld_from_databus__already support and use thedatabus_keyparameter (adding it to theX-API-KEYheader), butdownload()does not pass it to these calls. If your deployment protects collection/JSON-LD URLs with API keys, update these calls to includedatabus_key=databus_key.
♻️ Duplicate comments (2)
databusclient/client.py (1)
567-570: Content-Length mismatch should fail by default (duplicate of earlier concern).Previously, a mismatch between downloaded size and
Content-Lengthraised an error; now it only prints a warning and continues. This was called out in an earlier review and acknowledged as intentional but risky: corrupt or truncated downloads will silently appear “successful”.Please either:
- Restore strict failure by default, or
- Make the lenient behavior opt-in (e.g. flag/env var/parameter, possibly restricted to known-problematic hosts), and keep failure as the default.
A minimal strict version could look like:
- # TODO: could be a problem of github raw / openflaas - if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes: - # raise IOError("Downloaded size does not match Content-Length header") - print(f"Warning: Downloaded size does not match Content-Length header:\nExpected {total_size_in_bytes}, got {progress_bar.n}") + if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes: + raise IOError( + f"Downloaded size does not match Content-Length header: " + f"expected {total_size_in_bytes}, got {progress_bar.n}" + )README.md (1)
178-191: Fix typo in--databus-keyhelp text (duplicate).The description still says “Databus API key to donwload from protected databus”. Please fix the spelling:
- --databus-key TEXT Databus API key to donwload from protected databus + --databus-key TEXT Databus API key to download from protected databus
🧹 Nitpick comments (3)
databusclient/client.py (3)
494-547: Auth retry flow looks sound; add request timeouts for robustness.The new API-key retry branch and Vault/Bearer handling are consistent and correctly separated by the
WWW-Authenticateheader and 401 status. However, all three network calls in this function (requests.head, initialrequests.get, and the token/API-key retryrequests.get) are made without atimeout, which means a hung TCP connection can block the caller indefinitely.Consider introducing a module-level default (e.g.
REQUEST_TIMEOUT = 30) and passing it explicitly to eachrequests.*call in this function to avoid indefinite hangs, especially since this is on the main download path.
717-729: API-key headers wired in, but tighten typing and HTTP handling.Good: both
__handle_databus_collection__and__get_json_ld_from_databus__now acceptdatabus_keyand addX-API-KEYwhen provided.Two improvements:
- Typing: to satisfy Ruff and PEP 484, make the Optional explicit, e.g.:
-from typing import List, Dict, Tuple, Optional, Union +from typing import List, Dict, Tuple, Optional, Union ... -def __handle_databus_collection__(uri: str, databus_key: str = None) -> str: +def __handle_databus_collection__(uri: str, databus_key: Optional[str] = None) -> str: ... -def __get_json_ld_from_databus__(uri: str, databus_key: str = None) -> str: +def __get_json_ld_from_databus__(uri: str, databus_key: Optional[str] = None) -> str:
- HTTP robustness: both helpers call
requests.get(...).textwithouttimeoutorraise_for_status(). For protected or flaky endpoints, this can hang or return HTML error pages that are then treated as SPARQL/JSON-LD.Consider:
- return requests.get(uri, headers=headers).text + resp = requests.get(uri, headers=headers, timeout=REQUEST_TIMEOUT) + resp.raise_for_status() + return resp.text(and similarly for the JSON-LD helper).
732-749: Propagating databus_key looks correct; clean up typing and unused variable.The threading of
databus_keythrough__download_list__into__download_file__is consistent and keeps the local-directory behavior intact.Minor follow-ups:
- Explicit Optional types to satisfy Ruff:
-def __download_list__(urls: List[str], - localDir: str, - vault_token_file: str = None, - databus_key: str = None, - auth_url: str = None, - client_id: str = None) -> None: +def __download_list__( + urls: List[str], + localDir: Optional[str], + vault_token_file: Optional[str] = None, + databus_key: Optional[str] = None, + auth_url: Optional[str] = None, + client_id: Optional[str] = None, +) -> None:
hostfrom__get_databus_id_parts__is never used; rename to_host(or_) to silence the unused-variable warning:- host, account, group, artifact, version, file = __get_databus_id_parts__(url) + _host, account, group, artifact, version, file = __get_databus_id_parts__(url)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
README.md(7 hunks)databusclient/client.py(9 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[style] ~157-~157: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...LOADTARGET ``` - $DOWNLOADTARGET - Can be any Databus URI including collection...
(MISSING_IT_THERE)
🪛 markdownlint-cli2 (0.18.1)
README.md
69-69: Bare URL used
(MD034, no-bare-urls)
70-70: Bare URL used
(MD034, no-bare-urls)
86-86: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.7)
databusclient/client.py
523-523: Probable use of requests call without timeout
(S113)
543-543: Avoid specifying long messages outside the exception class
(TRY003)
546-546: Probable use of requests call without timeout
(S113)
717-717: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
722-722: Probable use of requests call without timeout
(S113)
725-725: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
729-729: Probable use of requests call without timeout
(S113)
734-734: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
735-735: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
736-736: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
737-737: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
741-741: Unpacked variable host is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (1)
README.md (1)
184-184: Fix persistent typo in CLI help text."donwload" should be "download".
This typo was flagged in a previous review. Apply this diff to fix it:
- --databus-key TEXT Databus API key to donwload from protected databus + --databus-key TEXT Databus API key to download from protected databus
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
README.md(7 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[style] ~157-~157: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...LOADTARGET ``` - $DOWNLOADTARGET - Can be any Databus URI including collection...
(MISSING_IT_THERE)
🪛 markdownlint-cli2 (0.18.1)
README.md
69-69: Bare URL used
(MD034, no-bare-urls)
70-70: Bare URL used
(MD034, no-bare-urls)
86-86: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
|
|
||
| **DBpedia Wikipedia Extraction Enriched** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert bold text to proper heading to improve structure and comply with MD036.
"DBpedia Wikipedia Extraction Enriched" uses emphasis instead of heading syntax. This violates MD036 and inconsistently structures the subsection. Convert to a proper heading (####) to maintain consistent hierarchy.
Apply this diff:
~#### **DBpedia Wikipedia Extraction Enriched**
~
~DBpedia-based enrichment of structured Wikipedia extractions (currently EN DBpedia only). [More information](https://databus.dev.dbpedia.link/fhofer/dbpedia-wikipedia-kg-enriched-dump)Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
86-86: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🤖 Prompt for AI Agents
In README.md around lines 85 to 86, the subsection title is using bold/emphasis
("**DBpedia Wikipedia Extraction Enriched**") instead of a proper markdown
heading; replace that bold text with a level-4 heading (prefix with "#### ") so
the subsection uses heading syntax (e.g., change the line to "#### DBpedia
Wikipedia Extraction Enriched") to fix MD036 and keep heading hierarchy
consistent.
| - `$DOWNLOADTARGET` | ||
| - Can be any Databus URI including collections OR SPARQL query (or several thereof). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete the grammatically incomplete bullet point.
The bullet point at line 157 lacks a complete predicate. "Can be any Databus URI..." forms an incomplete sentence when treated as a standalone bullet. Restructure for clarity.
Apply this diff:
- `$DOWNLOADTARGET`
- - Can be any Databus URI including collections OR SPARQL query (or several thereof).
+ - Can be a Databus URI (pointing to files, versions, artifacts, groups, or collections) or a SPARQL query. Multiple targets may be provided.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - `$DOWNLOADTARGET` | |
| - Can be any Databus URI including collections OR SPARQL query (or several thereof). | |
| - `$DOWNLOADTARGET` | |
| - Can be a Databus URI (pointing to files, versions, artifacts, groups, or collections) or a SPARQL query. Multiple targets may be provided. |
🧰 Tools
🪛 LanguageTool
[style] ~157-~157: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...LOADTARGET ``` - $DOWNLOADTARGET - Can be any Databus URI including collection...
(MISSING_IT_THERE)
🤖 Prompt for AI Agents
In README.md around lines 156-157, the bullet "- Can be any Databus URI
including collections OR SPARQL query (or several thereof)." is a grammatically
incomplete sentence; fix it by adding an explicit subject and making it a full
sentence—e.g., "The $DOWNLOADTARGET can be any Databus URI, including
collections or a SPARQL query (or several thereof)." Ensure punctuation and
capitalization are correct and preserve the parenthetical note.
Goal: DLF
Summary by CodeRabbit
Documentation
New Features
Chores
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.