Skip to content

Conversation

@Integer-Ctrl
Copy link
Contributor

@Integer-Ctrl Integer-Ctrl commented Nov 25, 2025

Goal: DLF

Summary by CodeRabbit

  • Documentation

    • Comprehensive README rewrite: TOC, distinct Python/Docker paths, expanded CLI and module usage, deploy/download examples, metadata examples, and terminology/capitalization fixes.
  • New Features

    • API-key authentication for protected Databus downloads via a new databus-key option.
    • Added examples for three deploy modes (classic, metadata-driven, Nextcloud/WebDAV).
  • Chores

    • CLI parameter renamed: --token → --vault-token; CLI help expanded and clarified.
  • Bug Fixes

    • Content-Length mismatches now reported as warnings rather than fatal errors.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 25, 2025

📝 Walkthrough

Walkthrough

Adds API-key support (--databus-key) alongside Vault-token auth, propagating databus_key through the CLI and client download pipeline. Download retry logic now selects Vault Bearer or API-key paths on 401 responses. README.md is substantially restructured and expanded (TOC, Python/Docker flows, CLI and module usage).

Changes

Cohort / File(s) Summary
Documentation
README.md
Full README restructuring and expansion: Table of Contents, separate Python/Docker quickstarts, expanded CLI and Module usage, clarified Download/Deploy commands and examples, metadata examples, and terminology fixes.
CLI interface
databusclient/cli.py
Replaced --token with --vault-token, added --databus-key; updated download() signature to accept vault_token and databus_key and forward them to client.download().
Download & auth logic
databusclient/client.py
Added databus_key through the download pipeline (__download_file__, __handle_databus_collection__, __get_json_ld_from_databus__, __download_list__, download). Implements API-key retry path for 401 responses without a Bearer challenge (uses X-API-KEY), preserves Vault Bearer flow for WWW-Authenticate: Bearer, and changes Content-Length mismatch from fatal exception to warning.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI
    participant Client
    participant DatabusAPI

    User->>CLI: download --vault-token=file --databus-key=KEY ...
    CLI->>Client: download(vault_token=file, databus_key=KEY, ...)
    Client->>DatabusAPI: GET resource
    alt 401 with WWW-Authenticate: Bearer
        DatabusAPI-->>Client: 401 + WWW-Authenticate: Bearer
        Client->>Client: read Vault token from file
        Client->>DatabusAPI: GET resource (Authorization: Bearer <token>)
        DatabusAPI-->>Client: 200 OK
    else 401 without Bearer
        DatabusAPI-->>Client: 401
        Client->>DatabusAPI: GET resource (X-API-KEY: KEY)
        DatabusAPI-->>Client: 200 OK
    end
    Client-->>CLI: files saved / download complete
    CLI-->>User: exit
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify auth-branching in __download_file__ correctly detects WWW-Authenticate Bearer and only falls back to API-key when appropriate.
  • Confirm databus_key is consistently propagated through __handle_databus_collection__, __get_json_ld_from_databus__, __download_list__, and download.
  • Review Content-Length mismatch change (exception → warning) for acceptable failure semantics.
  • Ensure README examples and CLI option names match implemented parameters.

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Finalize 256' is vague and generic, providing no meaningful information about the actual changes made in the pull request. Revise the title to clearly describe the main changes, such as 'Add databus_key authentication parameter to download flow' or 'Support API key authentication for protected databus endpoints'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch finalize-256

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
databusclient/client.py (1)

791-823: Critical: databus_key not propagated to metadata and collection retrieval functions.

The databus_key parameter is not being passed to __handle_databus_collection__ and __get_json_ld_from_databus__ calls, which means API-key-protected collections and artifact metadata cannot be accessed even when the key is provided.

Apply this diff to fix all missing databus_key parameters:

             # databus collection
             if "/collections/" in databusURI:  # TODO "in" is not safe! there could be an artifact named collections, need to check for the correct part position in the URI
-                query = __handle_databus_collection__(databusURI)
+                query = __handle_databus_collection__(databusURI, databus_key)
                 res = __handle_databus_file_query__(endpoint, query)
                 __download_list__(res, localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id)
             # databus file
             elif file is not None:
                 __download_list__([databusURI], localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id)
             # databus artifact version
             elif version is not None:
-                json_str = __get_json_ld_from_databus__(databusURI)
+                json_str = __get_json_ld_from_databus__(databusURI, databus_key)
                 res = __handle_databus_artifact_version__(json_str)
                 __download_list__(res, localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id)
             # databus artifact
             elif artifact is not None:
-                json_str = __get_json_ld_from_databus__(databusURI)
+                json_str = __get_json_ld_from_databus__(databusURI, databus_key)
                 latest = __get_databus_latest_version_of_artifact__(json_str)
                 print(f"No version given, using latest version: {latest}")
-                json_str = __get_json_ld_from_databus__(latest)
+                json_str = __get_json_ld_from_databus__(latest, databus_key)
                 res = __handle_databus_artifact_version__(json_str)
                 __download_list__(res, localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id)

             # databus group
             elif group is not None:
-                json_str = __get_json_ld_from_databus__(databusURI)
+                json_str = __get_json_ld_from_databus__(databusURI, databus_key)
                 artifacts = __get_databus_artifacts_of_group__(json_str)
                 for artifact_uri in artifacts:
                     print(f"Processing artifact {artifact_uri}")
-                    json_str = __get_json_ld_from_databus__(artifact_uri)
+                    json_str = __get_json_ld_from_databus__(artifact_uri, databus_key)
                     latest = __get_databus_latest_version_of_artifact__(json_str)
                     print(f"No version given, using latest version: {latest}")
-                    json_str = __get_json_ld_from_databus__(latest)
+                    json_str = __get_json_ld_from_databus__(latest, databus_key)
                     res = __handle_databus_artifact_version__(json_str)
                     __download_list__(res, localDir, vault_token_file=token, databus_key=databus_key, auth_url=auth_url, client_id=client_id)
🧹 Nitpick comments (1)
databusclient/client.py (1)

717-748: Function signatures properly extended with databus_key parameter.

The propagation of databus_key through __handle_databus_collection__, __get_json_ld_from_databus__, and __download_list__ is correctly implemented, with API key headers properly set when provided.

For improved type safety, consider using explicit Optional or T | None syntax for parameters that can be None, as flagged by static analysis (RUF013).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0235055 and c62966b.

📒 Files selected for processing (3)
  • README.md (7 hunks)
  • databusclient/cli.py (1 hunks)
  • databusclient/client.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/cli.py (1)
databusclient/client.py (1)
  • download (759-839)
🪛 LanguageTool
README.md

[style] ~157-~157: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...LOADTARGET ``` - $DOWNLOADTARGET - Can be any Databus URI including collection...

(MISSING_IT_THERE)

🪛 markdownlint-cli2 (0.18.1)
README.md

69-69: Bare URL used

(MD034, no-bare-urls)


70-70: Bare URL used

(MD034, no-bare-urls)


86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.14.5)
databusclient/client.py

543-543: Avoid specifying long messages outside the exception class

(TRY003)


546-546: Probable use of requests call without timeout

(S113)


717-717: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


722-722: Probable use of requests call without timeout

(S113)


725-725: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


729-729: Probable use of requests call without timeout

(S113)


734-734: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


735-735: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


736-736: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


737-737: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


741-741: Unpacked variable host is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (3)
README.md (1)

1-476: Comprehensive documentation expansion looks great!

The documentation is well-structured and covers all three deployment modes, CLI usage, and module usage effectively. The examples are clear and align with the code changes introducing API key authentication.

databusclient/cli.py (1)

97-113: Authentication parameter updates look correct.

The renaming of --token to --vault-token and addition of --databus-key properly supports both Vault-based and API-key-based authentication workflows. The parameters are correctly passed to client.download.

databusclient/client.py (1)

526-546: Authentication logic correctly handles both Vault and API key flows.

The implementation properly distinguishes between:

  • 401 with Bearer challenge → Vault token authentication
  • 401 without Bearer challenge → API key authentication

This dual-path approach aligns with the PR objectives.

However, consider adding a timeout to the requests call on line 546 to prevent hanging on slow or unresponsive servers:

-        response = requests.get(url, headers=headers, stream=True)
+        response = requests.get(url, headers=headers, stream=True, timeout=30)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
databusclient/client.py (1)

759-839: Add databus_key to function docstring and pass it to metadata endpoint calls.

The databus_key parameter is present in the function signature and correctly passed to __download_list__ calls. However:

  1. Docstring: databus_key is missing from the parameter documentation. Add: databus_key: API key for accessing protected databus endpoints

  2. Metadata endpoints: Both __handle_databus_collection__ and __get_json_ld_from_databus__ already support and use the databus_key parameter (adding it to the X-API-KEY header), but download() does not pass it to these calls. If your deployment protects collection/JSON-LD URLs with API keys, update these calls to include databus_key=databus_key.

♻️ Duplicate comments (2)
databusclient/client.py (1)

567-570: Content-Length mismatch should fail by default (duplicate of earlier concern).

Previously, a mismatch between downloaded size and Content-Length raised an error; now it only prints a warning and continues. This was called out in an earlier review and acknowledged as intentional but risky: corrupt or truncated downloads will silently appear “successful”.

Please either:

  • Restore strict failure by default, or
  • Make the lenient behavior opt-in (e.g. flag/env var/parameter, possibly restricted to known-problematic hosts), and keep failure as the default.

A minimal strict version could look like:

-    # TODO: could be a problem of github raw / openflaas
-    if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
-        # raise IOError("Downloaded size does not match Content-Length header")
-        print(f"Warning: Downloaded size does not match Content-Length header:\nExpected {total_size_in_bytes}, got {progress_bar.n}")
+    if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
+        raise IOError(
+            f"Downloaded size does not match Content-Length header: "
+            f"expected {total_size_in_bytes}, got {progress_bar.n}"
+        )
README.md (1)

178-191: Fix typo in --databus-key help text (duplicate).

The description still says “Databus API key to donwload from protected databus”. Please fix the spelling:

-  --databus-key TEXT  Databus API key to donwload from protected databus
+  --databus-key TEXT  Databus API key to download from protected databus
🧹 Nitpick comments (3)
databusclient/client.py (3)

494-547: Auth retry flow looks sound; add request timeouts for robustness.

The new API-key retry branch and Vault/Bearer handling are consistent and correctly separated by the WWW-Authenticate header and 401 status. However, all three network calls in this function (requests.head, initial requests.get, and the token/API-key retry requests.get) are made without a timeout, which means a hung TCP connection can block the caller indefinitely.

Consider introducing a module-level default (e.g. REQUEST_TIMEOUT = 30) and passing it explicitly to each requests.* call in this function to avoid indefinite hangs, especially since this is on the main download path.


717-729: API-key headers wired in, but tighten typing and HTTP handling.

Good: both __handle_databus_collection__ and __get_json_ld_from_databus__ now accept databus_key and add X-API-KEY when provided.

Two improvements:

  • Typing: to satisfy Ruff and PEP 484, make the Optional explicit, e.g.:
-from typing import List, Dict, Tuple, Optional, Union
+from typing import List, Dict, Tuple, Optional, Union
...
-def __handle_databus_collection__(uri: str, databus_key: str = None) -> str:
+def __handle_databus_collection__(uri: str, databus_key: Optional[str] = None) -> str:
...
-def __get_json_ld_from_databus__(uri: str, databus_key: str = None) -> str:
+def __get_json_ld_from_databus__(uri: str, databus_key: Optional[str] = None) -> str:
  • HTTP robustness: both helpers call requests.get(...).text without timeout or raise_for_status(). For protected or flaky endpoints, this can hang or return HTML error pages that are then treated as SPARQL/JSON-LD.

Consider:

-    return requests.get(uri, headers=headers).text
+    resp = requests.get(uri, headers=headers, timeout=REQUEST_TIMEOUT)
+    resp.raise_for_status()
+    return resp.text

(and similarly for the JSON-LD helper).


732-749: Propagating databus_key looks correct; clean up typing and unused variable.

The threading of databus_key through __download_list__ into __download_file__ is consistent and keeps the local-directory behavior intact.

Minor follow-ups:

  • Explicit Optional types to satisfy Ruff:
-def __download_list__(urls: List[str],
-                      localDir: str,
-                      vault_token_file: str = None,
-                      databus_key: str = None,
-                      auth_url: str = None,
-                      client_id: str = None) -> None:
+def __download_list__(
+    urls: List[str],
+    localDir: Optional[str],
+    vault_token_file: Optional[str] = None,
+    databus_key: Optional[str] = None,
+    auth_url: Optional[str] = None,
+    client_id: Optional[str] = None,
+) -> None:
  • host from __get_databus_id_parts__ is never used; rename to _host (or _) to silence the unused-variable warning:
-            host, account, group, artifact, version, file = __get_databus_id_parts__(url)
+            _host, account, group, artifact, version, file = __get_databus_id_parts__(url)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c62966b and 833872d.

📒 Files selected for processing (2)
  • README.md (7 hunks)
  • databusclient/client.py (9 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[style] ~157-~157: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...LOADTARGET ``` - $DOWNLOADTARGET - Can be any Databus URI including collection...

(MISSING_IT_THERE)

🪛 markdownlint-cli2 (0.18.1)
README.md

69-69: Bare URL used

(MD034, no-bare-urls)


70-70: Bare URL used

(MD034, no-bare-urls)


86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.14.7)
databusclient/client.py

523-523: Probable use of requests call without timeout

(S113)


543-543: Avoid specifying long messages outside the exception class

(TRY003)


546-546: Probable use of requests call without timeout

(S113)


717-717: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


722-722: Probable use of requests call without timeout

(S113)


725-725: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


729-729: Probable use of requests call without timeout

(S113)


734-734: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


735-735: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


736-736: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


737-737: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


741-741: Unpacked variable host is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
README.md (1)

184-184: Fix persistent typo in CLI help text.

"donwload" should be "download".

This typo was flagged in a previous review. Apply this diff to fix it:

-  --databus-key TEXT  Databus API key to donwload from protected databus
+  --databus-key TEXT  Databus API key to download from protected databus
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 833872d and 603f2db.

📒 Files selected for processing (1)
  • README.md (7 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[style] ~157-~157: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...LOADTARGET ``` - $DOWNLOADTARGET - Can be any Databus URI including collection...

(MISSING_IT_THERE)

🪛 markdownlint-cli2 (0.18.1)
README.md

69-69: Bare URL used

(MD034, no-bare-urls)


70-70: Bare URL used

(MD034, no-bare-urls)


86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build

Comment on lines +85 to +86

**DBpedia Wikipedia Extraction Enriched**
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Convert bold text to proper heading to improve structure and comply with MD036.

"DBpedia Wikipedia Extraction Enriched" uses emphasis instead of heading syntax. This violates MD036 and inconsistently structures the subsection. Convert to a proper heading (####) to maintain consistent hierarchy.

Apply this diff:

~#### **DBpedia Wikipedia Extraction Enriched**
~
~DBpedia-based enrichment of structured Wikipedia extractions (currently EN DBpedia only). [More information](https://databus.dev.dbpedia.link/fhofer/dbpedia-wikipedia-kg-enriched-dump)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

86-86: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🤖 Prompt for AI Agents
In README.md around lines 85 to 86, the subsection title is using bold/emphasis
("**DBpedia Wikipedia Extraction Enriched**") instead of a proper markdown
heading; replace that bold text with a level-4 heading (prefix with "#### ") so
the subsection uses heading syntax (e.g., change the line to "#### DBpedia
Wikipedia Extraction Enriched") to fix MD036 and keep heading hierarchy
consistent.

Comment on lines +156 to +157
- `$DOWNLOADTARGET`
- Can be any Databus URI including collections OR SPARQL query (or several thereof).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Complete the grammatically incomplete bullet point.

The bullet point at line 157 lacks a complete predicate. "Can be any Databus URI..." forms an incomplete sentence when treated as a standalone bullet. Restructure for clarity.

Apply this diff:

 - `$DOWNLOADTARGET`
-  - Can be any Databus URI including collections OR SPARQL query (or several thereof).
+  - Can be a Databus URI (pointing to files, versions, artifacts, groups, or collections) or a SPARQL query. Multiple targets may be provided.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- `$DOWNLOADTARGET`
- Can be any Databus URI including collections OR SPARQL query (or several thereof).
- `$DOWNLOADTARGET`
- Can be a Databus URI (pointing to files, versions, artifacts, groups, or collections) or a SPARQL query. Multiple targets may be provided.
🧰 Tools
🪛 LanguageTool

[style] ~157-~157: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ...LOADTARGET ``` - $DOWNLOADTARGET - Can be any Databus URI including collection...

(MISSING_IT_THERE)

🤖 Prompt for AI Agents
In README.md around lines 156-157, the bullet "- Can be any Databus URI
including collections OR SPARQL query (or several thereof)." is a grammatically
incomplete sentence; fix it by adding an explicit subject and making it a full
sentence—e.g., "The $DOWNLOADTARGET can be any Databus URI, including
collections or a SPARQL query (or several thereof)." Ensure punctuation and
capitalization are correct and preserve the parenthetical note.

@Integer-Ctrl Integer-Ctrl merged commit b70f59a into main Dec 4, 2025
3 checks passed
@Integer-Ctrl Integer-Ctrl self-assigned this Dec 4, 2025
@coderabbitai coderabbitai bot mentioned this pull request Dec 4, 2025
@Integer-Ctrl Integer-Ctrl deleted the finalize-256 branch December 9, 2025 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants