85 changes: 81 additions & 4 deletions README.md
@@ -17,6 +17,7 @@ Command-line and Python client for downloading and deploying datasets on DBpedia
- [CLI Usage](#cli-usage)
- [Download](#cli-download)
- [Deploy](#cli-deploy)
- [Delete](#cli-delete)
- [Module Usage](#module-usage)
- [Deploy](#module-deploy)

@@ -66,8 +67,8 @@ Commands to download the [DBpedia Knowledge Graphs](#dbpedia-knowledge-graphs) g

To download BUSL 1.1 licensed datasets, you need to register and get an access token.

1. If you do not have a DBpedia Account yet (Forum/Databus), please register at https://account.dbpedia.org
2. Log in at https://account.dbpedia.org and create your token.
1. If you do not have a DBpedia Account yet (Forum/Databus), please register at [https://account.dbpedia.org](https://account.dbpedia.org)
2. Log in at [https://account.dbpedia.org](https://account.dbpedia.org) and create your token.
3. Save the token to a file, e.g. `vault-token.dat`.

### DBpedia Knowledge Graphs
@@ -181,7 +182,7 @@ Options:
--databus TEXT Databus URL (if not given, inferred from databusuri,
e.g. https://databus.dbpedia.org/sparql)
--vault-token TEXT Path to Vault refresh token file
--databus-key TEXT Databus API key to donwload from protected databus
--databus-key TEXT Databus API key to download from protected databus
--authurl TEXT Keycloak token endpoint URL [default:
https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
connect/token]
@@ -190,7 +191,7 @@ Options:
--help Show this message and exit.
```

### Examples of using the download command
#### Examples of using the download command

**Download File**: download of a single file
```bash
@@ -396,6 +397,82 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
./data_folder
```

<a id="cli-delete"></a>
### Delete

With the delete command you can delete collections, groups, artifacts, and versions from the Databus. Deleting files is not supported via the API.

**Note**: Deleting a dataset recursively deletes all data associated with it below the specified level. Please use this command with caution. As a security measure, the delete command prompts you for confirmation before proceeding with any deletion.

```bash
# Python
databusclient delete [OPTIONS] DATABUSURIS...
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete [OPTIONS] DATABUSURIS...
```

**Help and further information on delete command:**
```bash
# Python
databusclient delete --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete --help

# Output:
Usage: databusclient delete [OPTIONS] DATABUSURIS...

  Delete a dataset from the databus.

  Delete a group, artifact, or version identified by the given databus URI.
  Will recursively delete all data associated with the dataset.

Options:
  --databus-key TEXT  Databus API key to access protected databus  [required]
  --dry-run           Perform a dry run without actual deletion
  --force             Force deletion without confirmation prompt
  --help              Show this message and exit.
```

To authenticate the delete request, you need to provide an API key with `--databus-key YOUR_API_KEY`.

If you want to perform a dry run without actual deletion, use the `--dry-run` option. This will show you what would be deleted without making any changes.

As a security measure, the delete command prompts you for confirmation before proceeding with the deletion. To skip this prompt, use the `--force` option.
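
How `--dry-run`, `--force`, and the confirmation prompt combine can be summarized in a small sketch. It is a simplified model of the behavior described here, not the client's actual code: the function `resolve_action` and its return labels are hypothetical names chosen for illustration, and `answer` stands in for the user's reply to the prompt.

```python
from typing import Optional


def resolve_action(dry_run: bool, force: bool, answer: Optional[str] = None) -> str:
    """Return what happens to one resource: 'skip', 'cancel', 'dry-run' or 'delete'.

    `answer` is the (hypothetical) normalized prompt reply: 'confirm', 'skip' or 'cancel'.
    It is only consulted when neither --dry-run nor --force is set.
    """
    if not (dry_run or force):
        # The prompt is shown only in this branch.
        if answer == "skip":
            return "skip"
        if answer == "cancel":
            return "cancel"
        if answer != "confirm":
            raise ValueError("a prompt reply is required without --dry-run/--force")
    if dry_run:
        # Nothing is sent to the Databus; the URI is only reported.
        return "dry-run"
    return "delete"


if __name__ == "__main__":
    for flags in [(True, False, None), (False, True, None), (False, False, "skip")]:
        print(flags, "->", resolve_action(*flags))
```

Note that in this model `--dry-run` also suppresses the prompt, since nothing is deleted, and that `--dry-run` takes precedence when both flags are given.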

#### Examples of using the delete command

**Delete Version**: delete a specific version
```bash
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
```

**Delete Artifact**: delete an artifact and all its versions
```bash
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
```

**Delete Group**: delete a group and all its artifacts and versions
```bash
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
```

**Delete Collection**: delete a collection
```bash
# Python
databusclient delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
```

## Module Usage

<a id="module-deploy"></a>
190 changes: 190 additions & 0 deletions databusclient/api/delete.py
@@ -0,0 +1,190 @@
import json
import requests
from typing import List

from databusclient.api.utils import get_databus_id_parts_from_uri, get_json_ld_from_databus

def _confirm_delete(databusURI: str) -> str:
    """
    Confirm deletion of a Databus resource with the user.

    Parameters:
    - databusURI: The full databus URI of the resource to delete

    Returns:
    - "confirm" if the user confirms deletion
    - "skip" if the user chooses to skip deletion
    - "cancel" if the user chooses to cancel the entire deletion process
    """
    print(f"Are you sure you want to delete: {databusURI}?")
    print("\nThis action is irreversible and will permanently remove the resource and all its data.")
    while True:
        choice = input("Type 'yes'/'y' to confirm, 'skip'/'s' to skip this resource, or 'cancel'/'c' to abort: ").strip().lower()
        if choice in ("yes", "y"):
            return "confirm"
        elif choice in ("skip", "s"):
            return "skip"
        elif choice in ("cancel", "c"):
            return "cancel"
        else:
            print("Invalid input. Please type 'yes'/'y', 'skip'/'s', or 'cancel'/'c'.")


def _delete_resource(databusURI: str, databus_key: str, dry_run: bool = False, force: bool = False):
    """
    Delete a single Databus resource (version, artifact, group).

    Equivalent to:
    curl -X DELETE "<databusURI>" -H "accept: */*" -H "X-API-KEY: <key>"

    Parameters:
    - databusURI: The full databus URI of the resource to delete
    - databus_key: Databus API key to authenticate the deletion request
    - dry_run: If True, do not perform the deletion but only print what would be deleted
    - force: If True, skip confirmation prompt and proceed with deletion
    """

    # Confirm the deletion request, skip the request or cancel deletion process
    if not (dry_run or force):
        action = _confirm_delete(databusURI)
        if action == "skip":
            print(f"Skipping: {databusURI}\n")
            return
        if action == "cancel":
            raise KeyboardInterrupt("Deletion cancelled by user.")

    if databus_key is None:
        raise ValueError("Databus API key must be provided for deletion")

    headers = {
        "accept": "*/*",
        "X-API-KEY": databus_key
    }

    if dry_run:
        print(f"[DRY RUN] Would delete: {databusURI}")
        return

    response = requests.delete(databusURI, headers=headers, timeout=30)

    if response.status_code in (200, 204):
        print(f"Successfully deleted: {databusURI}")
    else:
        raise Exception(f"Failed to delete {databusURI}: {response.status_code} - {response.text}")


def _delete_list(databusURIs: List[str], databus_key: str, dry_run: bool = False, force: bool = False):
    """
    Delete a list of Databus resources.

    Parameters:
    - databusURIs: List of full databus URIs of the resources to delete
    - databus_key: Databus API key to authenticate the deletion requests
    - dry_run: If True, do not perform the deletions but only print what would be deleted
    - force: If True, skip confirmation prompts and proceed with deletion
    """
    for databusURI in databusURIs:
        _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)


def _delete_artifact(databusURI: str, databus_key: str, dry_run: bool = False, force: bool = False):
    """
    Delete an artifact and all its versions.

    This function first retrieves all versions of the artifact and then deletes them one by one.
    Finally, it deletes the artifact itself.

    Parameters:
    - databusURI: The full databus URI of the artifact to delete
    - databus_key: Databus API key to authenticate the deletion requests
    - dry_run: If True, do not perform the deletion but only print what would be deleted
    - force: If True, skip confirmation prompts and proceed with deletion
    """
    artifact_body = get_json_ld_from_databus(databusURI, databus_key)

    json_dict = json.loads(artifact_body)
    versions = json_dict.get("databus:hasVersion")

    # Single version case {}
    if isinstance(versions, dict):
        versions = [versions]
    # Multiple versions case [{}, {}]

    # If versions is None, there is nothing to delete below the artifact
    if versions is None:
        print(f"No versions found for artifact: {databusURI}")
    else:
        version_uris = [v["@id"] for v in versions if "@id" in v]
        if not version_uris:
            print(f"No version URIs found in artifact JSON-LD for: {databusURI}")
        else:
            # Delete all versions
            _delete_list(version_uris, databus_key, dry_run=dry_run, force=force)

    # Finally, delete the artifact itself
    _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
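
The version lookup in `_delete_artifact` has to cope with `databus:hasVersion` being absent, a single object, or a list of objects. A minimal, self-contained sketch of that normalization follows. The helper name `extract_version_uris` and the sample JSON-LD documents are hypothetical and only illustrate the shapes handled above; real Databus responses contain more fields.

```python
import json
from typing import List


def extract_version_uris(artifact_json_ld: str) -> List[str]:
    """Return the '@id' of every version listed in an artifact's JSON-LD document."""
    doc = json.loads(artifact_json_ld)
    versions = doc.get("databus:hasVersion")
    if versions is None:
        return []  # artifact without versions
    if isinstance(versions, dict):
        versions = [versions]  # single version serialized as one object
    return [v["@id"] for v in versions if "@id" in v]


if __name__ == "__main__":
    # Hypothetical, heavily shortened documents.
    single = '{"databus:hasVersion": {"@id": "https://example.org/acc/grp/art/v1"}}'
    many = (
        '{"databus:hasVersion": ['
        '{"@id": "https://example.org/acc/grp/art/v1"},'
        '{"@id": "https://example.org/acc/grp/art/v2"}]}'
    )
    print(extract_version_uris(single))
    print(extract_version_uris(many))
    print(extract_version_uris("{}"))
```

Returning an empty list for the "no versions" case keeps the caller free of special cases: it can loop over the result and then delete the artifact itself.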

def _delete_group(databusURI: str, databus_key: str, dry_run: bool = False, force: bool = False):
    """
    Delete a group and all its artifacts and versions.

    This function first retrieves all artifacts of the group, then deletes each artifact (which in turn deletes its versions).
    Finally, it deletes the group itself.

    Parameters:
    - databusURI: The full databus URI of the group to delete
    - databus_key: Databus API key to authenticate the deletion requests
    - dry_run: If True, do not perform the deletion but only print what would be deleted
    - force: If True, skip confirmation prompts and proceed with deletion
    """
    group_body = get_json_ld_from_databus(databusURI, databus_key)

    json_dict = json.loads(group_body)
    artifacts = json_dict.get("databus:hasArtifact", [])

    # Single artifact case {} (defensive; mirrors the single-version handling in _delete_artifact)
    if isinstance(artifacts, dict):
        artifacts = [artifacts]

    artifact_uris = []
    for item in artifacts:
        uri = item.get("@id")
        if not uri:
            continue
        # Keep only artifact-level URIs (no version part)
        _, _, _, _, version, _ = get_databus_id_parts_from_uri(uri)
        if version is None:
            artifact_uris.append(uri)

    # Delete all artifacts (which deletes their versions)
    for artifact_uri in artifact_uris:
        _delete_artifact(artifact_uri, databus_key, dry_run=dry_run, force=force)

    # Finally, delete the group itself
    _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
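
The recursion in `_delete_group` and `_delete_artifact` amounts to a children-before-parents ordering: versions first, then their artifact, and the group last. The sketch below computes that order from an in-memory mapping instead of querying the Databus. The function `plan_group_deletion`, its input shape, and the example URIs are assumptions made for illustration only.

```python
from typing import Dict, List


def plan_group_deletion(group_uri: str, artifacts: Dict[str, List[str]]) -> List[str]:
    """Return URIs in the order they would be deleted.

    `artifacts` maps each artifact URI to the list of its version URIs.
    """
    plan: List[str] = []
    for artifact_uri, version_uris in artifacts.items():
        plan.extend(version_uris)  # versions go first
        plan.append(artifact_uri)  # then the artifact that contained them
    plan.append(group_uri)         # the group is deleted last
    return plan


if __name__ == "__main__":
    base = "https://example.org/acc/grp"  # hypothetical group URI
    order = plan_group_deletion(
        base,
        {
            f"{base}/a1": [f"{base}/a1/v1", f"{base}/a1/v2"],
            f"{base}/a2": [],
        },
    )
    for uri in order:
        print(uri)
```

Because each step is a separate request, a failure partway through leaves the already deleted children gone while the parent remains; running with `dry_run=True` first shows the full list before anything is removed.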

def delete(databusURIs: List[str], databus_key: str, dry_run: bool, force: bool):
    """
    Delete a dataset from the databus.

    Delete a collection, group, artifact, or version identified by the given databus URI.
    Will recursively delete all data associated with the dataset.

    Parameters:
    - databusURIs: List of full databus URIs of the resources to delete
    - databus_key: Databus API key to authenticate the deletion requests
    - dry_run: If True, will only print what would be deleted without performing actual deletions
    - force: If True, skip confirmation prompt and proceed with deletion
    """

    for databusURI in databusURIs:
        _host, _account, group, artifact, version, file = get_databus_id_parts_from_uri(databusURI)

        if group == "collections" and artifact is not None:
            print(f"Deleting collection: {databusURI}")
            _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
        elif file is not None:
            print(f"Deleting file is not supported via API: {databusURI}")
            continue  # skip file deletions
        elif version is not None:
            print(f"Deleting version: {databusURI}")
            _delete_resource(databusURI, databus_key, dry_run=dry_run, force=force)
        elif artifact is not None:
            print(f"Deleting artifact and all its versions: {databusURI}")
            _delete_artifact(databusURI, databus_key, dry_run=dry_run, force=force)
        elif group is not None and group != "collections":
            print(f"Deleting group and all its artifacts and versions: {databusURI}")
            _delete_group(databusURI, databus_key, dry_run=dry_run, force=force)
        else:
            print(f"Deleting {databusURI} is not supported.")
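
The branch order in `delete` matters: collections are checked first because a collection URI would otherwise look like an artifact inside a group named `collections`. The following sketch isolates that decision as a pure function over the already-split URI parts. The name `classify` and its string labels are hypothetical; only the order of the checks is taken from the code above.

```python
from typing import Optional


def classify(
    group: Optional[str],
    artifact: Optional[str],
    version: Optional[str],
    file: Optional[str],
) -> str:
    """Mirror the if/elif chain of `delete` and return a label for the resource level."""
    if group == "collections" and artifact is not None:
        return "collection"
    if file is not None:
        return "file"  # not deletable via the API
    if version is not None:
        return "version"
    if artifact is not None:
        return "artifact"
    if group is not None and group != "collections":
        return "group"
    return "unsupported"  # e.g. account-only URIs or the bare 'collections' path


if __name__ == "__main__":
    print(classify("collections", "dbpedia-snapshot-2022-12", None, None))
    print(classify("mappings", "mappingbased-literals", "2022.12.01", None))
    print(classify("mappings", None, None, None))
    print(classify(None, None, None, None))
```

Keeping the classification separate from the side effects makes the dispatch easy to unit-test without network access or an API key.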
37 changes: 37 additions & 0 deletions databusclient/api/utils.py
@@ -0,0 +1,37 @@
import requests
from typing import Tuple, Optional

def get_databus_id_parts_from_uri(uri: str) -> Tuple[Optional[str], Optional[str], Optional[str], Optional[str], Optional[str], Optional[str]]:
    """
    Extract databus ID parts from a given databus URI.

    Parameters:
    - uri: The full databus URI

    Returns:
    A tuple containing (host, accountId, groupId, artifactId, versionId, fileId).
    Each element is a string or None if not present.
    """
    uri = uri.removeprefix("https://").removeprefix("http://")
    parts = uri.strip("/").split("/")
    parts += [None] * (6 - len(parts))  # pad with None if less than 6 parts
    return tuple(parts[:6])  # return only the first 6 parts
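
To make the padding and truncation concrete, here is a standalone copy of the same parsing logic applied to a few URIs. The function is renamed `id_parts` to stress that this is an illustrative copy rather than an import from the package; like the original, it relies on `str.removeprefix`, which requires Python 3.9 or later.

```python
from typing import Optional, Tuple


def id_parts(uri: str) -> Tuple[Optional[str], ...]:
    """Split a Databus URI into (host, account, group, artifact, version, file)."""
    uri = uri.removeprefix("https://").removeprefix("http://")
    parts = uri.strip("/").split("/")
    parts += [None] * (6 - len(parts))  # a non-positive count adds nothing
    return tuple(parts[:6])


if __name__ == "__main__":
    # Version URI: five segments, so the file slot is None.
    print(id_parts("https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01"))
    # Account URI with trailing slash: everything after the account is None.
    print(id_parts("https://databus.dbpedia.org/dbpedia/"))
    # More than six segments: the extra ones are dropped.
    print(id_parts("https://example.org/a/g/art/v/f/extra"))
```

Silently dropping segments beyond the sixth means a malformed, overly long URI is treated as a file URI; callers that need strict validation would have to check the segment count themselves.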

def get_json_ld_from_databus(uri: str, databus_key: Optional[str] = None) -> str:
    """
    Retrieve JSON-LD representation of a databus resource.

    Parameters:
    - uri: The full databus URI
    - databus_key: Optional Databus API key for authentication on protected resources

    Returns:
    JSON-LD string representation of the databus resource.
    """
    headers = {"Accept": "application/ld+json"}
    if databus_key is not None:
        headers["X-API-KEY"] = databus_key
    response = requests.get(uri, headers=headers, timeout=30)
    response.raise_for_status()

    return response.text
23 changes: 22 additions & 1 deletion databusclient/cli.py
@@ -7,6 +7,7 @@
from databusclient import client

from databusclient.rclone_wrapper import upload
from databusclient.api.delete import delete as api_delete

@click.group()
def app():
@@ -95,7 +96,7 @@ def deploy(version_id, title, abstract, description, license_url, apikey,
@click.option("--localdir", help="Local databus folder (if not given, databus folder structure is created in current working directory)")
@click.option("--databus", help="Databus URL (if not given, inferred from databusuri, e.g. https://databus.dbpedia.org/sparql)")
@click.option("--vault-token", help="Path to Vault refresh token file")
@click.option("--databus-key", help="Databus API key to donwload from protected databus")
@click.option("--databus-key", help="Databus API key to download from protected databus")
@click.option("--authurl", default="https://auth.dbpedia.org/realms/dbpedia/protocol/openid-connect/token", show_default=True, help="Keycloak token endpoint URL")
@click.option("--clientid", default="vault-token-exchange", show_default=True, help="Client ID for token exchange")
def download(databusuris: List[str], localdir, databus, vault_token, databus_key, authurl, clientid):
@@ -112,6 +113,26 @@ def download(databusuris: List[str], localdir, databus, vault_token, databus_key
client_id=clientid,
)

@app.command()
@click.argument("databusuris", nargs=-1, required=True)
@click.option("--databus-key", help="Databus API key to access protected databus", required=True)
@click.option("--dry-run", is_flag=True, help="Perform a dry run without actual deletion")
@click.option("--force", is_flag=True, help="Force deletion without confirmation prompt")
def delete(databusuris: List[str], databus_key: str, dry_run: bool, force: bool):
    """
    Delete a dataset from the databus.

    Delete a group, artifact, or version identified by the given databus URI.
    Will recursively delete all data associated with the dataset.
    """

    api_delete(
        databusURIs=databusuris,
        databus_key=databus_key,
        dry_run=dry_run,
        force=force,
    )


if __name__ == "__main__":
    app()