Merged
11 changes: 11 additions & 0 deletions Dockerfile
@@ -0,0 +1,11 @@
+FROM python:3.10-slim
+
+WORKDIR /data
+
+COPY . .
+
+# Install dependencies
+RUN pip install .
+
+# Use ENTRYPOINT for the CLI
+ENTRYPOINT ["databusclient"]
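Because the Dockerfile sets `ENTRYPOINT ["databusclient"]`, any arguments given after the image name in `docker run` are passed straight to the CLI, and `WORKDIR /data` is also the directory the README mounts the host folder onto. The following is a minimal sketch, not part of the client: the helper name `docker_download_argv` is hypothetical, and it assumes a relative `--token` path should be resolved against the mounted directory, which is why it rejects token files outside that directory.

```python
import shlex
from pathlib import Path
from typing import List, Optional

IMAGE = "dbpedia/databus-python-client"  # image name from the README
CONTAINER_DIR = "/data"  # WORKDIR in the Dockerfile, used as the mount point


def docker_download_argv(
    uris: List[str], host_dir: str = ".", token: Optional[str] = None
) -> List[str]:
    """Hypothetical helper: build `docker run` arguments for `databusclient download`."""
    host = Path(host_dir).resolve()
    # Everything after IMAGE is handed to the ENTRYPOINT, i.e. `databusclient download ...`.
    argv = ["docker", "run", "--rm", "-v", f"{host}:{CONTAINER_DIR}", IMAGE, "download", *uris]
    if token is not None:
        token_path = (host / token).resolve()  # an absolute `token` replaces `host`
        try:
            # The container only sees files below the mounted directory.
            rel = token_path.relative_to(host)
        except ValueError:
            raise ValueError(f"{token} is not inside the mounted directory {host}") from None
        # A relative path resolves against WORKDIR /data inside the container.
        argv += ["--token", rel.as_posix()]
    return argv


if __name__ == "__main__":
    uri = "https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01"
    print(shlex.join(docker_download_argv([uri], token="vault-token.dat")))
```

The printed command is equivalent to the README's `docker run --rm -v $(pwd):/data ...` examples, with the absolute host path in place of `$(pwd)`.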
112 changes: 101 additions & 11 deletions README.md
@@ -23,15 +23,18 @@ Options:

 Commands:
 deploy
-downoad
+download
 ```

+## Docker Image Usage
+
+A docker image is available at [dbpedia/databus-python-client](https://hub.docker.com/r/dbpedia/databus-python-client). See [download section](#usage-of-docker-image) for details.
+
 ### Deploy command
 ```
 databusclient deploy --help
 ```
 ```


 Usage: databusclient deploy [OPTIONS] DISTRIBUTIONS...

 Arguments:
@@ -40,23 +43,23 @@ Arguments:
 content variants of a distribution, fileExt and Compression can be set, if not they are inferred from the path [required]

 Options:
---versionid TEXT target databus version/dataset identifier of the form <h
+--version-id TEXT Target databus version/dataset identifier of the form <h
 ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
 RSION> [required]
---title TEXT dataset title [required]
---abstract TEXT dataset abstract max 200 chars [required]
---description TEXT dataset description [required]
---license TEXT license (see dalicc.net) [required]
---apikey TEXT apikey [required]
+--title TEXT Dataset title [required]
+--abstract TEXT Dataset abstract max 200 chars [required]
+--description TEXT Dataset description [required]
+--license TEXT License (see dalicc.net) [required]
+--apikey TEXT API key [required]
 --help Show this message and exit.
 ```
 Examples of using deploy command
 ```
-databusclient deploy --versionid https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
+databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
 ```

 ```
-databusclient deploy --versionid https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
+databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
 ```

 A few more notes for CLI usage:
@@ -65,6 +68,93 @@ A few more notes for CLI usage:
 * For complete inferred: Just use the URL with `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml`
 * If other parameters are used, you need to leave them empty like `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml||yml|7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653:367116`

+### Download command
+```
+databusclient download --help
+```
+
+```
+Usage: databusclient download [OPTIONS] DATABUSURIS...
+
+Arguments:
+DATABUSURIS... databus uris to download from https://databus.dbpedia.org,
+or a query statement that returns databus uris from https://databus.dbpedia.org/sparql
+to be downloaded [required]
+
+Download datasets from databus, optionally using vault access if vault
+options are provided.
+
+Options:
+--localdir TEXT Local databus folder (if not given, databus folder
+structure is created in current working directory)
+--databus TEXT Databus URL (if not given, inferred from databusuri, e.g.
+https://databus.dbpedia.org/sparql)
+--token TEXT Path to Vault refresh token file
+--authurl TEXT Keycloak token endpoint URL [default:
+https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
+connect/token]
+--clientid TEXT Client ID for token exchange [default: vault-token-
+exchange]
+--help Show this message and exit.
+```
+
+Examples of using download command
+
+**File**: download of a single file
+```
+databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2
+```
+
+**Version**: download of all files of a specific version
+```
+databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01
+```
+
+**Artifact**: download of all files of the latest version of an artifact
+```
+databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals
+```
+
+**Group**: download of all files of the latest version of all artifacts of a group
+```
+databusclient download https://databus.dbpedia.org/dbpedia/mappings
+```
+
+If no `--localdir` is provided, the current working directory is used as the base directory. Downloaded files are stored below the base directory in a folder structure that mirrors the databus structure, i.e. `./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/`.
+
+**Collection**: download of all files within a collection
+```
+databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12
+```
+
+**Query**: download of all files returned by a query (the SPARQL endpoint must be provided with `--databus`)
+```
+databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
+```
+
+#### Authentication with vault
+
+For downloading files from the vault, you need to provide a vault token. See [getting-the-access-refresh-token](https://github.com/dbpedia/databus-vault-access?tab=readme-ov-file#step-1-getting-the-access-refresh-token) for details. You can come back here once you have a `vault-token.dat` file. To use it, just provide the path to the file with `--token /path/to/vault-token.dat`.
+
+Example:
+```
+databusclient download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-snapshots/fusion/2025-08-23 --token vault-token.dat
+```
+
+If vault authentication is required for downloading a file, the client will use the token. If no vault authentication is required, the token will not be used.
+
+#### Usage of docker image
+
+A docker image is available at [dbpedia/databus-python-client](https://hub.docker.com/r/dbpedia/databus-python-client). You can use it like this:
+
+```
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01
+```
+If using vault authentication, make sure the token file is available in the container, e.g. by placing it in the current working directory.
+```
+docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-snapshots/fusion/2025-08-23/fusion_props=all_subjectns=commons-wikimedia-org_vocab=all.ttl.gz --token vault-token.dat
+```
+
 ## Module Usage

 ### Step 1: Create lists of distributions for the dataset
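The distribution arguments of `deploy` follow the pattern `URL|CV|fileext|compression|sha256sum:contentlength`, where empty fields are inferred. The sketch below parses that pattern into a small dataclass. It is a hypothetical illustration, not the client's own parser; in particular, treating a trailing field that contains `:` as the checksum part is an assumption chosen so that the shorter `URL||yml|<sha256>:<length>` example from the README also parses.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Distribution:
    url: str
    content_variants: Dict[str, str] = field(default_factory=dict)
    file_ext: Optional[str] = None       # None: infer from the path
    compression: Optional[str] = None    # None: infer from the path
    sha256sum: Optional[str] = None      # None: compute by downloading the file
    content_length: Optional[int] = None


def parse_distribution(spec: str) -> Distribution:
    """Hypothetical parser for 'URL|CV|fileext|compression|sha256sum:contentlength'."""
    url, *meta = spec.split("|")
    dist = Distribution(url=url)
    # Assumption: a trailing field containing ':' is the checksum part. This also
    # accepts the README's shorter 'URL||yml|<sha256>:<length>' form. (A content
    # variant value containing ':' in last position would be misread.)
    if meta and ":" in meta[-1]:
        sha, length = meta.pop().split(":", 1)
        dist.sha256sum, dist.content_length = sha, int(length)
    if len(meta) > 3:
        raise ValueError(f"too many fields in distribution: {spec!r}")
    cv, ext, comp = (meta + ["", "", ""])[:3]
    # Content variants are key=value pairs separated by '_'.
    for pair in filter(None, cv.split("_")):
        key, _, value = pair.partition("=")
        dist.content_variants[key] = value
    dist.file_ext = ext or None  # empty field means "infer"
    dist.compression = comp or None
    return dist
```

For the README's `...swagger.yml|type=swagger` argument this yields `content_variants == {"type": "swagger"}`, with every other optional field left as `None` for inference.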
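The download examples differ only in how deep the databus URI goes, and files end up under `$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/`. A rough sketch of that mapping follows. It assumes the path-depth rule encoded below (two segments for a group up to five for a file, `collections` as the second segment for a collection) and treats any non-URL argument as a SPARQL query; `classify` and `local_target` are hypothetical helpers, not functions of `databusclient`.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse

# Assumed mapping from path depth (segments after the host) to the README's examples.
KINDS = {2: "group", 3: "artifact", 4: "version", 5: "file"}


def _segments(uri: str) -> List[str]:
    return [p for p in urlparse(uri).path.split("/") if p]


def classify(arg: str) -> str:
    """Sketch: guess what a `databusclient download` argument refers to."""
    if urlparse(arg).scheme not in ("http", "https"):
        return "query"  # assumption: any non-URL argument is a SPARQL query
    parts = _segments(arg)
    if len(parts) >= 2 and parts[1] == "collections":
        return "collection"  # e.g. /dbpedia/collections/dbpedia-snapshot-2022-12
    return KINDS.get(len(parts), "unknown")


def local_target(file_uri: str, localdir: str = ".") -> PurePosixPath:
    """Sketch: <localdir>/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/<file> for a file URI."""
    parts = _segments(file_uri)
    if len(parts) != 5:
        raise ValueError(f"not a databus file URI: {file_uri}")
    return PurePosixPath(localdir, *parts)
```

Keying on path depth keeps the sketch independent of the databus host, so the same rule would apply to `dev.databus.dbpedia.org` URIs.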
82 changes: 50 additions & 32 deletions databusclient/cli.py
@@ -1,43 +1,61 @@
 #!/usr/bin/env python3
-import typer
+import click
 from typing import List
 from databusclient import client

-app = typer.Typer()

+@click.group()
+def app():
+    """Databus Client CLI"""
+    pass
+

 @app.command()
-def deploy(
-    version_id: str = typer.Option(
-        ...,
-        help="target databus version/dataset identifier of the form "
-        "<https://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION>",
-    ),
-    title: str = typer.Option(..., help="dataset title"),
-    abstract: str = typer.Option(..., help="dataset abstract max 200 chars"),
-    description: str = typer.Option(..., help="dataset description"),
-    license_uri: str = typer.Option(..., help="license (see dalicc.net)"),
-    apikey: str = typer.Option(..., help="apikey"),
-    distributions: List[str] = typer.Argument(
-        ...,
-        help="distributions in the form of List[URL|CV|fileext|compression|sha256sum:contentlength] where URL is the "
-        "download URL and CV the "
-        "key=value pairs (_ separated) content variants of a distribution. filext and compression are optional "
-        "and if left out inferred from the path. If the sha256sum:contentlength part is left out it will be "
-        "calcuted by downloading the file.",
-    ),
-):
-    typer.echo(version_id)
-    dataid = client.create_dataset(
-        version_id, title, abstract, description, license_uri, distributions
-    )
+@click.option(
+    "--version-id", "version_id",
+    required=True,
+    help="Target databus version/dataset identifier of the form "
+    "<https://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION>",
+)
+@click.option("--title", required=True, help="Dataset title")
+@click.option("--abstract", required=True, help="Dataset abstract max 200 chars")
+@click.option("--description", required=True, help="Dataset description")
+@click.option("--license", "license_url", required=True, help="License (see dalicc.net)")
+@click.option("--apikey", required=True, help="API key")
+@click.argument(
+    "distributions",
+    nargs=-1,
+    required=True,
+)
+def deploy(version_id, title, abstract, description, license_url, apikey, distributions: List[str]):
+    """
+    Deploy a dataset version with the provided metadata and distributions.
+    """
+    click.echo(f"Deploying dataset version: {version_id}")
+    dataid = client.create_dataset(version_id, title, abstract, description, license_url, distributions)
     client.deploy(dataid=dataid, api_key=apikey)


 @app.command()
-def download(
-    localDir: str = typer.Option(..., help="local databus folder"),
-    databus: str = typer.Option(..., help="databus URL"),
-    databusuris: List[str] = typer.Argument(...,help="any kind of these: databus identifier, databus collection identifier, query file")
-):
-    client.download(localDir=localDir,endpoint=databus,databusURIs=databusuris)
+@click.argument("databusuris", nargs=-1, required=True)
+@click.option("--localdir", help="Local databus folder (if not given, databus folder structure is created in current working directory)")
+@click.option("--databus", help="Databus URL (if not given, inferred from databusuri, e.g. https://databus.dbpedia.org/sparql)")
+@click.option("--token", help="Path to Vault refresh token file")
+@click.option("--authurl", default="https://auth.dbpedia.org/realms/dbpedia/protocol/openid-connect/token", show_default=True, help="Keycloak token endpoint URL")
+@click.option("--clientid", default="vault-token-exchange", show_default=True, help="Client ID for token exchange")
+def download(databusuris: List[str], localdir, databus, token, authurl, clientid):
+    """
+    Download datasets from databus, optionally using vault access if vault options are provided.
+    """
+    client.download(
+        localDir=localdir,
+        endpoint=databus,
+        databusURIs=databusuris,
+        token=token,
+        auth_url=authurl,
+        client_id=clientid,
+    )
+
+
+if __name__ == "__main__":
+    app()
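Two constraints of `deploy` are stated only in help text: `--version-id` must have the form `https://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION`, and the abstract is limited to 200 characters. The hypothetical pre-flight checks below (not part of `databusclient`) make both explicit using only the standard library. They deliberately accept any host so that the `dev.databus.dbpedia.org` example from the README also passes.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class VersionId(NamedTuple):
    account: str
    group: str
    artifact: str
    version: str


def parse_version_id(version_id: str) -> VersionId:
    """Hypothetical check for https://<host>/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION."""
    parsed = urlparse(version_id)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.scheme not in ("http", "https") or not parsed.netloc or len(parts) != 4:
        raise ValueError(
            f"expected https://<host>/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION, got {version_id!r}"
        )
    return VersionId(*parts)


def check_abstract(abstract: str, limit: int = 200) -> str:
    """Enforce the 'max 200 chars' rule from the --abstract help text."""
    if len(abstract) > limit:
        raise ValueError(f"abstract has {len(abstract)} characters, limit is {limit}")
    return abstract
```

In the click-based CLI, such checks could be attached with the `callback` parameter of `click.option`, which click calls as `callback(ctx, param, value)`; raising `click.BadParameter` there instead of `ValueError` lets click report the error against the offending option.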