Closed
37 commits
3f73738 added deploy script with uploading to given rclone remote (gg46ixav, Jul 3, 2025)
9edc0dc added webdav-url argument (gg46ixav, Jul 4, 2025)
a56f01d added deploying to the databus without upload to nextcloud (gg46ixav, Jul 25, 2025)
e71a886 ci: workflow for building and publishing docker image (Integer-Ctrl, Oct 8, 2025)
82de07f fix: don't build docker image on pr (Integer-Ctrl, Oct 8, 2025)
d78f129 Merge pull request #12 from dbpedia/download-capabilities (Integer-Ctrl, Oct 13, 2025)
5fdf78b Merge branch 'download-capabilities' into nextcloudclient (gg46ixav, Oct 21, 2025)
800256c updated pyproject.toml and content-hash (gg46ixav, Oct 21, 2025)
e976cf3 updated README (kurzum, Oct 27, 2025)
1558391 Add Pylint workflow for Python code analysis (Integer-Ctrl, Oct 27, 2025)
cfdca3b Removed Pylint workflow fro Python code analysis (Integer-Ctrl, Oct 28, 2025)
66f1c8e Merge branch 'main' into nextcloudclient (gg46ixav, Oct 28, 2025)
4259229 Merge remote-tracking branch 'origin/main' into nextcloudclient (gg46ixav, Oct 28, 2025)
b179f90 updated README.md (gg46ixav, Oct 28, 2025)
a504b9d Merge remote-tracking branch 'origin/nextcloudclient' into nextcloudc… (gg46ixav, Oct 28, 2025)
0ce0c24 added checksum validation (gg46ixav, Oct 28, 2025)
6596cbc updated upload_to_nextcloud function to accept list of source_paths (gg46ixav, Oct 28, 2025)
b9f9854 only add result if upload successful (gg46ixav, Oct 28, 2025)
2f8493d use os.path.basename instead of .split("/")[-1] (gg46ixav, Oct 28, 2025)
07359cc added __init__.py and updated README.md (gg46ixav, Oct 28, 2025)
8047968 changed append to extend (no nested list) (gg46ixav, Oct 28, 2025)
0172450 fixed windows separators and added rclone error message (gg46ixav, Oct 28, 2025)
f957512 moved deploy.py to cli upload_and_deploy (gg46ixav, Nov 3, 2025)
607f527 changed metadata to dict list (gg46ixav, Nov 3, 2025)
6cb7e11 removed python-dotenv (gg46ixav, Nov 3, 2025)
7651c31 small updates (gg46ixav, Nov 3, 2025)
df17a7c refactored upload_and_deploy function (gg46ixav, Nov 3, 2025)
7492531 updated README.md (gg46ixav, Nov 3, 2025)
c985603 updated metadata_string for new metadata format (gg46ixav, Nov 3, 2025)
62a3611 updated README.md (gg46ixav, Nov 3, 2025)
22ac02f updated README.md (gg46ixav, Nov 3, 2025)
3faaf4d Changed context url back (gg46ixav, Nov 3, 2025)
5dfebe5 added check for known compressions (gg46ixav, Nov 3, 2025)
f9367c0 updated checksum to sha256 (gg46ixav, Nov 3, 2025)
5d474db updated README.md (gg46ixav, Nov 3, 2025)
bef78ef size check (gg46ixav, Nov 3, 2025)
529f2ae updated checksum validation (gg46ixav, Nov 3, 2025)
28 changes: 28 additions & 0 deletions .github/workflows/docker-publish.yml
@@ -0,0 +1,28 @@
name: Build and push Docker image to DockerHub

on:
  push:
    branches: [ "main" ]
  workflow_dispatch: # allows manual trigger

jobs:
  push_to_registry:
    name: Push Docker image to Docker Hub
    runs-on: ubuntu-latest

    steps:
      - name: Check out the repo
        uses: actions/checkout@v3

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DBP_DOCKERHUB_CREDENTIAL_USERNAME }}
          password: ${{ secrets.DBP_DOCKERHUB_CREDENTIAL_TOKEN_PUSHIMAGES }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: dbpedia/databus-python-client:latest
242 changes: 200 additions & 42 deletions README.md
@@ -1,11 +1,77 @@
# Databus Client Python

## Install
## Quickstart Example
Commands to download the DBpedia Knowledge Graphs generated by Live Fusion.
DBpedia Live Fusion publishes two kinds of knowledge graphs (KGs):

1. Open Core Knowledge Graphs under the CC-BY-SA license: open with copyleft/share-alike, no registration needed.
2. Industry Knowledge Graphs under the BUSL 1.1 license: unrestricted for research and experimentation, commercial license required for production use, free registration needed.


### Registration (Access Token)

1. If you do not have a DBpedia Account yet (Forum/Databus), please register at https://account.dbpedia.org
2. Log in at https://account.dbpedia.org and create your token.
3. Save the token to a file `vault-token.dat`.

### Docker vs. Python
The databus-python-client is available as a **docker** image or as a **python** package; both follow the patterns below.
`$DOWNLOADTARGET` can be any Databus URI (including collections) or a SPARQL query, and several targets can be given at once. Details are documented below.
```bash
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOADTARGET --token vault-token.dat
# Python
python3 -m pip install databusclient
databusclient download $DOWNLOADTARGET --token vault-token.dat
```
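When the Docker pattern is driven from a Python script, passing the arguments as a list avoids shell-quoting problems with SPARQL queries, which contain spaces and angle brackets. The sketch below only builds the argument list; `docker_download_argv` is a hypothetical helper, not part of databusclient, and the volume mount and `--token` handling simply mirror the `docker run` line above.

```python
import os
import shlex
from typing import List, Optional


def docker_download_argv(targets: List[str], token_file: Optional[str] = None,
                         workdir: Optional[str] = None) -> List[str]:
    """Build the argument list for the Docker download pattern (hypothetical helper).

    Each target (Databus URI, collection or SPARQL query) is a separate list
    element, so queries with spaces or <...> need no extra shell quoting.
    """
    workdir = workdir or os.getcwd()  # corresponds to $(pwd) in the shell version
    argv = ["docker", "run", "--rm", "-v", f"{workdir}:/data",
            "dbpedia/databus-python-client", "download", *targets]
    if token_file:
        # Only needed for targets that require registration (BUSL 1.1 KGs).
        argv += ["--token", token_file]
    return argv


if __name__ == "__main__":
    argv = docker_download_argv(
        ["https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-snapshot"])
    # Show the equivalent shell command; subprocess.run(argv, check=True) would run it.
    print(shlex.join(argv))
```

Building a list rather than a single string also means the same helper works unchanged when a target is a multi-line SPARQL query.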

### Download Live Fusion KG Snapshot (BUSL 1.1, registration needed)
TODO One slogan sentence. [More information](https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-snapshot)
```bash
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-snapshot --token vault-token.dat
```

### Download Enriched Knowledge Graphs (BUSL 1.1, registration needed)
**DBpedia Wikipedia Extraction Enriched**
TODO One slogan sentence and link
Currently available for English DBpedia only.

```bash
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-snapshot --token vault-token.dat
```
**DBpedia Wikidata Extraction Enriched**
TODO One slogan sentence and link

```bash
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikidata-kg-enriched-snapshot --token vault-token.dat
```

### Download DBpedia Wikipedia Knowledge Graphs (CC-BY-SA, no registration needed)
TODO One slogan sentence and link

```bash
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-snapshot
```
### Download DBpedia Wikidata Knowledge Graphs (CC-BY-SA, no registration needed)
TODO One slogan sentence and link

```bash
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-snapshot
```

## Docker Image Usage

A Docker image is available at [dbpedia/databus-python-client](https://hub.docker.com/r/dbpedia/databus-python-client). See [download section](#usage-of-docker-image) for details.


## CLI Usage

**Installation**
```bash
python3 -m pip install databusclient
```

**Running**
```bash
databusclient --help
```
@@ -26,47 +92,7 @@ Commands:
download
```


### Download command
```
@@ -132,6 +158,46 @@ databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-s
databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
```

### Deploy command
```
databusclient deploy --help
```
```
Usage: databusclient deploy [OPTIONS] DISTRIBUTIONS...

Arguments:
DISTRIBUTIONS... distributions in the form of List[URL|CV|fileext|compression|sha256sum:contentlength] where URL is the
download URL and CV the key=value pairs (_ separted)
content variants of a distribution, fileExt and Compression can be set, if not they are inferred from the path [required]

Options:
--version-id TEXT Target databus version/dataset identifier of the form <h
ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
RSION> [required]
--title TEXT Dataset title [required]
--abstract TEXT Dataset abstract max 200 chars [required]
--description TEXT Dataset description [required]
--license TEXT License (see dalicc.net) [required]
--apikey TEXT API key [required]
--help Show this message and exit.
```
Examples of using the deploy command:
```
databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
```

```
databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
```

A few more notes for CLI usage:

* The content variants can be left out ONLY IF there is just one distribution.
* To have everything inferred, just use the URL, e.g. `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml`
* If you set some of the other parameters, leave the unused ones empty, e.g. `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml||yml|7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653:367116`
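A minimal sketch of assembling a `DISTRIBUTIONS` argument in the documented `URL|CV|fileext|compression|sha256sum:contentlength` order, including the checksum and length of a local copy of the file. The helpers `sha256_and_length`, `build_distribution` and `check_content_variants` are hypothetical and not part of databusclient. The sketch keeps an empty compression slot whenever a checksum is given; the README's own example (`swagger.yml||yml|<hash>:<length>`) has only four slots, so how the client parses a shortened form is not confirmed here.

```python
import hashlib
import os
from typing import Dict, List, Optional


def sha256_and_length(path: str) -> str:
    """Return '<sha256 hex digest>:<size in bytes>' for a local file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large dumps are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}:{os.path.getsize(path)}"


def build_distribution(url: str,
                       content_variants: Optional[Dict[str, str]] = None,
                       file_ext: str = "",
                       compression: str = "",
                       checksum_length: str = "") -> str:
    """Assemble URL|CV|fileext|compression|sha256sum:contentlength.

    Content variants are joined as key=value pairs separated by '_'.
    Inner empty fields stay as empty slots; trailing empty fields are dropped,
    so a URL with nothing else set stays a bare URL (fully inferred).
    """
    cv = "_".join(f"{k}={v}" for k, v in (content_variants or {}).items())
    fields = [url, cv, file_ext, compression, checksum_length]
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return "|".join(fields)


def check_content_variants(distributions: List[str]) -> None:
    """Raise if content variants are missing while there is more than one distribution."""
    if len(distributions) > 1:
        for dist in distributions:
            parts = dist.split("|")
            if len(parts) < 2 or parts[1] == "":
                raise ValueError(f"missing content variants: {dist}")
```

For example, `build_distribution(url, {"type": "swagger"})` yields the `url|type=swagger` form used in the deploy examples above, and calling `check_content_variants` on the full list before running `databusclient deploy` catches the "only one distribution" rule early.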



#### Authentication with vault

For downloading files from the vault, you need to provide a vault token. See [getting-the-access-refresh-token](https://github.com/dbpedia/databus-vault-access?tab=readme-ov-file#step-1-getting-the-access-refresh-token) for details. You can come back here once you have a `vault-token.dat` file. To use it, just provide the path to the file with `--token /path/to/vault-token.dat`.
@@ -155,8 +221,100 @@ If using vault authentication, make sure the token file is available in the cont
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-snapshots/fusion/2025-08-23/fusion_props=all_subjectns=commons-wikimedia-org_vocab=all.ttl.gz --token vault-token.dat
```

### Upload-and-deploy command
```bash
databusclient upload-and-deploy --help
```
```text
Usage: databusclient upload-and-deploy [OPTIONS] [FILES]...

Upload files to Nextcloud and deploy to DBpedia Databus.

Arguments:
FILES... files in the form of List[path], where every path must exist locally, which will be uploaded and deployed

Options:
--webdav-url TEXT WebDAV URL (e.g.,
https://cloud.example.com/remote.php/webdav) [required]
--remote TEXT rclone remote name (e.g., 'nextcloud') [required]
--path TEXT Remote path on Nextcloud (e.g., 'datasets/mydataset')
[required]
--version-id TEXT Target databus version/dataset identifier of the form <h
ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
RSION> [required]
--title TEXT Dataset title [required]
--abstract TEXT Dataset abstract max 200 chars [required]
--description TEXT Dataset description [required]
--license TEXT License (see dalicc.net) [required]
--apikey TEXT API key [required]
--help Show this message and exit.
```
The command uploads all given files, and all files inside the given folders, to the given rclone remote, then registers them on the Databus.
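The expansion of files and folders described above can be sketched as follows. This assumes a recursive walk of each folder and one `rclone copy <file> <remote>:<path>` call per file; `collect_files` and `rclone_copy_argv` are hypothetical helpers, and the actual `nextcloudclient.upload` module may batch, order or name the uploads differently.

```python
import os
from typing import Iterable, List


def collect_files(paths: Iterable[str]) -> List[str]:
    """Expand upload arguments: files are kept as given, folders are walked
    recursively and every file found inside them is added."""
    files: List[str] = []
    for path in paths:
        if os.path.isdir(path):
            for root, dirs, names in os.walk(path):
                dirs.sort()  # deterministic traversal order
                files.extend(os.path.join(root, name) for name in sorted(names))
        elif os.path.isfile(path):
            files.append(path)
        else:
            raise FileNotFoundError(path)
    return files


def rclone_copy_argv(local_file: str, remote: str, remote_path: str) -> List[str]:
    """Argument list copying one local file into the folder <remote>:<remote_path>."""
    return ["rclone", "copy", local_file, f"{remote}:{remote_path}"]
```

With the values from the example below, a script could run `subprocess.run(rclone_copy_argv(f, "scads-nextcloud", "test"), check=True)` for each `f` in `collect_files(["/home/test", "/home/test_folder/test"])`. Note that this flattens subfolders into one remote folder, which is a simplification of this sketch.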


#### Example of using the upload-and-deploy command

```bash
databusclient upload-and-deploy \
--webdav-url https://cloud.scadsai.uni-leipzig.de/remote.php/webdav \
--remote scads-nextcloud \
--path test \
--version-id https://databus.org/user/dataset/version/1.0 \
--title "Test Dataset" \
--abstract "This is a short abstract of the test dataset." \
--description "This dataset was uploaded for testing the Nextcloud → Databus deployment pipeline." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY" \
/home/test \
/home/test_folder/test
```


### Deploy-with-metadata command
```bash
databusclient deploy-with-metadata --help
```
```text
Usage: databusclient deploy-with-metadata [OPTIONS]

Deploy to DBpedia Databus using metadata json file.

Options:
--metadata PATH Path to metadata JSON file [required]
--version-id TEXT Target databus version/dataset identifier of the form <h
ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
RSION> [required]
--title TEXT Dataset title [required]
--abstract TEXT Dataset abstract max 200 chars [required]
--description TEXT Dataset description [required]
--license TEXT License (see dalicc.net) [required]
--apikey TEXT API key [required]
--help Show this message and exit.
```

List all files that should be added to the Databus in a metadata JSON file (see [databusclient/metadata.json](databusclient/metadata.json) for an example).
The command then registers all of these files on the Databus.
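Before calling `deploy-with-metadata`, it can help to check that the metadata file parses. This sketch assumes only that the metadata is a JSON list of objects, one per file (in line with the "changed metadata to dict list" commit); it does not check individual keys, because the expected fields are defined by `databusclient/metadata.json` and are not reproduced here. `load_metadata` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def load_metadata(path: str) -> List[Dict[str, Any]]:
    """Load a metadata JSON file and check that it is a list of objects."""
    with open(path, "r", encoding="utf-8") as f:
        metadata = json.load(f)
    # The CLI passes this list on unchanged, so catch a wrong top-level shape early.
    if not isinstance(metadata, list) or not all(isinstance(e, dict) for e in metadata):
        raise ValueError("metadata file must contain a JSON list of objects")
    return metadata
```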


#### Example of using the deploy-with-metadata command

```bash
databusclient deploy-with-metadata \
--metadata /home/metadata.json \
--version-id https://databus.org/user/dataset/version/1.0 \
--title "Test Dataset" \
--abstract "This is a short abstract of the test dataset." \
--description "This dataset was uploaded for testing the Nextcloud → Databus deployment pipeline." \
--license https://dalicc.net/licenselibrary/Apache-2.0 \
--apikey "API-KEY"
```


## Module Usage
### Step 1: Create lists of distributions for the dataset

```python
75 changes: 75 additions & 0 deletions databusclient/cli.py
@@ -1,8 +1,11 @@
#!/usr/bin/env python3
import json

import click
from typing import List
from databusclient import client

from nextcloudclient import upload

@click.group()
def app():
@@ -36,6 +39,78 @@ def deploy(version_id, title, abstract, description, license_url, apikey, distri
    client.deploy(dataid=dataid, api_key=apikey)


@app.command()
@click.option(
    "--metadata", "metadata_file",
    required=True,
    type=click.Path(exists=True),
    help="Path to metadata JSON file",
)
@click.option(
    "--version-id", "version_id",
    required=True,
    help="Target databus version/dataset identifier of the form "
         "<https://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION>",
)
@click.option("--title", required=True, help="Dataset title")
@click.option("--abstract", required=True, help="Dataset abstract max 200 chars")
@click.option("--description", required=True, help="Dataset description")
@click.option("--license", "license_url", required=True, help="License (see dalicc.net)")
@click.option("--apikey", required=True, help="API key")
def deploy_with_metadata(metadata_file, version_id, title, abstract, description, license_url, apikey):
    """
    Deploy to DBpedia Databus using metadata json file.
    """

    with open(metadata_file, 'r') as f:
        metadata = json.load(f)

    client.deploy_from_metadata(metadata, version_id, title, abstract, description, license_url, apikey)


@app.command()
@click.option(
    "--webdav-url", "webdav_url",
    required=True,
    help="WebDAV URL (e.g., https://cloud.example.com/remote.php/webdav)",
)
@click.option(
    "--remote",
    required=True,
    help="rclone remote name (e.g., 'nextcloud')",
)
@click.option(
    "--path",
    required=True,
    help="Remote path on Nextcloud (e.g., 'datasets/mydataset')",
)
@click.option(
    "--version-id", "version_id",
    required=True,
    help="Target databus version/dataset identifier of the form "
         "<https://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION>",
)
@click.option("--title", required=True, help="Dataset title")
@click.option("--abstract", required=True, help="Dataset abstract max 200 chars")
@click.option("--description", required=True, help="Dataset description")
@click.option("--license", "license_url", required=True, help="License (see dalicc.net)")
@click.option("--apikey", required=True, help="API key")
@click.argument(
    "files",
    nargs=-1,
    type=click.Path(exists=True),
)
def upload_and_deploy(webdav_url, remote, path, version_id, title, abstract, description, license_url, apikey,
                      files: List[str]):
    """
    Upload files to Nextcloud and deploy to DBpedia Databus.
    """

    click.echo(f"Uploading data to nextcloud: {remote}")
    metadata = upload.upload_to_nextcloud(files, remote, path, webdav_url)
    client.deploy_from_metadata(metadata, version_id, title, abstract, description, license_url, apikey)


@app.command()
@click.argument("databusuris", nargs=-1, required=True)
@click.option("--localdir", help="Local databus folder (if not given, databus folder structure is created in current working directory)")