Commit 5277f26

Add on-the-fly compression conversion during download (Issue #18)

Author: DhanashreePetare
1 parent e16ff76

6 files changed

Lines changed: 383 additions & 16 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 # project-specific
 tmp/
+test-download/
+vault-token.dat
 
 # Byte-compiled / optimized / DLL files
 __pycache__/

README.md

Lines changed: 41 additions & 15 deletions
@@ -166,6 +166,10 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOAD
 - If the dataset/files to be downloaded require vault authentication, you need to provide a vault token with `--vault-token /path/to/vault-token.dat`. See [Registration (Access Token)](#registration-access-token) for details on how to get a vault token.
 - `--databus-key`
 - If the databus is protected and needs API key authentication, you can provide the API key with `--databus-key YOUR_API_KEY`.
+- `--convert-to`
+- Enables on-the-fly compression format conversion during download. Supported formats: `bz2`, `gz`, `xz`. Downloaded files will be automatically decompressed and recompressed to the target format. Example: `--convert-to gz` converts all downloaded compressed files to gzip format.
+- `--convert-from`
+- Optional filter to specify which source compression format should be converted. Use with `--convert-to` to convert only files with a specific compression format. Example: `--convert-to gz --convert-from bz2` converts only `.bz2` files to `.gz`, leaving other formats unchanged.
 
 **Help and further information on download command:**
 ```bash
@@ -178,23 +182,33 @@ docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help
 Usage: databusclient download [OPTIONS] DATABUSURIS...
 
   Download datasets from databus, optionally using vault access if vault
-  options are provided.
+  options are provided. Supports on-the-fly compression format conversion
+  using --convert-to and --convert-from options.
 
 Options:
-  --localdir TEXT     Local databus folder (if not given, databus folder
-                      structure is created in current working directory)
-  --databus TEXT      Databus URL (if not given, inferred from databusuri,
-                      e.g. https://databus.dbpedia.org/sparql)
-  --vault-token TEXT  Path to Vault refresh token file
-  --databus-key TEXT  Databus API key to download from protected databus
-  --all-versions      When downloading artifacts, download all versions
-                      instead of only the latest
-  --authurl TEXT      Keycloak token endpoint URL [default:
-                      https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
-                      connect/token]
-  --clientid TEXT     Client ID for token exchange [default: vault-token-
-                      exchange]
-  --help              Show this message and exit.
+  --localdir TEXT             Local databus folder (if not given, databus
+                              folder structure is created in current working
+                              directory)
+  --databus TEXT              Databus URL (if not given, inferred from
+                              databusuri, e.g.
+                              https://databus.dbpedia.org/sparql)
+  --vault-token TEXT          Path to Vault refresh token file
+  --databus-key TEXT          Databus API key to download from protected
+                              databus
+  --all-versions              When downloading artifacts, download all
+                              versions instead of only the latest
+  --authurl TEXT              Keycloak token endpoint URL [default:
+                              https://auth.dbpedia.org/realms/dbpedia/protocol
+                              /openid-connect/token]
+  --clientid TEXT             Client ID for token exchange [default: vault-
+                              token-exchange]
+  --convert-to [bz2|gz|xz]    Target compression format for on-the-fly
+                              conversion during download (supported: bz2, gz,
+                              xz)
+  --convert-from [bz2|gz|xz]  Source compression format to convert from
+                              (optional filter). Only files with this
+                              compression will be converted.
+  --help                      Show this message and exit.
 ```
 
 #### Examples of using the download command
@@ -247,6 +261,18 @@ databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHER
 docker run --rm -v $(pwd):/data dbpedia/databus-python-client download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
 ```
 
+**Download with Compression Conversion**: download files and convert them to a different compression format on-the-fly
+```bash
+# Convert all compressed files to gzip format
+databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --convert-to gz
+
+# Convert only bz2 files to xz format, leaving other compressions unchanged
+databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --convert-to xz --convert-from bz2
+
+# Download a collection and unify all files to bz2 format
+databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --convert-to bz2
+```
+
 <a id="cli-deploy"></a>
 ### Deploy
 
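The README text in this commit describes the conversion as "decompress, then recompress to the target format", with `--convert-from` acting as an optional filter. The Python source files are not shown in this excerpt, so the following is only a minimal sketch of how such behavior could look using the standard library. The helper names (`compression_of`, `should_convert`, `target_name`, `convert_stream`) are hypothetical and not taken from the client's code. Skipping files that are already in the target format is likewise an assumption.

```python
import bz2
import gzip
import lzma
import shutil
from typing import BinaryIO, Optional

# Hypothetical mapping from the three supported format names to the
# standard-library modules that read and write them.
_CODECS = {"bz2": bz2, "gz": gzip, "xz": lzma}


def compression_of(filename: str) -> Optional[str]:
    """Return 'bz2', 'gz' or 'xz' based on the file extension, else None."""
    ext = filename.rsplit(".", 1)[-1] if "." in filename else ""
    return ext if ext in _CODECS else None


def should_convert(filename: str, convert_to: Optional[str],
                   convert_from: Optional[str] = None) -> bool:
    """Decide whether a file is recompressed (assumed semantics)."""
    src = compression_of(filename)
    if convert_to is None or src is None:
        return False  # no target requested, or file is not bz2/gz/xz
    if convert_from is not None and src != convert_from:
        return False  # filter is set and this file uses another format
    return src != convert_to  # assumption: same-format files are left alone


def target_name(filename: str, convert_to: str) -> str:
    """Swap the compression extension, e.g. data.ttl.bz2 -> data.ttl.gz."""
    src = compression_of(filename)
    if src is None:
        raise ValueError(f"not a supported compressed file: {filename}")
    return filename[: -len(src)] + convert_to


def convert_stream(raw: BinaryIO, src_fmt: str, out: BinaryIO, dst_fmt: str,
                   chunk_size: int = 1 << 20) -> None:
    """Decompress `raw` and recompress into `out`, one chunk at a time."""
    with _CODECS[src_fmt].open(raw, "rb") as fin, \
            _CODECS[dst_fmt].open(out, "wb") as fout:
        # Only one chunk of decompressed data is held in memory at once.
        shutil.copyfileobj(fin, fout, chunk_size)
```

Because `convert_stream` takes file-like objects, `raw` could in principle be a streamed HTTP response body and `out` a file opened under the name returned by `target_name`, which is one way to avoid writing the original archive to disk first.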