Databus Python Client

Command-line and Python client for downloading and deploying datasets on DBpedia Databus.

Quickstart

The client supports two main workflows: downloading datasets from the Databus and deploying datasets to the Databus. Below you can choose how to run it (Python or Docker), then follow the sections on DBpedia downloads, CLI usage, or module usage.

You can use either Python or Docker. Both methods support all client features. The Docker image is available at dbpedia/databus-python-client.

Python

Requirements: Python 3.11+ and pip

Before using the client, install it via pip:

python3 -m pip install databusclient

You can then use the client in the command line:

databusclient --help
databusclient deploy --help
databusclient delete --help
databusclient download --help

Docker

Requirements: Docker

docker run --rm -v $(pwd):/data dbpedia/databus-python-client --help
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy --help
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help

DBpedia

Commands to download the DBpedia Knowledge Graphs generated by Live Fusion. DBpedia Live Fusion publishes two kinds of KGs:

  1. Open Core Knowledge Graphs under the CC-BY-SA license: open with copyleft/share-alike; no registration needed.
  2. Industry Knowledge Graphs under the BUSL 1.1 license: unrestricted for research and experimentation, commercial license required for productive use; free registration needed.

Registration (Access Token)

To download BUSL 1.1 licensed datasets, you need to register and get an access token.

  1. If you do not have a DBpedia Account yet (Forum/Databus), please register at https://account.dbpedia.org
  2. Log in at https://account.dbpedia.org and create your token.
  3. Save the token to a file, e.g. vault-token.dat.
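
The token file is passed by path; the client reads it itself. If downloads fail with authentication errors, a quick sanity check that the file exists and is non-empty can save debugging time; a minimal sketch:

from pathlib import Path

token_path = Path("vault-token.dat")  # the file saved in step 3
token = token_path.read_text().strip()
assert token, "vault-token.dat is empty -- re-create the token"
print(f"Token file looks OK ({len(token)} characters)")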

DBpedia Knowledge Graphs

Download Live Fusion KG Dump (BUSL 1.1, registration needed)

High-frequency, conflict-resolved knowledge graph that merges Live Wikipedia and Wikidata signals into a single, queryable dump for enterprise consumption. More information

# Python
databusclient download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-dump --vault-token vault-token.dat
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-kg-dump --vault-token vault-token.dat

Download Enriched Knowledge Graphs (BUSL 1.1, registration needed)

DBpedia Wikipedia Extraction Enriched

DBpedia-based enrichment of structured Wikipedia extractions (currently EN DBpedia only). More information

# Python
databusclient download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/dbpedia-wikipedia-kg-enriched-dump --vault-token vault-token.dat

Download DBpedia Wikipedia Knowledge Graphs (CC-BY-SA, no registration needed)

Original extraction of structured Wikipedia data before enrichment. More information

# Python
databusclient download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-dump
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikipedia-kg-dump

Download DBpedia Wikidata Knowledge Graphs (CC-BY-SA, no registration needed)

Original extraction of structured Wikidata data before enrichment. More information

# Python
databusclient download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-dump
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/dbpedia-wikidata-kg-dump

CLI Usage

To get started with the command-line interface (CLI) of the databus-python-client, you can use either the Python installation or the Docker image. The examples below show both methods.

Help and further general information:

# Python
databusclient --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client --help

# Output:
Usage: databusclient [OPTIONS] COMMAND [ARGS]...

  Databus Client CLI

Options:
  --help  Show this message and exit.

Commands:
  deploy    Flexible deploy to Databus command supporting three modes:
  download  Download datasets from databus, optionally using vault access...

Download

With the download command, you can download datasets, or parts thereof, from the Databus. The download command expects one or more Databus URIs or a SPARQL query as arguments. The URIs can point to files, versions, artifacts, groups, or collections. If a SPARQL query is provided, it must return download URLs from the Databus, which are then downloaded.

# Python
databusclient download $DOWNLOADTARGET
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download $DOWNLOADTARGET
  • $DOWNLOADTARGET
    • Can be any Databus URI (file, version, artifact, group, or collection) OR a SPARQL query, or several thereof.
  • --localdir
    • If no --localdir is provided, the current working directory is used as the base directory. Downloaded files are stored in a folder structure following the Databus layout, i.e. ./$ACCOUNT/$GROUP/$ARTIFACT/$VERSION/.
  • --vault-token
    • If the dataset/files to be downloaded require vault authentication, provide a vault token with --vault-token /path/to/vault-token.dat. See Registration (Access Token) for details on how to get one.

    Note: Vault tokens are only required for certain protected Databus hosts (for example data.dbpedia.io, data.dev.dbpedia.link). The client detects those hosts and fails early with a clear message if a token is required but not provided. Do not pass --vault-token for public downloads.

  • --databus-key
    • If the Databus is protected and needs API key authentication, provide the API key with --databus-key YOUR_API_KEY.
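
For scripting, the same flags can be driven from Python by shelling out to the CLI; a minimal sketch (the artifact URI and paths are placeholders):

import subprocess

# Placeholder artifact URI and target directory; substitute your own.
cmd = [
    "databusclient", "download",
    "https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals",
    "--localdir", "./databus-data",
    # For vault-protected datasets, also pass:
    # "--vault-token", "vault-token.dat",
]
subprocess.run(cmd, check=True)  # raises CalledProcessError if the download fails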

Help and further information on download command:

# Python
databusclient download --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download --help

# Output:
Usage: databusclient download [OPTIONS] DATABUSURIS...

  Download datasets from databus, optionally using vault access if vault
  options are provided.

Options:
  --localdir TEXT     Local databus folder (if not given, databus folder
                      structure is created in current working directory)
  --databus TEXT      Databus URL (if not given, inferred from databusuri,
                      e.g. https://databus.dbpedia.org/sparql)
  --vault-token TEXT  Path to Vault refresh token file
  --databus-key TEXT  Databus API key to download from protected databus
  --all-versions      When downloading artifacts, download all versions
                      instead of only the latest
  --authurl TEXT      Keycloak token endpoint URL  [default:
                      https://auth.dbpedia.org/realms/dbpedia/protocol/openid-
                      connect/token]
  --clientid TEXT     Client ID for token exchange  [default: vault-token-
                      exchange]
  --help              Show this message and exit.

Examples of using the download command

Download File: download of a single file

# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01/mappingbased-literals_lang=az.ttl.bz2

Download Version: download of all files of a specific version

# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01

Download Artifact: download of all files with the latest version of an artifact

# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals

Download Group: download of all files with the latest version of all artifacts of a group

# Python
databusclient download https://databus.dbpedia.org/dbpedia/mappings
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/mappings

Download Collection: download of all files within a collection

# Python
databusclient download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12

Download Query: download of all files returned by a query (SPARQL endpoint must be provided with --databus)

# Python
databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10' --databus https://databus.dbpedia.org/sparql
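
To preview which files such a query would fetch, you can run it against the SPARQL endpoint yourself first; a sketch using requests, with the query and endpoint taken from the example above:

import requests

QUERY = (
    "PREFIX dcat: <http://www.w3.org/ns/dcat#> "
    "SELECT ?x WHERE { ?sub dcat:downloadURL ?x . } LIMIT 10"
)

resp = requests.post(
    "https://databus.dbpedia.org/sparql",
    data={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["x"]["value"])  # one download URL per result row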

Deploy

With the deploy command, you can deploy datasets to the Databus. The deploy command supports three modes:

  1. Classic dataset deployment via list of distributions
  2. Metadata-based deployment via metadata JSON file
  3. Upload & deploy via Nextcloud/WebDAV

# Python
databusclient deploy [OPTIONS] [DISTRIBUTIONS]...
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy [OPTIONS] [DISTRIBUTIONS]...

Help and further information on deploy command:

# Python
databusclient deploy --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy --help

# Output:
Usage: databusclient deploy [OPTIONS] [DISTRIBUTIONS]...

  Flexible deploy to Databus command supporting three modes:

  - Classic deploy (distributions as arguments)

  - Metadata-based deploy (--metadata <file>)

  - Upload & deploy via Nextcloud (--webdav-url, --remote, --path)

Options:
  --version-id TEXT   Target databus version/dataset identifier of the form <h
                      ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
                      RSION>  [required]
  --title TEXT        Dataset title  [required]
  --abstract TEXT     Dataset abstract max 200 chars  [required]
  --description TEXT  Dataset description  [required]
  --license TEXT      License (see dalicc.net)  [required]
  --apikey TEXT       API key  [required]
  --metadata PATH     Path to metadata JSON file (for metadata mode)
  --webdav-url TEXT   WebDAV URL (e.g.,
                      https://cloud.example.com/remote.php/webdav)
  --remote TEXT       rclone remote name (e.g., 'nextcloud')
  --path TEXT         Remote path on Nextcloud (e.g., 'datasets/mydataset')
  --help              Show this message and exit.

Mode 1: Classic Deploy (Distributions)

# Python
databusclient deploy \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 \
--title "Client Testing" \
--abstract "Testing the client...." \
--description "Testing the client...." \
--license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 \
--apikey MYSTERIOUS \
'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
--version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 \
--title "Client Testing" \
--abstract "Testing the client...." \
--description "Testing the client...." \
--license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 \
--apikey MYSTERIOUS \
'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'

A few more notes for CLI usage:

  • The content variants can be left out ONLY IF there is just one distribution (see the sketch after this list):
    • To have file format, compression, checksum, and length inferred, just use the plain URL: https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml
    • If other parameters are given, leave the content-variant field empty, e.g. https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml||yml|7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653:367116
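
A small illustrative helper for assembling such a distribution argument, modeled only on the two examples above (the field order beyond these examples is an assumption; consult deploy --help for the authoritative format):

def distribution_arg(url, cvs="", file_format="", checksum=""):
    # Assemble url|content-variants|format|sha256:length as in the examples;
    # trailing empty segments are dropped so a plain URL stays a plain URL.
    parts = [url, cvs, file_format, checksum]
    while parts and parts[-1] == "":
        parts.pop()
    return "|".join(parts)

print(distribution_arg("https://example.org/file.yml", cvs="type=swagger"))
# -> https://example.org/file.yml|type=swagger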

Mode 2: Deploy with Metadata File

Use a JSON metadata file to define all distributions. The metadata.json should list all distributions and their metadata. All files referenced there will be registered on the Databus.

# Python
databusclient deploy \
  --metadata ./metadata.json \
  --version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
  --title "Metadata Deploy Example" \
  --abstract "This is a short abstract of the dataset." \
  --description "This dataset was uploaded using metadata.json." \
  --license https://dalicc.net/licenselibrary/Apache-2.0 \
  --apikey "API-KEY"
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
  --metadata ./metadata.json \
  --version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
  --title "Metadata Deploy Example" \
  --abstract "This is a short abstract of the dataset." \
  --description "This dataset was uploaded using metadata.json." \
  --license https://dalicc.net/licenselibrary/Apache-2.0 \
  --apikey "API-KEY"

Example metadata.json file structure (file_format and compression are optional):

[
  {
    "checksum": "0929436d44bba110fc7578c138ed770ae9f548e195d19c2f00d813cca24b9f39",
    "size": 12345,
    "url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.ttl",
    "file_format": "ttl"
  },
  {
    "checksum": "2238acdd7cf6bc8d9c9963a9f6014051c754bf8a04aacc5cb10448e2da72c537",
    "size": 54321,
    "url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.csv.gz",
    "file_format": "csv",
    "compression": "gz"
  }
]
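
If you need to generate such a file for local data, the checksum and size fields can be computed with the standard library; a minimal sketch (the base URL is a placeholder for wherever the files will actually be hosted):

import hashlib
import json
import os

BASE_URL = "https://cloud.example.com/remote.php/webdav/datasets/mydataset"  # placeholder

entries = []
for name in ["example.ttl", "example.csv.gz"]:  # your local files
    with open(name, "rb") as f:
        sha256 = hashlib.sha256(f.read()).hexdigest()
    entries.append({
        "checksum": sha256,
        "size": os.path.getsize(name),
        "url": f"{BASE_URL}/{name}",
    })

with open("metadata.json", "w") as f:
    json.dump(entries, f, indent=2)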

Mode 3: Upload & Deploy via Nextcloud

Upload local files or folders to a WebDAV/Nextcloud instance and automatically deploy them to the DBpedia Databus. rclone is required.

# Python
databusclient deploy \
  --webdav-url https://cloud.example.com/remote.php/webdav \
  --remote nextcloud \
  --path datasets/mydataset \
  --version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
  --title "Test Dataset" \
  --abstract "Short abstract of dataset" \
  --description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
  --license https://dalicc.net/licenselibrary/Apache-2.0 \
  --apikey "API-KEY" \
  ./localfile1.ttl \
  ./data_folder
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client deploy \
  --webdav-url https://cloud.example.com/remote.php/webdav \
  --remote nextcloud \
  --path datasets/mydataset \
  --version-id https://databus.dbpedia.org/user1/group1/artifact1/1.0 \
  --title "Test Dataset" \
  --abstract "Short abstract of dataset" \
  --description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
  --license https://dalicc.net/licenselibrary/Apache-2.0 \
  --apikey "API-KEY" \
  ./localfile1.ttl \
  ./data_folder

Delete

With the delete command you can delete collections, groups, artifacts, and versions from the Databus. Deleting individual files is not supported via the API.

Note: Deleting a dataset recursively deletes all data associated with it below the specified level, so please use this command with caution. As a security measure, the delete command prompts you for confirmation before proceeding with any deletion.

# Python
databusclient delete [OPTIONS] DATABUSURIS...
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete [OPTIONS] DATABUSURIS...

Help and further information on delete command:

# Python
databusclient delete --help
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete --help

# Output:
Usage: databusclient delete [OPTIONS] DATABUSURIS...

  Delete a dataset from the databus.

  Delete a group, artifact, or version identified by the given databus URI.
  Will recursively delete all data associated with the dataset.

Options:
  --databus-key TEXT  Databus API key to access protected databus  [required]
  --dry-run           Perform a dry run without actual deletion
  --force             Force deletion without confirmation prompt
  --help              Show this message and exit.

To authenticate the delete request, you need to provide an API key with --databus-key YOUR_API_KEY.

If you want to perform a dry run without actual deletion, use the --dry-run option. This will show you what would be deleted without making any changes.

As a security measure, the delete command will prompt you for confirmation before proceeding with the deletion. To skip this prompt, use the --force option.
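
When scripting deletions, the two flags combine naturally: preview with --dry-run, then delete non-interactively with --force. A sketch via subprocess (the URI is a placeholder):

import subprocess

uri = "https://databus.dbpedia.org/user1/group1/artifact1"  # placeholder
base = ["databusclient", "delete", uri, "--databus-key", "YOUR_API_KEY"]

subprocess.run(base + ["--dry-run"], check=True)  # show what would be deleted
if input("Proceed with deletion? [y/N] ").strip().lower() == "y":
    subprocess.run(base + ["--force"], check=True)  # skip the built-in prompt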

Examples of using the delete command

Delete Version: delete a specific version

# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals/2022.12.01 --databus-key YOUR_API_KEY

Delete Artifact: delete an artifact and all its versions

# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings/mappingbased-literals --databus-key YOUR_API_KEY

Delete Group: delete a group and all its artifacts and versions

# Python
databusclient delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/mappings --databus-key YOUR_API_KEY

Delete Collection: delete a collection

# Python
databusclient delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY
# Docker
docker run --rm -v $(pwd):/data dbpedia/databus-python-client delete https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-12 --databus-key YOUR_API_KEY

Module Usage

Deploy

Step 1: Create a list of distributions for the dataset

from databusclient import create_distribution

# create a list
distributions = []

# minimal requirements
# compression and file type will be inferred from the URL path
# the file will be downloaded once to compute its sha256 checksum and content length
distributions.append(
    create_distribution(url="https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml", cvs={"type": "swagger"})
)

# full parameters
# will just place parameters correctly, nothing will be downloaded or inferred
distributions.append(
    create_distribution(
        url="https://example.org/some/random/file.csv.bz2",
        cvs={"type": "example", "realfile": "false"},
        file_format="csv",
        compression="bz2",
        sha256_length_tuple=("7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653", 367116)
    )
)

A few notes:

  • The dict of content variants can be empty ONLY IF there is just one distribution
  • Compression can only be set if a file format is set
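
If the file is also available locally, the sha256_length_tuple can be computed without the extra download; a minimal sketch using only the standard library:

import hashlib
from pathlib import Path

def sha256_length_tuple(path):
    # Returns the (sha256 hex digest, byte length) pair expected by create_distribution.
    data = Path(path).read_bytes()
    return hashlib.sha256(data).hexdigest(), len(data)

# e.g. pass sha256_length_tuple("local/copy/file.csv.bz2") to create_distribution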

Step 2: Create dataset

from databusclient import create_dataset

# minimal way
dataset = create_dataset(
  version_id="https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18",
  title="Client Testing",
  abstract="Testing the client....",
  description="Testing the client....",
  license_url="http://dalicc.net/licenselibrary/AdaptivePublicLicense10",
  distributions=distributions,
)

# with group metadata
dataset = create_dataset(
  version_id="https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18",
  title="Client Testing",
  abstract="Testing the client....",
  description="Testing the client....",
  license_url="http://dalicc.net/licenselibrary/AdaptivePublicLicense10",
  distributions=distributions,
  group_title="Title of group1",
  group_abstract="Abstract of group1",
  group_description="Description of group1"
)

NOTE: Group metadata is applied only if all group parameters are set.

Step 3: Deploy to Databus

from databusclient import deploy

# to deploy something you just need the dataset from the previous step and an API key
# API key can be found (or generated) at https://$$DATABUS_BASE$$/$$USER$$#settings
deploy(dataset, "mysterious API key")
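
Putting the three steps together, a complete deploy script might look like this (the URLs are placeholders, and reading the API key from an environment variable named DATABUS_API_KEY is just one possible convention):

import os

from databusclient import create_dataset, create_distribution, deploy

distributions = [
    create_distribution(
        url="https://example.org/data/file.ttl",  # placeholder download URL
        cvs={"type": "example"},
    )
]

dataset = create_dataset(
    version_id="https://databus.dbpedia.org/user1/group1/artifact1/1.0",  # placeholder
    title="Example Dataset",
    abstract="Short abstract of the dataset.",
    description="Longer description of the dataset.",
    license_url="https://dalicc.net/licenselibrary/Apache-2.0",
    distributions=distributions,
)

deploy(dataset, os.environ["DATABUS_API_KEY"])  # hypothetical variable name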

Development & Contributing

Install development dependencies yourself or via Poetry:

poetry install --with dev

Linting

The linter used is Ruff. It is configured in pyproject.toml and enforced in CI (.github/workflows/ruff.yml).

For development, you can run linting locally with ruff check . and optionally auto-format with ruff format ..

To ensure compatibility with the dependencies configured in pyproject.toml, run Ruff via Poetry:

# To check for linting issues:
poetry run ruff check .

# To auto-format code:
poetry run ruff format .

Testing

When developing new features, please make sure to add appropriate tests and ensure that all tests pass. Tests live under tests/ and use pytest as the test framework.

When fixing bugs or refactoring existing code, please make sure to add tests that cover the affected functionality. The current test coverage is very low, so any additional tests are highly appreciated.

To run tests locally, use:

pytest tests/

Or, to ensure compatibility with the dependencies configured in pyproject.toml, run pytest via Poetry:

poetry run pytest tests/
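
A new test can be a single function under tests/; a hedged sketch that assumes only what the Module Usage section shows, namely that create_distribution embeds the given URL in its result:

# tests/test_create_distribution.py
from databusclient import create_distribution

def test_create_distribution_keeps_url():
    url = "https://example.org/file.csv.bz2"
    dist = create_distribution(
        url=url,
        cvs={"type": "example"},
        file_format="csv",
        compression="bz2",
        sha256_length_tuple=("0" * 64, 1),
    )
    assert url in str(dist)  # the URL must survive into the identifier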
