Conversation

@Integer-Ctrl (Contributor) commented Dec 8, 2025

Summary by CodeRabbit

  • New Features

    • Full download command: collections, versions, artifacts, groups and SPARQL queries with Vault/API-key auth and an --all-versions option.
    • New deploy utilities and metadata-driven deploy flow; WebDAV upload support.
  • Breaking Change

    • Legacy monolithic client removed; functionality split into new API modules.
  • Documentation

    • README: Development & Contributing, linting (Ruff), testing (pytest), CLI help updates.
  • Chores

    • Python requirement bumped to 3.11+, package version updated; CI/tooling upgraded to Ruff and newer Python runner.


@Integer-Ctrl Integer-Ctrl self-assigned this Dec 8, 2025
coderabbitai bot commented Dec 8, 2025

📝 Walkthrough

Walkthrough

Refactors the monolithic client into modular APIs under databusclient/api/ (deploy, download, delete), removes databusclient/client.py, adds a WebDAV extension, updates CLI to use new APIs, and modernizes tooling to Python 3.11 and Ruff linting.

Changes

  • CI & Packaging (.github/workflows/python-CI.yml, pyproject.toml, Dockerfile): Upgrade GitHub Actions steps (checkout v4, setup-python v5), bump Python to 3.11, replace flake8 with Ruff, add Ruff config and dev dependency, bump package version and description, update the Docker base image.
  • Top-level exports (databusclient/__init__.py): Re-export create_dataset, create_distribution, and deploy from databusclient.api.deploy instead of the removed client module.
  • API package init (databusclient/api/__init__.py): New package initializer (empty file).
  • Utilities (databusclient/api/utils.py): Rename get_databus_id_parts_from_uri → get_databus_id_parts_from_file_url and get_json_ld_from_databus → fetch_databus_jsonld; add URI normalization, an Accept header, optional X-API-KEY, a timeout, and error raising.
  • Delete API (databusclient/api/delete.py): Adapt deletion flows to call fetch_databus_jsonld and use get_databus_id_parts_from_file_url; update internal helpers and dispatch logic accordingly.
  • Deploy API (databusclient/api/deploy.py): New module providing deployment orchestration, parsing/inference helpers, dataset/distribution constructors, exceptions (DeployError, BadArgumentException), and entry points deploy / deploy_from_metadata.
  • Download API (databusclient/api/download.py): New module implementing file/version/artifact/group/collection downloads, SPARQL discovery, batch downloads, the Vault token exchange and refresh flow, API-key fallback, and the public download(...) entry point.
  • CLI (databusclient/cli.py): Replace direct client usage with databusclient.api.* and databusclient.extensions.webdav; expand deploy/download options (metadata_file, webdav_url, distributions, all_versions, authurl, clientid, databus-key); rewire flows to the new APIs.
  • Extensions (databusclient/extensions/__init__.py, databusclient/extensions/webdav.py): New extensions package; rename upload_to_nextcloud → upload_to_webdav with the same parameters and preserved behavior.
  • Removed legacy (databusclient/client.py, databusclient/consume/download.py): Remove the monolithic client.py (deployment, download, helpers) and the Downloder placeholder class.
  • Tests (tests/test_download.py, tests/test_deploy.py): Update tests to import and call the new API modules (databusclient.api.download, databusclient.api.deploy); add tests for new helpers and exceptions; adjust fixtures/constants.
  • Docs (README.md): Add a Development & Contributing section, update the Python requirement to 3.11+, document Poetry, Ruff, and pytest commands, and document new CLI options (--all-versions, --authurl, --clientid, delete).

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant DeployAPI as api.deploy
    participant WebDAV
    participant DatabusAPI as Databus

    CLI->>DeployAPI: create_dataset(...) or deploy_from_metadata(...)
    alt Classic Deploy
        CLI->>DeployAPI: create_dataset(...)
        DeployAPI-->>CLI: dataset JSON-LD
        CLI->>DeployAPI: deploy(dataset, api_key)
        DeployAPI->>DatabusAPI: POST /datasets (with API key)
        DatabusAPI-->>DeployAPI: 2xx / error
    else Metadata file
        CLI->>CLI: load metadata file
        CLI->>DeployAPI: deploy_from_metadata(metadata,...)
        DeployAPI->>DatabusAPI: POST /datasets
        DatabusAPI-->>DeployAPI: result
    else WebDAV mode
        CLI->>WebDAV: upload_to_webdav(paths, remote, path, webdav_url)
        WebDAV-->>CLI: uploaded URLs / metadata
        CLI->>DeployAPI: deploy_from_metadata(generated_metadata,...)
        DeployAPI->>DatabusAPI: POST /datasets
        DatabusAPI-->>DeployAPI: result
    end
    DeployAPI-->>CLI: deployment result/status
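To make the classic flow concrete, here is a minimal Python sketch of the same sequence. Only the deploy(dataset, api_key) call shape appears in the diagram; the create_dataset parameter names and values below are illustrative, not the module's exact signature.

from databusclient.api.deploy import create_dataset, deploy

# Illustrative arguments; the real create_dataset signature may differ.
dataset = create_dataset(
    version_id="https://databus.example.org/user/group/artifact/2025-12-08",
    title="Example dataset",
    abstract="One-line abstract.",
    description="Longer description of the dataset.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    distributions=["https://example.org/data.ttl|type=example"],
)

# POSTs the dataset JSON-LD to the Databus with the API key, as in the diagram.
deploy(dataset, "my-databus-api-key")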
sequenceDiagram
    participant Client
    participant DownloadAPI as api.download
    participant Databus
    participant FileServer
    participant Vault

    Client->>DownloadAPI: download(localDir, endpoint, databusURIs, token?, databus_key?)
    alt direct file URL
        DownloadAPI->>FileServer: HEAD/GET file
        FileServer-->>DownloadAPI: 200/redirect/401/404
    else SPARQL collection
        DownloadAPI->>Databus: SPARQL query
        Databus-->>DownloadAPI: file URLs
    else artifact/version/group
        DownloadAPI->>Databus: fetch_databus_jsonld(uri)
        Databus-->>DownloadAPI: JSON-LD with download URLs
    end
    loop per file URL
        DownloadAPI->>FileServer: GET file
        alt 401 with Bearer challenge
            DownloadAPI->>Vault: exchange refresh_token -> access_token
            Vault-->>DownloadAPI: access_token
            DownloadAPI->>FileServer: GET with Authorization: Bearer
        else 401 without Bearer
            DownloadAPI->>FileServer: GET with X-API-KEY
        end
        FileServer-->>DownloadAPI: file content / error
    end
    DownloadAPI-->>Client: files saved to localDir
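The 401 branch in the loop above can be sketched as follows. The WWW-Authenticate check and the Authorization/X-API-KEY headers follow the diagram; get_vault_token is a placeholder for the Vault refresh-token exchange, so this is an outline rather than the module's actual code.

import requests

def fetch_with_auth(url, token_file, auth_url, client_id, api_key=None):
    resp = requests.get(url, timeout=30)
    if resp.status_code == 401:
        challenge = resp.headers.get("WWW-Authenticate", "")
        if "Bearer" in challenge:
            # Vault-protected file: exchange the refresh token for an
            # access token, then retry with a Bearer Authorization header.
            access_token = get_vault_token(token_file, auth_url, client_id, url)  # placeholder
            resp = requests.get(
                url, headers={"Authorization": f"Bearer {access_token}"}, timeout=30
            )
        elif api_key:
            # No Bearer challenge: fall back to the Databus API key.
            resp = requests.get(url, headers={"X-API-KEY": api_key}, timeout=30)
    resp.raise_for_status()
    return resp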

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus review on:
    • databusclient/api/download.py — Vault exchange logic, auth fallbacks, SPARQL parsing, timeout and error handling, and progress/download robustness.
    • databusclient/api/deploy.py — metadata parsing, inference of file stats, POST semantics, and exception handling.
    • databusclient/api/utils.py — renamed functions and HTTP behavior; ensure all call sites use updated names and semantics.
    • databusclient/cli.py — CLI-to-API parameter mapping and validation; ensure removed client references are fully replaced.
    • Confirm no residual references to removed databusclient/client.py and adjust imports across tests.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 63.27%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'Refactor/project structure' is vague and overly broad. While it nominally relates to structural changes in the codebase, it fails to convey the specific primary changes (API module organization, tooling upgrades, removal of client.py, download/deploy refactoring) that would help reviewers quickly understand the changeset's scope. Consider a more descriptive title such as 'Reorganize codebase into api modules and upgrade tooling' or 'Refactor: Extract deploy/download into api modules, upgrade Python/tooling' to clearly communicate the main architectural changes.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5558c04 and 072f37c.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • .github/workflows/python-CI.yml (1 hunks)
  • Dockerfile (1 hunks)
  • README.md (5 hunks)
  • databusclient/__init__.py (1 hunks)
  • databusclient/api/__init__.py (1 hunks)
  • databusclient/api/delete.py (12 hunks)
  • databusclient/api/deploy.py (1 hunks)
  • databusclient/api/download.py (1 hunks)
  • databusclient/api/utils.py (2 hunks)
  • databusclient/cli.py (5 hunks)
  • databusclient/client.py (0 hunks)
  • databusclient/consume/download.py (0 hunks)
  • databusclient/extensions/__init__.py (1 hunks)
  • databusclient/extensions/webdav.py (5 hunks)
  • pyproject.toml (2 hunks)
  • tests/test_deploy.py (2 hunks)
  • tests/test_download.py (2 hunks)
💤 Files with no reviewable changes (2)
  • databusclient/consume/download.py
  • databusclient/client.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • databusclient/extensions/__init__.py
  • databusclient/__init__.py
  • databusclient/api/__init__.py
  • databusclient/api/delete.py
  • Dockerfile
  • databusclient/extensions/webdav.py
🧰 Additional context used
🧬 Code graph analysis (5)
tests/test_deploy.py (1)
databusclient/api/deploy.py (6)
  • deploy (418-467)
  • create_dataset (269-415)
  • create_distribution (176-207)
  • get_file_info (158-173)
  • _get_content_variants (27-47)
  • BadArgumentException (15-16)
tests/test_download.py (2)
databusclient/api/download.py (1)
  • download (532-652)
databusclient/cli.py (1)
  • download (161-183)
databusclient/api/deploy.py (1)
databusclient/cli.py (1)
  • deploy (49-127)
databusclient/cli.py (4)
databusclient/api/deploy.py (3)
  • deploy (418-467)
  • create_dataset (269-415)
  • deploy_from_metadata (470-516)
databusclient/api/delete.py (1)
  • delete (179-214)
databusclient/api/download.py (1)
  • download (532-652)
databusclient/extensions/webdav.py (1)
  • upload_to_webdav (31-91)
databusclient/api/download.py (2)
databusclient/api/utils.py (2)
  • fetch_databus_jsonld (33-50)
  • get_databus_id_parts_from_file_url (6-30)
databusclient/cli.py (1)
  • download (161-183)
🪛 LanguageTool
README.md

[uncategorized] ~571-~571: The official name of this software platform is spelled with a capital “H”.
Context: ...pyproject.toml and is enforced in CI (.github/workflows/ruff.yml). For development,...

(GITHUB)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (23)
.github/workflows/python-CI.yml (2)

21-25: Updated GitHub Actions and Python version.

The upgrade from checkout@v3 to v4 and setup-python@v3 to v5 with Python 3.11 aligns well with the refactor's modernization goals. These are standard version updates.

Verify that setup-python@v5 correctly supports Python 3.11 and that all dependencies are compatible with this Python version.


31-32: Linting workflow updated to use Ruff.

The replacement of flake8 with Ruff is a good modernization step. The --output-format=github flag correctly integrates lint output into GitHub's CI annotations. Ruff is properly configured as a dev dependency (^0.5.5) and configured in pyproject.toml with target Python 3.11 alignment.

pyproject.toml (1)

3-27: LGTM! Project modernization is well-aligned.

The version bump, extended description, Python 3.11 requirement, and Ruff tooling configuration are consistent with the refactoring objectives. The python = "^3.11" requirement at line 10 and target-version = "py311" at line 26 correctly align, making the str | None union syntax used throughout the codebase fully compatible.

README.md (1)

36-36: Python 3.11+ requirement is correctly documented.

The quickstart now accurately reflects the updated minimum Python version.

databusclient/api/utils.py (2)

6-30: URI parsing function is well-implemented.

The function correctly handles URIs with varying numbers of path segments by padding with None and returning exactly 6 parts. The multiline return type annotation improves readability.
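A sketch consistent with that description (normalize the protocol away, split the path, pad to exactly six parts); the real implementation may differ in details:

def get_databus_id_parts_from_file_url(uri: str):
    # Normalize: drop the protocol so http/https URIs parse identically
    path = uri.split("://", 1)[-1].rstrip("/")
    parts = path.split("/")
    # Pad with None so callers can always unpack exactly six values:
    # host, account, group, artifact, version, file
    parts += [None] * (6 - len(parts))
    return tuple(parts[:6])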


33-50: JSON-LD fetching with proper error handling and timeout.

The function correctly:

  • Sets appropriate Accept header for JSON-LD
  • Handles optional API key authentication
  • Includes a 30-second timeout
  • Raises on HTTP errors via raise_for_status()

Note: Past review comments flagged str | None syntax for Python 3.9 compatibility, but this is now valid given the Python 3.11 requirement in pyproject.toml.
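Putting those four behaviors together, the function plausibly reduces to something like the sketch below; the Accept header value and the dict return shape are assumptions.

import requests

def fetch_databus_jsonld(uri: str, api_key: str | None = None) -> dict:
    # Content negotiation: ask the Databus explicitly for JSON-LD
    headers = {"Accept": "application/ld+json"}
    if api_key is not None:
        headers["X-API-KEY"] = api_key  # optional authentication
    resp = requests.get(uri, headers=headers, timeout=30)
    resp.raise_for_status()  # propagate HTTP errors to the caller
    return resp.json()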

databusclient/api/download.py (7)

15-123: File download implementation is robust.

The _download_file function correctly handles:

  • Directory creation (line 55)
  • Redirect following via HEAD request with timeout (line 57)
  • Vault token authentication flow (lines 76-87)
  • API key authentication (lines 90-96)
  • 404 handling with graceful skip (lines 101-103)
  • Progress bar via tqdm

The timeout has been added to requests at lines 57, 70, 87, 96 (addressing past review comments).
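For orientation, the overall shape of such a download helper might look like this simplified sketch; it omits the Vault and API-key handling that the real function layers on top.

import os
import requests
from tqdm import tqdm

def download_file_sketch(url: str, local_path: str) -> None:
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    # Follow redirects first so the progress bar reflects the final target
    head = requests.head(url, allow_redirects=True, timeout=30)
    if head.status_code == 404:
        print(f"Skipping missing file: {url}")  # graceful skip on 404
        return
    with requests.get(head.url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        total = int(resp.headers.get("Content-Length", 0))
        with open(local_path, "wb") as f, tqdm(total=total, unit="B", unit_scale=True) as bar:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
                bar.update(len(chunk))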


155-172: SPARQL query fetching with proper error handling.

The function correctly includes:

  • Appropriate Accept: text/sparql header
  • Optional API key support
  • 30-second timeout (line 170)
  • Error handling via raise_for_status() (line 171)

This addresses the past review comment about missing error handling.


234-288: Vault token exchange is well-structured.

The __get_vault_access__ function implements proper OAuth token exchange with:

  • Refresh token loading from file or environment (lines 241-246)
  • Timeout on POST requests (lines 258, 282) - addressing past review comments
  • Proper error handling via raise_for_status() (lines 260, 284)
  • Audience extraction from download URL (lines 263-271)

However, the function accepts auth_url and client_id as parameters but doesn't validate them before use. If called with None values (despite the defaults in download()), requests.post(None, ...) would fail with an unclear error.

Consider adding validation at the start of this function:

def __get_vault_access__(
    download_url: str, token_file: str, auth_url: str, client_id: str
) -> str:
    """
    Get Vault access token for a protected databus download.
    """
    if not auth_url or not client_id:
        raise ValueError("auth_url and client_id are required for Vault token exchange")
    # ... rest of function

This provides a clearer error message if the parameters are missing.


396-434: Version extraction with robust error handling.

The _get_databus_versions_of_artifact function now includes:

  • Validation that databus:hasVersion exists (lines 413-414)
  • Type checking and normalization for dict vs list (lines 416-421)
  • Filtering for valid entries with @id (lines 423-425)
  • Empty result validation (lines 427-428)

This addresses all concerns raised in past review comments about handling missing or malformed data.
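The dict-vs-list normalization pattern the comment refers to is roughly the following, using the keys named above; the function name and error messages are illustrative.

def extract_version_ids(artifact_jsonld: dict) -> list[str]:
    versions = artifact_jsonld.get("databus:hasVersion")
    if versions is None:
        raise ValueError("Artifact JSON-LD contains no databus:hasVersion")
    if isinstance(versions, dict):
        versions = [versions]  # a single version is serialized as a bare object
    ids = [v["@id"] for v in versions if isinstance(v, dict) and "@id" in v]
    if not ids:
        raise ValueError("No valid version entries with @id found")
    return ids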


437-459: File URL extraction with type checking.

The function now correctly skips parts where file is not a string (lines 455-457), addressing the past review comment about potential None values.


498-529: Artifact extraction with normalized handling.

The _get_databus_artifacts_of_group function now properly:

  • Handles None case (lines 507-508)
  • Normalizes dict/list variants (lines 510-517)
  • Validates item types before accessing keys (lines 520-522)

This addresses the past review comment about handling both dict and list forms.


532-652: Download orchestration with per-URI endpoint resolution.

The main download function correctly:

  • Resolves endpoint per URI (lines 561-569) - addressing past comment about shared mutable endpoint
  • Routes to appropriate download functions based on URI structure
  • Validates endpoint for queries (lines 640-641)

The uri_endpoint variable is properly used instead of mutating the shared endpoint parameter.

Minor note at line 545: the docstring says "via their databus URIs" while a past comment suggested "by the databus URIs"; it is readable as-is.
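The per-URI endpoint fix described above amounts to keeping the resolved value in a loop-local variable. In the sketch below, resolve_endpoint and its "/sparql" path are assumptions standing in for however the module actually derives the endpoint.

def resolve_endpoint(uri: str) -> str:
    # Placeholder: derive a SPARQL endpoint from the URI's host
    host = uri.split("://", 1)[-1].split("/", 1)[0]
    return f"https://{host}/sparql"

def download_many(endpoint: str | None, databus_uris: list[str]) -> None:
    for uri in databus_uris:
        # Keep the resolution loop-local so one URI's derived endpoint
        # cannot leak into later iterations via the shared parameter.
        uri_endpoint = endpoint if endpoint else resolve_endpoint(uri)
        print(f"would download {uri} via {uri_endpoint}")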

tests/test_deploy.py (2)

7-13: Imports correctly updated to new API structure.

The test file now imports from databusclient.api.deploy, aligning with the refactored module structure.


18-40: Excellent test coverage for content variant parsing.

The new test_get_content_variants function validates:

  • Content variant extraction with multiple key-value pairs (lines 20-26)
  • Empty content variant handling (lines 29-35)
  • Invalid format error handling via BadArgumentException (lines 38-39)

This provides good coverage of the parsing logic in _get_content_variants.

tests/test_download.py (1)

3-29: Test imports correctly updated to new API.

The test file now imports and uses api_download from databusclient.api.download, aligning with the refactored module structure. The test logic remains unchanged, ensuring continuity of test coverage.

databusclient/cli.py (2)

8-11: Imports correctly updated to use new API modules.

The CLI now imports from modular API packages (api_deploy, api_download, api_delete) and the webdav extension, aligning with the refactored architecture.


161-183: Download function with proper type hints and parameter forwarding.

The download command correctly:

  • Adds type hint List[str] for databusuris (line 162)
  • Declares all options with clear multiline formatting
  • Forwards all parameters to api_download
databusclient/api/deploy.py (5)

27-47: Content variant parsing with validation.

The _get_content_variants function now includes proper validation at lines 39-42, raising BadArgumentException for malformed key-value pairs. This addresses the past review comment about error handling.


148-155: File stats loading with timeout and correct status check.

The _load_file_stats function now includes:

  • 30-second timeout (line 149)
  • Correct status check >= 400 (line 150)

This addresses both concerns from past review comments.


269-415: Dataset creation with comprehensive validation.

The create_dataset function now includes:

  • Version ID format validation (lines 315-319) with clear error message
  • Proper extraction of version components (line 320)
  • Content variant validation for multi-file datasets (lines 339-343)

All past review comments about validation have been addressed.


418-467: Deploy function with proper error handling and timeout.

The deploy function now includes:

  • Safe dataid structure access with try-except (lines 443-446, 455-458)
  • 30-second timeout on POST request (line 452)
  • Proper error raising with DeployError

This addresses all concerns from past review comments.
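Combining those three points, the deploy call plausibly has the following shape; the key path into dataid is illustrative, and only the guarded access, the 30-second timeout, and DeployError are taken from the review.

import requests

class DeployError(Exception):
    """Raised when a Databus deploy fails."""

def deploy_sketch(dataid: dict, api_key: str) -> None:
    try:
        # Guarded structure access, as the review describes
        target_uri = dataid["@graph"][0]["@id"]  # illustrative key path
    except (KeyError, IndexError, TypeError) as err:
        raise DeployError(f"Unexpected dataid structure: {err}") from err
    resp = requests.post(
        target_uri,
        json=dataid,
        headers={"X-API-KEY": api_key},
        timeout=30,
    )
    if resp.status_code >= 400:
        raise DeployError(f"Deploy failed: {resp.status_code} - {resp.text}")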


470-516: Metadata-based deployment function is well-designed.

The deploy_from_metadata function:

  • Delegates distribution creation to helper (line 499)
  • Constructs dataset via create_dataset (lines 501-508)
  • Provides clear console output for tracking deployment
  • Properly propagates all required parameters


coderabbitai bot left a comment

Actionable comments posted: 13

🧹 Nitpick comments (4)
databusclient/api/delete.py (1)

82-84: Consider using a more specific exception type.

Using a generic Exception makes it harder for callers to distinguish between HTTP failures and other errors. Consider using requests.HTTPError or a custom exception.

-        raise Exception(
-            f"Failed to delete {databusURI}: {response.status_code} - {response.text}"
-        )
+        raise requests.HTTPError(
+            f"Failed to delete {databusURI}: {response.status_code} - {response.text}",
+            response=response,
+        )
databusclient/api/download.py (1)

35-47: Variable file is shadowed.

The variable file from the tuple unpacking on line 35 is overwritten on line 47, which could lead to confusion when maintaining this code.

-        _host, account, group, artifact, version, file = get_databus_id_parts_from_uri(
+        _host, account, group, artifact, version, _file = get_databus_id_parts_from_uri(
             url
         )
databusclient/api/__init__.py (1)

1-1: Consider re-exporting key API symbols for convenience.

The empty __init__.py correctly establishes the package, but you may want to re-export key symbols from deploy, download, delete, and other submodules to provide a convenient public API surface. This would allow users to import directly from databusclient.api rather than requiring full module paths.

For example:

from databusclient.api.deploy import deploy  # or relevant exported symbols
from databusclient.api.download import download

__all__ = ["deploy", "download"]

This is optional and depends on your API design philosophy—explicit full-path imports (e.g., databusclient.api.deploy.deploy()) are also valid if you prefer to keep the API surface explicit and modular.

databusclient/api/deploy.py (1)

8-8: Consider renaming __debug to avoid confusion.

The variable __debug (with double underscore prefix) could be confused with Python's built-in __debug__ constant. Consider renaming to _debug or DEBUG for clarity.

Apply this diff:

-__debug = False
+_debug = False

And update references at lines 159, 438, and 446:

-    if __debug:
+    if _debug:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ace93c5 and c56da44.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • .github/workflows/python-CI.yml (1 hunks)
  • README.md (4 hunks)
  • databusclient/__init__.py (1 hunks)
  • databusclient/api/__init__.py (1 hunks)
  • databusclient/api/delete.py (11 hunks)
  • databusclient/api/deploy.py (1 hunks)
  • databusclient/api/download.py (1 hunks)
  • databusclient/api/utils.py (2 hunks)
  • databusclient/cli.py (5 hunks)
  • databusclient/client.py (0 hunks)
  • databusclient/consume/download.py (0 hunks)
  • databusclient/extensions/__init__.py (1 hunks)
  • databusclient/extensions/webdav.py (5 hunks)
  • pyproject.toml (2 hunks)
  • tests/test_databusclient.py (2 hunks)
  • tests/test_download.py (2 hunks)
💤 Files with no reviewable changes (2)
  • databusclient/client.py
  • databusclient/consume/download.py
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_download.py (2)
databusclient/api/download.py (1)
  • download (497-614)
databusclient/cli.py (1)
  • download (161-183)
tests/test_databusclient.py (1)
databusclient/api/deploy.py (4)
  • deploy (407-448)
  • create_dataset (264-404)
  • create_distribution (171-202)
  • get_file_info (153-168)
databusclient/api/deploy.py (1)
databusclient/cli.py (1)
  • deploy (49-127)
databusclient/__init__.py (2)
databusclient/api/deploy.py (3)
  • deploy (407-448)
  • create_dataset (264-404)
  • create_distribution (171-202)
databusclient/cli.py (1)
  • deploy (49-127)
databusclient/api/download.py (2)
databusclient/api/utils.py (2)
  • fetch_databus_jsonld (32-49)
  • get_databus_id_parts_from_uri (6-29)
databusclient/cli.py (1)
  • download (161-183)
databusclient/api/delete.py (2)
databusclient/api/utils.py (2)
  • fetch_databus_jsonld (32-49)
  • get_databus_id_parts_from_uri (6-29)
databusclient/cli.py (1)
  • delete (197-210)
🪛 LanguageTool
README.md

[uncategorized] ~570-~570: The official name of this software platform is spelled with a capital “H”.
Context: ...pyproject.toml and is enforced in CI (.github/workflows/ruff.yml). For development,...

(GITHUB)

🪛 markdownlint-cli2 (0.18.1)
README.md

23-23: Link fragments should be valid

(MD051, link-fragments)


572-572: Spaces inside code span elements

(MD038, no-space-in-code)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (21)
pyproject.toml (1)

3-4: LGTM!

The version bump, expanded description, and Ruff integration are well configured. The target-version = "py39" correctly matches the python = "^3.9" dependency, and the src paths properly scope linting to the package and tests.

Also applies to: 20-20, 25-27

.github/workflows/python-CI.yml (1)

21-34: LGTM!

The CI workflow modernization is well done. Upgrading to actions/checkout@v4 and actions/setup-python@v5, along with migrating from flake8 to Ruff with --output-format=github, aligns with the tooling changes in pyproject.toml.

databusclient/api/utils.py (2)

6-29: LGTM!

The URI parsing logic is clear with proper normalization (stripping protocol) and consistent padding to ensure a 6-tuple output.


43-49: LGTM!

Good HTTP practices: explicit Accept header, optional API key, timeout, and proper error propagation via raise_for_status().

databusclient/api/delete.py (4)

4-6: LGTM!

Import changes correctly reflect the refactored function name fetch_databus_jsonld from the utils module.


101-137: LGTM!

The artifact deletion logic properly handles both single-version and multi-version cases, with appropriate null checks.


140-173: LGTM!

Group deletion correctly filters artifact URIs (excluding versioned entries) and recursively deletes artifacts before the group itself.


176-211: LGTM!

The routing logic properly distinguishes between collections, files, versions, artifacts, and groups based on URI structure, with appropriate messaging for unsupported operations.

databusclient/api/download.py (5)

1-10: LGTM!

Imports are well-organized and appropriate for the module's functionality.


119-146: LGTM!

Clean iteration over URLs with proper parameter forwarding.


167-186: LGTM!

Good use of SPARQLWrapper with POST method and proper header handling for API key authentication.


189-223: LGTM!

Robust validation of SPARQL response structure with clear error messages for malformed responses.


316-344: LGTM!

The version, artifact, and group download functions correctly compose the lower-level download utilities with proper parameter propagation.

Also applies to: 347-383, 419-438, 441-474

databusclient/extensions/__init__.py (1)

1-1: LGTM!

An empty __init__.py is the standard pattern for declaring a Python package and properly enables the import structure for databusclient.extensions.webdav referenced elsewhere in the PR.

databusclient/__init__.py (1)

2-8: LGTM! Clean refactoring of public API surface.

The import path update correctly aligns with the new modular structure, and the run() wrapper provides a convenient programmatic entry point to the CLI.
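Based on the summaries above, the package initializer likely looks roughly like this; the re-exports are as described, while the run() body and the CLI entry-point name are assumptions.

# databusclient/__init__.py (sketch)
from databusclient.api.deploy import create_dataset, create_distribution, deploy

def run():
    # Assumed wrapper: invoke the command-line interface programmatically.
    from databusclient import cli
    cli.cli()  # hypothetical entry-point name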

tests/test_download.py (1)

3-27: LGTM! Test imports correctly updated to use new API module.

The test now correctly imports and uses api_download from databusclient.api.download, consistent with the refactored API surface.

databusclient/extensions/webdav.py (1)

31-33: LGTM! Better naming reflects WebDAV protocol generality.

Renaming from upload_to_nextcloud to upload_to_webdav more accurately reflects that this function works with any WebDAV-compatible server, not just Nextcloud.

tests/test_databusclient.py (1)

7-7: LGTM! Tests correctly updated to use public API module.

The imports now reference the public API surface from databusclient.api.deploy, and get_file_info is appropriately exposed as a public function without the private __ prefix.

databusclient/cli.py (2)

161-183: LGTM! Download command correctly delegates to API module.

The CLI command properly maps parameters and delegates to api_download, with appropriate defaults for authentication URLs and client IDs.


8-11: LGTM! Clean separation of concerns with API modules.

The CLI now correctly delegates to specialized API modules (api_deploy, api_delete, api_download) and the webdav extension, improving modularity and maintainability.

databusclient/api/deploy.py (1)

451-497: LGTM! Metadata deployment flow is clean and well-structured.

The deploy_from_metadata function provides a clear workflow from metadata to deployment, with informative progress messages.

coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_deploy.py (1)

74-81: Signature mismatch in skipped test.

The get_file_info call passes two arguments (artifact_name, dst_string), but according to the API signature in databusclient/api/deploy.py, it only accepts one parameter (distribution_str: str). When this test is re-enabled, it will fail with a TypeError.

Apply this diff when fixing the test:

         (
             name,
             cvs,
             formatExtension,
             compression,
             sha256sum,
             content_length,
-        ) = get_file_info(artifact_name, dst_string)
+        ) = get_file_info(dst_string)

Note: The return value unpacking also appears incorrect—get_file_info returns a 5-tuple (cvs, format_extension, compression, sha256sum, content_length), not 6 values. This likely requires broader test refactoring.

♻️ Duplicate comments (1)
databusclient/api/deploy.py (1)

27-47: Make content-variant parsing more robust to extra = and whitespace

The key, value = kv.split("=") will fail if the value itself contains =, and keys/values are not trimmed. You already guard against missing =, which is good; consider also splitting once and normalizing:

-        key, value = kv.split("=")
-        cvs[key] = value
+        key, value = kv.split("=", 1)
+        key = key.strip()
+        value = value.strip()
+        if not key:
+            raise BadArgumentException("Content variant key cannot be empty")
+        cvs[key] = value

This keeps the current behaviour but is more tolerant and yields cleaner keys/values.

🧹 Nitpick comments (2)
databusclient/api/deploy.py (1)

148-155: Avoid loading entire file into memory when computing sha256/length

_load_file_stats reads resp.content fully into memory, which can be problematic for large artifacts.

You can stream the response and compute hash/length incrementally:

-def _load_file_stats(url: str) -> Tuple[str, int]:
-    resp = requests.get(url, timeout=30)
+def _load_file_stats(url: str) -> Tuple[str, int]:
+    resp = requests.get(url, timeout=30, stream=True)
     if resp.status_code >= 400:
         raise requests.exceptions.RequestException(response=resp)
 
-    sha256sum = hashlib.sha256(bytes(resp.content)).hexdigest()
-    content_length = len(resp.content)
-    return sha256sum, content_length
+    hasher = hashlib.sha256()
+    content_length = 0
+    for chunk in resp.iter_content(chunk_size=8192):
+        if not chunk:
+            continue
+        hasher.update(chunk)
+        content_length += len(chunk)
+    resp.close()
+    return hasher.hexdigest(), content_length

This keeps memory usage bounded while preserving behaviour.

databusclient/api/download.py (1)

172-191: Consider adding a timeout to SPARQLWrapper queries

_query_sparql_endpoint delegates entirely to SPARQLWrapper without any timeout configuration. For slow or unresponsive endpoints, this can hang indefinitely.

If supported by your SPARQLWrapper version, you might want to configure a timeout (e.g., via sparql.setTimeout(...)) or equivalent, to mirror the deterministic behaviour you now have on the plain requests calls.
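SPARQLWrapper does expose a setTimeout(seconds) method, so the hardening could look like this sketch; the function name and 30-second value are illustrative.

from SPARQLWrapper import JSON, POST, SPARQLWrapper

def query_sparql_endpoint(endpoint: str, query: str) -> dict:
    sparql = SPARQLWrapper(endpoint)
    sparql.setMethod(POST)   # matches the module's use of POST
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    sparql.setTimeout(30)    # seconds; bounds slow or unresponsive endpoints
    return sparql.query().convert()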

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c56da44 and 292d2fa.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • Dockerfile (1 hunks)
  • README.md (5 hunks)
  • databusclient/api/delete.py (12 hunks)
  • databusclient/api/deploy.py (1 hunks)
  • databusclient/api/download.py (1 hunks)
  • databusclient/api/utils.py (2 hunks)
  • databusclient/cli.py (5 hunks)
  • pyproject.toml (2 hunks)
  • tests/test_deploy.py (2 hunks)
  • tests/test_download.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • databusclient/api/delete.py
  • pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (5)
databusclient/cli.py (4)
databusclient/api/deploy.py (3)
  • deploy (418-467)
  • create_dataset (269-415)
  • deploy_from_metadata (470-516)
databusclient/api/delete.py (1)
  • delete (179-214)
databusclient/api/download.py (1)
  • download (504-624)
databusclient/extensions/webdav.py (1)
  • upload_to_webdav (31-91)
databusclient/api/deploy.py (1)
databusclient/cli.py (1)
  • deploy (49-127)
tests/test_deploy.py (1)
databusclient/api/deploy.py (4)
  • deploy (418-467)
  • get_file_info (158-173)
  • _get_content_variants (27-47)
  • BadArgumentException (15-16)
tests/test_download.py (2)
databusclient/api/download.py (1)
  • download (504-624)
databusclient/cli.py (1)
  • download (161-183)
databusclient/api/download.py (2)
databusclient/api/utils.py (2)
  • fetch_databus_jsonld (33-50)
  • get_databus_id_parts_from_file_url (6-30)
databusclient/cli.py (1)
  • download (161-183)
🪛 LanguageTool
README.md

[uncategorized] ~571-~571: The official name of this software platform is spelled with a capital “H”.
Context: ...pyproject.toml and is enforced in CI (.github/workflows/ruff.yml). For development,...

(GITHUB)

🪛 markdownlint-cli2 (0.18.1)
README.md

23-23: Link fragments should be valid

(MD051, link-fragments)


573-573: Spaces inside code span elements

(MD038, no-space-in-code)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (13)
Dockerfile (1)

1-1: LGTM!

Base image update to Python 3.11 aligns with the project's updated Python version requirements.

README.md (2)

36-36: LGTM!

Python version requirement correctly updated to 3.11+ to match pyproject.toml.


561-601: LGTM!

Development section provides clear guidance for contributors with appropriate Poetry, Ruff, and pytest instructions.

tests/test_download.py (2)

3-7: LGTM!

Import correctly updated to use the new modular API path. The TODO comment acknowledges technical debt.


24-29: LGTM!

Test invocations correctly updated to use api_download with the new signature (localDir, endpoint, databusURIs).

tests/test_deploy.py (2)

7-13: LGTM!

Imports correctly updated to reference the new databusclient.api.deploy module. Testing private functions like _get_content_variants is appropriate for ensuring internal logic correctness.


18-40: LGTM!

Comprehensive test coverage for _get_content_variants including valid cases, empty variants, URL-only input, and invalid format error handling.

databusclient/api/utils.py (3)

6-30: LGTM!

Function renamed and implementation correctly normalizes URIs, handles variable-length paths, and pads missing parts with None.


44-50: LGTM!

Well-implemented HTTP client code with proper content negotiation headers, optional authentication, timeout handling, and error propagation.


33-33: Past review concern resolved: str | None syntax is valid for Python 3.11+.

The project requires Python 3.11+ as specified in pyproject.toml (python = "^3.11"). The union type syntax introduced in PEP 604 is fully supported in this version, so the previous Python 3.9 compatibility concern no longer applies.

databusclient/cli.py (3)

20-127: Deploy CLI mode handling looks consistent and clear

The three-mode deploy logic (classic, metadata file, WebDAV) with the upfront conflict checks reads well and matches the API surface (api_deploy.create_dataset, deploy, deploy_from_metadata). No functional issues spotted here.


130-183: Download CLI maps options cleanly to api_download

The additional options (--databus-key, --all-versions, --authurl, --clientid) are passed through correctly to api_download, and the docstring matches the parameters. Looks good.


186-210: Delete CLI wiring is straightforward

The delete command’s parameters (databusuris, databus_key, dry_run, force) are forwarded 1:1 to api_delete. No issues here.

coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
README.md (1)

23-25: Docs: TOC, anchors, and dev tooling section look consistent now

The new “Development & Contributing” entry and section header match (#development--contributing), and the earlier issues around the broken #contributing link, the ruff check . trailing-space snippet, and the “ensuere” typo are all resolved. This section now clearly documents the Poetry + Ruff + pytest workflow.

Also applies to: 561-601

databusclient/api/download.py (1)

396-435: Nice hardening of JSON-LD parsing for versions, parts, and artifacts

The additional checks around databus:hasVersion, filtering non-dict entries and missing "@id"s, validating @graph “Part” nodes’ file fields, and normalizing databus:hasArtifact to handle both dict and list shapes significantly improve robustness against malformed or slightly variant JSON-LD.

This should make group/artifact/version downloads much more resilient to backend changes.

Also applies to: 437-460, 498-529

🧹 Nitpick comments (2)
databusclient/api/download.py (2)

37-48: Guard localDir inference against non-Databus URLs

When localDir is None, you derive the download path from account, group, artifact, and version. If the URL doesn’t match the expected Databus shape (e.g., fewer path segments), some of these become None and os.path.join will raise a TypeError.

Consider validating the parsed parts before joining and failing fast with a clearer error:

-    if localDir is None:
-        _host, account, group, artifact, version, file = (
-            get_databus_id_parts_from_file_url(url)
-        )
-        localDir = os.path.join(
-            os.getcwd(),
-            account,
-            group,
-            artifact,
-            version if version is not None else "latest",
-        )
+    if localDir is None:
+        _host, account, group, artifact, version, _ = get_databus_id_parts_from_file_url(
+            url
+        )
+        if not all([account, group, artifact]):
+            raise ValueError(
+                f"Cannot infer local directory from non-databus URL: {url}"
+            )
+        localDir = os.path.join(
+            os.getcwd(),
+            account,
+            group,
+            artifact,
+            version or "latest",
+        )

197-231: Clarify SPARQL result shape (single-variable bindings) or relax validation

_get_file_download_urls_from_sparql_query currently enforces that each binding dict has exactly one key; any query returning multiple variables will raise a ValueError, even if one of the variables is the desired file URL.

Since the README encourages user-defined SPARQL queries, consider either:

  • Documenting that queries must select exactly one variable containing the download URL, or
  • Relaxing the validation to allow multi-variable bindings and just pick a specific variable (e.g. "file" or the first key):
-    for binding in bindings:
-        if not isinstance(binding, dict) or len(binding) != 1:
-            raise ValueError(f"Invalid SPARQL binding structure: {binding}")
-
-        value_dict = next(iter(binding.values()))
+    for binding in bindings:
+        if not isinstance(binding, dict) or not binding:
+            raise ValueError(f"Invalid SPARQL binding structure: {binding}")
+
+        # Take the first variable value by default
+        value_dict = next(iter(binding.values()))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 292d2fa and 5558c04.

📒 Files selected for processing (2)
  • README.md (5 hunks)
  • databusclient/api/download.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/api/download.py (1)
databusclient/api/utils.py (2)
  • fetch_databus_jsonld (33-50)
  • get_databus_id_parts_from_file_url (6-30)
🪛 LanguageTool
README.md

[uncategorized] ~571-~571: The official name of this software platform is spelled with a capital “H”.
Context: ...pyproject.toml and is enforced in CI (.github/workflows/ruff.yml). For development,...

(GITHUB)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build

@Integer-Ctrl force-pushed the refactor/project-structure branch from 5558c04 to 072f37c on December 9, 2025 at 09:46
@Integer-Ctrl Integer-Ctrl merged commit 73d4544 into main Dec 9, 2025
3 checks passed
@Integer-Ctrl Integer-Ctrl deleted the refactor/project-structure branch December 9, 2025 14:00
coderabbitai bot mentioned this pull request Dec 18, 2025