Refactor/project structure #39
Conversation
📝 Walkthrough

Refactors the monolithic client into modular APIs under `databusclient/api`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI
    participant DeployAPI as api.deploy
    participant WebDAV
    participant DatabusAPI as Databus
    CLI->>DeployAPI: create_dataset(...) or deploy_from_metadata(...)
    alt Classic Deploy
        CLI->>DeployAPI: create_dataset(...)
        DeployAPI-->>CLI: dataset JSON-LD
        CLI->>DeployAPI: deploy(dataset, api_key)
        DeployAPI->>DatabusAPI: POST /datasets (with API key)
        DatabusAPI-->>DeployAPI: 2xx / error
    else Metadata file
        CLI->>CLI: load metadata file
        CLI->>DeployAPI: deploy_from_metadata(metadata,...)
        DeployAPI->>DatabusAPI: POST /datasets
        DatabusAPI-->>DeployAPI: result
    else WebDAV mode
        CLI->>WebDAV: upload_to_webdav(paths, remote, path, webdav_url)
        WebDAV-->>CLI: uploaded URLs / metadata
        CLI->>DeployAPI: deploy_from_metadata(generated_metadata,...)
        DeployAPI->>DatabusAPI: POST /datasets
        DatabusAPI-->>DeployAPI: result
    end
    DeployAPI-->>CLI: deployment result/status
```
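The classic-deploy leg above (build the dataset JSON-LD, then POST it with an API key) can be sketched as follows. This is a hedged illustration: the endpoint path `/api/publish`, the `X-API-KEY` header placement, and the helper `build_deploy_request` are assumptions for demonstration, not the client's actual API surface.

```python
import json
from typing import Dict, Tuple


def build_deploy_request(dataset_jsonld: dict, api_key: str,
                         databus_base: str = "https://databus.example.org") -> Tuple[str, Dict[str, str], str]:
    """Assemble URL, headers, and body for the dataset POST shown in the diagram.

    The path and header name here are illustrative placeholders.
    """
    url = f"{databus_base}/api/publish"  # hypothetical endpoint path
    headers = {
        "X-API-KEY": api_key,  # API-key auth, as in the diagram's deploy step
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(dataset_jsonld)


url, headers, body = build_deploy_request(
    {"@id": "https://databus.example.org/alice/mygroup/myartifact/1.0.0"},
    "secret-key",
)
print(url)
```

The actual request would then be a single `requests.post(url, headers=headers, data=body)` against the Databus endpoint.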
```mermaid
sequenceDiagram
    participant Client
    participant DownloadAPI as api.download
    participant Databus
    participant FileServer
    participant Vault
    Client->>DownloadAPI: download(localDir, endpoint, databusURIs, token?, databus_key?)
    alt direct file URL
        DownloadAPI->>FileServer: HEAD/GET file
        FileServer-->>DownloadAPI: 200/redirect/401/404
    else SPARQL collection
        DownloadAPI->>Databus: SPARQL query
        Databus-->>DownloadAPI: file URLs
    else artifact/version/group
        DownloadAPI->>Databus: fetch_databus_jsonld(uri)
        Databus-->>DownloadAPI: JSON-LD with download URLs
    end
    loop per file URL
        DownloadAPI->>FileServer: GET file
        alt 401 with Bearer challenge
            DownloadAPI->>Vault: exchange refresh_token -> access_token
            Vault-->>DownloadAPI: access_token
            DownloadAPI->>FileServer: GET with Authorization: Bearer
        else 401 without Bearer
            DownloadAPI->>FileServer: GET with X-API-KEY
        end
        FileServer-->>DownloadAPI: file content / error
    end
    DownloadAPI-->>Client: files saved to localDir
```
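The 401 handling in the loop above boils down to inspecting the server's challenge before retrying. A minimal sketch of that decision, assuming the challenge arrives in a standard `WWW-Authenticate` header (`pick_auth_strategy` is a hypothetical helper, not part of the client):

```python
from typing import Optional


def pick_auth_strategy(status_code: int, www_authenticate: Optional[str]) -> str:
    """Mirror the diagram's alt branches: Bearer challenge -> token exchange,
    plain 401 -> X-API-KEY fallback, anything else -> no retry."""
    if status_code != 401:
        return "none"
    if www_authenticate and "bearer" in www_authenticate.lower():
        return "bearer"  # exchange the refresh token at the Vault for an access token
    return "api-key"     # retry with the X-API-KEY header instead


print(pick_auth_strategy(401, 'Bearer realm="vault"'))
print(pick_auth_strategy(401, None))
```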
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (17)
💤 Files with no reviewable changes (2)
🚧 Files skipped from review as they are similar to previous changes (6)
🧰 Additional context used
🧬 Code graph analysis (5)
tests/test_deploy.py (1)
tests/test_download.py (2)
databusclient/api/deploy.py (1)
databusclient/cli.py (4)
databusclient/api/download.py (2)
🪛 LanguageTool
README.md
[uncategorized] ~571-~571: The official name of this software platform is spelled with a capital “H”. (GITHUB)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (23)
Actionable comments posted: 13
🧹 Nitpick comments (4)
databusclient/api/delete.py (1)
82-84: Consider using a more specific exception type.

Using a generic `Exception` makes it harder for callers to distinguish between HTTP failures and other errors. Consider using `requests.HTTPError` or a custom exception.

```diff
-        raise Exception(
-            f"Failed to delete {databusURI}: {response.status_code} - {response.text}"
-        )
+        raise requests.HTTPError(
+            f"Failed to delete {databusURI}: {response.status_code} - {response.text}",
+            response=response,
+        )
```

databusclient/api/download.py (1)
35-47: Variable `file` is shadowed.

The variable `file` from the tuple unpacking on line 35 is overwritten on line 47, which could lead to confusion when maintaining this code.

```diff
-    _host, account, group, artifact, version, file = get_databus_id_parts_from_uri(
+    _host, account, group, artifact, version, _file = get_databus_id_parts_from_uri(
         url
     )
```

databusclient/api/__init__.py (1)
1-1: Consider re-exporting key API symbols for convenience.

The empty `__init__.py` correctly establishes the package, but you may want to re-export key symbols from `deploy`, `download`, `delete`, and other submodules to provide a convenient public API surface. This would allow users to import directly from `databusclient.api` rather than requiring full module paths.

For example:

```python
from databusclient.api.deploy import deploy  # or relevant exported symbols
from databusclient.api.download import download

__all__ = ["deploy", "download"]
```

This is optional and depends on your API design philosophy: explicit full-path imports (e.g., `databusclient.api.deploy.deploy()`) are also valid if you prefer to keep the API surface explicit and modular.

databusclient/api/deploy.py (1)
8-8: Consider renaming `__debug` to avoid confusion.

The variable `__debug` (with double underscore prefix) could be confused with Python's built-in `__debug__` constant. Consider renaming to `_debug` or `DEBUG` for clarity.

Apply this diff:

```diff
-__debug = False
+_debug = False
```

And update references at lines 159, 438, and 446:

```diff
-    if __debug:
+    if _debug:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (16)
- .github/workflows/python-CI.yml (1 hunks)
- README.md (4 hunks)
- databusclient/__init__.py (1 hunks)
- databusclient/api/__init__.py (1 hunks)
- databusclient/api/delete.py (11 hunks)
- databusclient/api/deploy.py (1 hunks)
- databusclient/api/download.py (1 hunks)
- databusclient/api/utils.py (2 hunks)
- databusclient/cli.py (5 hunks)
- databusclient/client.py (0 hunks)
- databusclient/consume/download.py (0 hunks)
- databusclient/extensions/__init__.py (1 hunks)
- databusclient/extensions/webdav.py (5 hunks)
- pyproject.toml (2 hunks)
- tests/test_databusclient.py (2 hunks)
- tests/test_download.py (2 hunks)
💤 Files with no reviewable changes (2)
- databusclient/client.py
- databusclient/consume/download.py
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_download.py (2)
databusclient/api/download.py (1)
download(497-614)
databusclient/cli.py (1)
download(161-183)
tests/test_databusclient.py (1)
databusclient/api/deploy.py (4)
deploy(407-448)
create_dataset(264-404)
create_distribution(171-202)
get_file_info(153-168)
databusclient/api/deploy.py (1)
databusclient/cli.py (1)
deploy(49-127)
databusclient/__init__.py (2)
databusclient/api/deploy.py (3)
deploy(407-448)
create_dataset(264-404)
create_distribution(171-202)
databusclient/cli.py (1)
deploy(49-127)
databusclient/api/download.py (2)
databusclient/api/utils.py (2)
fetch_databus_jsonld(32-49)
get_databus_id_parts_from_uri(6-29)
databusclient/cli.py (1)
download(161-183)
databusclient/api/delete.py (2)
databusclient/api/utils.py (2)
fetch_databus_jsonld(32-49)
get_databus_id_parts_from_uri(6-29)
databusclient/cli.py (1)
delete(197-210)
🪛 LanguageTool
README.md
[uncategorized] ~570-~570: The official name of this software platform is spelled with a capital “H”.
Context: ...pyproject.toml and is enforced in CI (.github/workflows/ruff.yml). For development,...
(GITHUB)
🪛 markdownlint-cli2 (0.18.1)
README.md
23-23: Link fragments should be valid
(MD051, link-fragments)
572-572: Spaces inside code span elements
(MD038, no-space-in-code)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (21)
pyproject.toml (1)
3-4: LGTM! The version bump, expanded description, and Ruff integration are well configured. The `target-version = "py39"` correctly matches the `python = "^3.9"` dependency, and the `src` paths properly scope linting to the package and tests.

Also applies to: 20-20, 25-27
.github/workflows/python-CI.yml (1)
21-34: LGTM! The CI workflow modernization is well done. Upgrading to `actions/checkout@v4` and `actions/setup-python@v5`, along with migrating from flake8 to Ruff with `--output-format=github`, aligns with the tooling changes in `pyproject.toml`.
databusclient/api/utils.py (2)
6-29: LGTM! The URI parsing logic is clear with proper normalization (stripping protocol) and consistent padding to ensure a 6-tuple output.
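As a rough illustration of the parsing behaviour praised here (protocol stripping plus padding to a 6-tuple), a re-creation under stated assumptions: `parse_databus_uri` is a sketch, not the actual implementation of `get_databus_id_parts_from_uri`.

```python
from typing import List, Optional


def parse_databus_uri(uri: str) -> List[Optional[str]]:
    """Strip the protocol, split the path, and pad to a fixed 6-tuple
    (host, account, group, artifact, version, file), using None for
    missing trailing parts."""
    stripped = uri.split("://", 1)[-1]             # normalization: drop http(s)://
    parts = stripped.strip("/").split("/")
    return parts[:6] + [None] * (6 - len(parts))   # consistent padding to 6 entries


host, account, group, artifact, version, file_ = parse_databus_uri(
    "https://databus.example.org/alice/mygroup/myartifact/1.0.0/data.csv"
)
print(account, version, file_)
```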
43-49: LGTM! Good HTTP practices: explicit Accept header, optional API key, timeout, and proper error propagation via `raise_for_status()`.
databusclient/api/delete.py (4)
4-6: LGTM! Import changes correctly reflect the refactored function name `fetch_databus_jsonld` from the utils module.
101-137: LGTM! The artifact deletion logic properly handles both single-version and multi-version cases, with appropriate null checks.
140-173: LGTM! Group deletion correctly filters artifact URIs (excluding versioned entries) and recursively deletes artifacts before the group itself.
176-211: LGTM! The routing logic properly distinguishes between collections, files, versions, artifacts, and groups based on URI structure, with appropriate messaging for unsupported operations.
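The routing described here can be approximated by looking at URI depth alone; a simplified sketch (real Databus URIs also have collection forms, which this hypothetical `classify_databus_uri` ignores):

```python
def classify_databus_uri(uri: str) -> str:
    """Route by URI depth, following the host/account/group/artifact/version/file
    layout described above. Illustrative only, not the module's actual code."""
    segments = uri.split("://", 1)[-1].strip("/").split("/")
    depth = len(segments)  # 1=host, 2=account, 3=group, 4=artifact, 5=version, 6=file
    return {2: "account", 3: "group", 4: "artifact", 5: "version", 6: "file"}.get(depth, "unknown")


print(classify_databus_uri("https://databus.example.org/alice/mygroup"))
print(classify_databus_uri("https://databus.example.org/alice/mygroup/art/1.0.0"))
```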
databusclient/api/download.py (5)
1-10: LGTM! Imports are well-organized and appropriate for the module's functionality.
119-146: LGTM! Clean iteration over URLs with proper parameter forwarding.
167-186: LGTM! Good use of SPARQLWrapper with POST method and proper header handling for API key authentication.
189-223: LGTM! Robust validation of SPARQL response structure with clear error messages for malformed responses.
316-344: LGTM! The version, artifact, and group download functions correctly compose the lower-level download utilities with proper parameter propagation.
Also applies to: 347-383, 419-438, 441-474
databusclient/extensions/__init__.py (1)
1-1: LGTM! An empty `__init__.py` is the standard pattern for declaring a Python package and properly enables the import structure for `databusclient.extensions.webdav` referenced elsewhere in the PR.
2-8: LGTM! Clean refactoring of public API surface. The import path update correctly aligns with the new modular structure, and the `run()` wrapper provides a convenient programmatic entry point to the CLI.
3-27: LGTM! Test imports correctly updated to use new API module. The test now correctly imports and uses `api_download` from `databusclient.api.download`, consistent with the refactored API surface.
31-33: LGTM! Better naming reflects WebDAV protocol generality. Renaming from `upload_to_nextcloud` to `upload_to_webdav` more accurately reflects that this function works with any WebDAV-compatible server, not just Nextcloud.
7-7: LGTM! Tests correctly updated to use public API module. The imports now reference the public API surface from `databusclient.api.deploy`, and `get_file_info` is appropriately exposed as a public function without the private `__` prefix.
161-183: LGTM! Download command correctly delegates to API module. The CLI command properly maps parameters and delegates to `api_download`, with appropriate defaults for authentication URLs and client IDs.
8-11: LGTM! Clean separation of concerns with API modules. The CLI now correctly delegates to specialized API modules (`api_deploy`, `api_delete`, `api_download`) and the `webdav` extension, improving modularity and maintainability.
451-497: LGTM! Metadata deployment flow is clean and well-structured. The `deploy_from_metadata` function provides a clear workflow from metadata to deployment, with informative progress messages.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_deploy.py (1)
74-81: Signature mismatch in skipped test.

The `get_file_info` call passes two arguments `(artifact_name, dst_string)`, but according to the API signature in `databusclient/api/deploy.py`, it only accepts one parameter `(distribution_str: str)`. When this test is re-enabled, it will fail with a `TypeError`.

Apply this diff when fixing the test:

```diff
  (
      name,
      cvs,
      formatExtension,
      compression,
      sha256sum,
      content_length,
- ) = get_file_info(artifact_name, dst_string)
+ ) = get_file_info(dst_string)
```

Note: The return value unpacking also appears incorrect: `get_file_info` returns a 5-tuple `(cvs, format_extension, compression, sha256sum, content_length)`, not 6 values. This likely requires broader test refactoring.
♻️ Duplicate comments (1)
databusclient/api/deploy.py (1)
27-47: Make content-variant parsing more robust to extra `=` and whitespace

The `key, value = kv.split("=")` will fail if the value itself contains `=`, and keys/values are not trimmed. You already guard against missing `=`, which is good; consider also splitting once and normalizing:

```diff
-        key, value = kv.split("=")
-        cvs[key] = value
+        key, value = kv.split("=", 1)
+        key = key.strip()
+        value = value.strip()
+        if not key:
+            raise BadArgumentException("Content variant key cannot be empty")
+        cvs[key] = value
```

This keeps the current behaviour but is more tolerant and yields cleaner keys/values.
🧹 Nitpick comments (2)
databusclient/api/deploy.py (1)
148-155: Avoid loading entire file into memory when computing sha256/length
`_load_file_stats` reads `resp.content` fully into memory, which can be problematic for large artifacts.

You can stream the response and compute hash/length incrementally:

```diff
-def _load_file_stats(url: str) -> Tuple[str, int]:
-    resp = requests.get(url, timeout=30)
+def _load_file_stats(url: str) -> Tuple[str, int]:
+    resp = requests.get(url, timeout=30, stream=True)
     if resp.status_code >= 400:
         raise requests.exceptions.RequestException(response=resp)
-    sha256sum = hashlib.sha256(bytes(resp.content)).hexdigest()
-    content_length = len(resp.content)
-    return sha256sum, content_length
+    hasher = hashlib.sha256()
+    content_length = 0
+    for chunk in resp.iter_content(chunk_size=8192):
+        if not chunk:
+            continue
+        hasher.update(chunk)
+        content_length += len(chunk)
+    resp.close()
+    return hasher.hexdigest(), content_length
```

This keeps memory usage bounded while preserving behaviour.
databusclient/api/download.py (1)
172-191: Consider adding a timeout to SPARQLWrapper queries
`_query_sparql_endpoint` delegates entirely to `SPARQLWrapper` without any timeout configuration. For slow or unresponsive endpoints, this can hang indefinitely.

If supported by your `SPARQLWrapper` version, you might want to configure a timeout (e.g., via `sparql.setTimeout(...)`) or equivalent, to mirror the deterministic behaviour you now have on the plain `requests` calls.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- Dockerfile (1 hunks)
- README.md (5 hunks)
- databusclient/api/delete.py (12 hunks)
- databusclient/api/deploy.py (1 hunks)
- databusclient/api/download.py (1 hunks)
- databusclient/api/utils.py (2 hunks)
- databusclient/cli.py (5 hunks)
- pyproject.toml (2 hunks)
- tests/test_deploy.py (2 hunks)
- tests/test_download.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- databusclient/api/delete.py
- pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (5)
databusclient/cli.py (4)
databusclient/api/deploy.py (3)
deploy(418-467)
create_dataset(269-415)
deploy_from_metadata(470-516)
databusclient/api/delete.py (1)
delete(179-214)
databusclient/api/download.py (1)
download(504-624)
databusclient/extensions/webdav.py (1)
upload_to_webdav(31-91)
databusclient/api/deploy.py (1)
databusclient/cli.py (1)
deploy(49-127)
tests/test_deploy.py (1)
databusclient/api/deploy.py (4)
deploy(418-467)
get_file_info(158-173)
_get_content_variants(27-47)
BadArgumentException(15-16)
tests/test_download.py (2)
databusclient/api/download.py (1)
download(504-624)
databusclient/cli.py (1)
download(161-183)
databusclient/api/download.py (2)
databusclient/api/utils.py (2)
fetch_databus_jsonld(33-50)
get_databus_id_parts_from_file_url(6-30)
databusclient/cli.py (1)
download(161-183)
🪛 LanguageTool
README.md
[uncategorized] ~571-~571: The official name of this software platform is spelled with a capital “H”.
Context: ...pyproject.toml and is enforced in CI (.github/workflows/ruff.yml). For development,...
(GITHUB)
🪛 markdownlint-cli2 (0.18.1)
README.md
23-23: Link fragments should be valid
(MD051, link-fragments)
573-573: Spaces inside code span elements
(MD038, no-space-in-code)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (13)
Dockerfile (1)
1-1: LGTM! Base image update to Python 3.11 aligns with the project's updated Python version requirements.
README.md (2)
README.md (2)
36-36: LGTM!Python version requirement correctly updated to 3.11+ to match
pyproject.toml.
561-601: LGTM!Development section provides clear guidance for contributors with appropriate Poetry, Ruff, and pytest instructions.
tests/test_download.py (2)
24-29: LGTM! Test invocations correctly updated to use
24-29: LGTM!Test invocations correctly updated to use
api_downloadwith the new signature(localDir, endpoint, databusURIs).tests/test_deploy.py (2)
7-13: LGTM! Imports correctly updated to reference the new
`databusclient.api.deploy` module. Testing private functions like `_get_content_variants` is appropriate for ensuring internal logic correctness.
18-40: LGTM! Comprehensive test coverage for
`_get_content_variants` including valid cases, empty variants, URL-only input, and invalid format error handling.
6-30: LGTM!Function renamed and implementation correctly normalizes URIs, handles variable-length paths, and pads missing parts with
None.
44-50: LGTM!Well-implemented HTTP client code with proper content negotiation headers, optional authentication, timeout handling, and error propagation.
33-33: Past review concern resolved:str | Nonesyntax is valid for Python 3.11+.The project requires Python 3.11+ as specified in
pyproject.toml(python = "^3.11"). The union type syntax introduced in PEP 604 is fully supported in this version, so the previous Python 3.9 compatibility concern no longer applies.databusclient/cli.py (3)
20-127: Deploy CLI mode handling looks consistent and clear

The three-mode deploy logic (classic, metadata file, WebDAV) with the upfront conflict checks reads well and matches the API surface (`api_deploy.create_dataset`, `deploy`, `deploy_from_metadata`). No functional issues spotted here.
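The three-mode selection with upfront conflict checks can be sketched like this; the parameter names are illustrative stand-ins for the CLI's actual options:

```python
from typing import List, Optional


def pick_deploy_mode(metadata_file: Optional[str], webdav_url: Optional[str],
                     distributions: List[str]) -> str:
    """Choose one of the three deploy modes, rejecting conflicting inputs.
    A sketch of the described dispatch, not the CLI's real code."""
    if metadata_file and webdav_url:
        raise ValueError("Choose either a metadata file or WebDAV upload, not both")
    if metadata_file:
        return "metadata"
    if webdav_url:
        return "webdav"
    if not distributions:
        raise ValueError("Classic deploy needs at least one distribution")
    return "classic"


print(pick_deploy_mode(None, None, ["https://example.org/data.csv"]))
```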
130-183: Download CLI maps options cleanly to `api_download`

The additional options (`--databus-key`, `--all-versions`, `--authurl`, `--clientid`) are passed through correctly to `api_download`, and the docstring matches the parameters. Looks good.
186-210: Delete CLI wiring is straightforward

The delete command's parameters (`databusuris`, `databus_key`, `dry_run`, `force`) are forwarded 1:1 to `api_delete`. No issues here.
Actionable comments posted: 2
♻️ Duplicate comments (2)
README.md (1)
23-25: Docs: TOC, anchors, and dev tooling section look consistent now

The new “Development & Contributing” entry and section header match (`#development--contributing`), and the earlier issues around the broken `#contributing` link, the `ruff check .` trailing-space snippet, and the “ensuere” typo are all resolved. This section now clearly documents the Poetry + Ruff + pytest workflow.

Also applies to: 561-601
databusclient/api/download.py (1)
396-435: Nice hardening of JSON-LD parsing for versions, parts, and artifacts

The additional checks around `databus:hasVersion`, filtering non-dict entries and missing `"@id"`s, validating `@graph` “Part” nodes' `file` fields, and normalizing `databus:hasArtifact` to handle both dict and list shapes significantly improve robustness against malformed or slightly variant JSON-LD.

This should make group/artifact/version downloads much more resilient to backend changes.
Also applies to: 437-460, 498-529
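The dict-versus-list normalization mentioned for `databus:hasArtifact` is a common JSON-LD chore; a small sketch of the idea (a hypothetical helper, not the module's actual code):

```python
def as_list(node):
    """Normalize a JSON-LD value that may be a single object or a list
    (e.g. `databus:hasArtifact`) into a list of dicts, dropping entries
    that are not dicts or lack an "@id"."""
    if isinstance(node, dict):
        node = [node]
    if not isinstance(node, list):
        return []
    return [entry for entry in node if isinstance(entry, dict) and "@id" in entry]


print(as_list({"@id": "https://databus.example.org/alice/g/a"}))
print(as_list([{"@id": "x"}, "garbage", {"no-id": True}]))
```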
🧹 Nitpick comments (2)
databusclient/api/download.py (2)
37-48: Guard `localDir` inference against non-Databus URLs

When `localDir` is `None`, you derive the download path from `account`, `group`, `artifact`, and `version`. If the URL doesn't match the expected Databus shape (e.g., fewer path segments), some of these become `None` and `os.path.join` will raise a `TypeError`.

Consider validating the parsed parts before joining and failing fast with a clearer error:

```diff
     if localDir is None:
-        _host, account, group, artifact, version, file = (
-            get_databus_id_parts_from_file_url(url)
-        )
-        localDir = os.path.join(
-            os.getcwd(),
-            account,
-            group,
-            artifact,
-            version if version is not None else "latest",
-        )
+        _host, account, group, artifact, version, _ = get_databus_id_parts_from_file_url(
+            url
+        )
+        if not all([account, group, artifact]):
+            raise ValueError(
+                f"Cannot infer local directory from non-databus URL: {url}"
+            )
+        localDir = os.path.join(
+            os.getcwd(),
+            account,
+            group,
+            artifact,
+            version or "latest",
+        )
```
197-231: Clarify SPARQL result shape (single-variable bindings) or relax validation
`_get_file_download_urls_from_sparql_query` currently enforces that each binding dict has exactly one key; any query returning multiple variables will raise a `ValueError`, even if one of the variables is the desired file URL.

Since the README encourages user-defined SPARQL queries, consider either:

- Documenting that queries must select exactly one variable containing the download URL, or
- Relaxing the validation to allow multi-variable bindings and just pick a specific variable (e.g. `"file"` or the first key):

```diff
-    for binding in bindings:
-        if not isinstance(binding, dict) or len(binding) != 1:
-            raise ValueError(f"Invalid SPARQL binding structure: {binding}")
-
-        value_dict = next(iter(binding.values()))
+    for binding in bindings:
+        if not isinstance(binding, dict) or not binding:
+            raise ValueError(f"Invalid SPARQL binding structure: {binding}")
+
+        # Take the first variable value by default
+        value_dict = next(iter(binding.values()))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- README.md (5 hunks)
- databusclient/api/download.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
databusclient/api/download.py (1)
databusclient/api/utils.py (2)
fetch_databus_jsonld(33-50)
get_databus_id_parts_from_file_url(6-30)
🪛 LanguageTool
README.md
[uncategorized] ~571-~571: The official name of this software platform is spelled with a capital “H”.
Context: ...pyproject.toml and is enforced in CI (.github/workflows/ruff.yml). For development,...
(GITHUB)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: build
Force-pushed 5558c04 to 072f37c
Summary by CodeRabbit
New Features
Breaking Change
Documentation
Chores