Skip to content

Conversation

@PGijsbers
Copy link
Contributor

@PGijsbers PGijsbers commented Dec 19, 2025

And process the new warnings/errors.

Summary by Sourcery

Update linting and type-checking configuration and adjust code to satisfy new pre-commit hooks and type checker requirements.

Enhancements:

  • Tighten JSON-related typing in OpenML task routers and template processing utilities.
  • Simplify and correct various type casts across database access and configuration loading modules.
  • Refine Pydantic and JsonLD schema generics and field typing for better static type safety.
  • Adjust ApiKey enum definitions to align with style and type-checking expectations.

Build:

  • Bump pre-commit hooks for core checks, mypy, and ruff to newer versions and update ruff configuration ignores accordingly.

Tests:

  • Update hypothesis decorators and test annotations to comply with stricter type checking and linting rules.

Summary by CodeRabbit

  • Chores

    • Updated pre-commit hooks to newer versions and added a ruff-format hook.
    • Adjusted lint configuration to refine which rules are suppressed.
  • Refactor

    • Standardized type annotations and casting across the codebase for more consistent static checking.
    • Improved JSON/template handling and model configuration to tighten validation and data handling.
  • Tests

    • Minor test decorator/annotation formatting tweaks (no behavioral changes).

✏️ Tip: You can customize this high-level summary in your review settings.

@sourcery-ai
Copy link

sourcery-ai bot commented Dec 19, 2025

Reviewer's Guide

This PR updates pre-commit hooks (mypy, ruff, core hooks) and then fixes or adjusts typing and lint issues across routers, database helpers, schemas, tests, and config, mainly by tightening JSON-related types, switching casts to string-based forms, updating hypothesis and enum annotations, and aligning Ruff configuration codes.

Class diagram for updated OpenML tasks JSON template helpers

classDiagram
    class JSON {
    }

    class OpenMLTasksJSONHelpers {
        +convert_template_xml_to_json(xml_template str) dict~str, JSON~
        +fill_template(template str, task RowMapping, task_inputs dict~str, str~, connection Connection) dict~str, JSON~
        +_fill_json_template(template JSON, task RowMapping, task_inputs dict~str, str~, fetched_data dict~str, str~, connection Connection) JSON
    }

    JSON <|.. OpenMLTasksJSONHelpers : uses
Loading

Updated class diagram for JsonLDObjectReference generic model

classDiagram
    class BaseModel {
    }

    class JsonLDObject {
        <<abstract>>
        +context_property_map dict~str, str~
    }
    JsonLDObject --|> BaseModel

    class JsonLDObjectReference_T_JsonLDObject_ {
        +id_ str
    }
    JsonLDObjectReference_T_JsonLDObject_ --|> BaseModel

    class JsonLDGraph {
        +context str or dict~str, HttpUrl~
        +graph list~Distribution | DataService | Dataset | Quality | Feature | Agent | MD5Checksum~
    }
    JsonLDGraph --|> BaseModel

    JsonLDGraph "*" o-- JsonLDObject : contains
Loading

File-Level Changes

Change Details Files
Tighten JSON template typing in OpenML task router and adjust task input handling
  • Introduce a recursive JSON type alias and use it to narrow the return type of convert_template_xml_to_json and fill_template
  • Change _fill_json_template signatures to use the JSON alias, with fetched_data now typed as dict[str, str] and a JSON return type
  • Wrap fill_template and convert_template_xml_to_json return values in cast calls to satisfy the new typing expectations
  • Change task_inputs mapping to always store values as strings instead of sometimes ints
  • Update dataset_id handling in get_task to gate dataset lookup on isinstance(dataset_id, int) before fetching the dataset
src/routers/openml/tasks.py
Standardize typing.cast usage to string-literal form for mypy and Ruff compatibility
  • Replace typing.cast(target_type, value) with cast("TargetType", value) or equivalent string-literal casts in multiple database modules and config loader
  • Remove an unnecessary cast in _load_configuration and instead cast only in load_routing_configuration
src/database/tasks.py
src/database/studies.py
src/database/flows.py
src/config.py
src/routers/openml/tasktype.py
src/database/evaluations.py
Adjust tests to satisfy updated mypy and Ruff rules
  • Move type: ignore comments from hypothesis.settings and given decorators to match new untyped-decorator error codes and drop obsolete misc ignores
  • Simplify ApiKey StrEnum member declarations by removing explicit : str annotations
tests/routers/openml/datasets_list_datasets_test.py
tests/users.py
Refine JSON-LD schema typing to work with newer type checker and Ruff
  • Replace the Generic[T] + TypeVar pattern for JsonLDObjectReference with the new Pydantic-compatible generic class syntax JsonLDObjectReference[T: JsonLDObject]
  • Remove a type: ignore[arg-type] on JsonLDGraph.context now that the Field default and type are aligned
src/schemas/datasets/mldcat_ap.py
Update linting and pre-commit tooling versions and configuration
  • Bump pre-commit hook versions for pre-commit-hooks, mypy, and ruff-pre-commit
  • Update Ruff ignore codes from deprecated TCH001/TCH003 to TC001/TC003
  • Stop globally ignoring ANN101 and ANN102 to enforce annotations on self and cls going forward
.pre-commit-config.yaml
pyproject.toml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Copy link

coderabbitai bot commented Dec 19, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Updates to linting/pre-commit configuration, systematic conversion of typing.cast type arguments to string-literal forward references across database and router code, introduction of a recursive JSON type alias and tighter typing in OpenML task template functions, a schema generic modernization, and small test/type annotation cleanups.

Changes

Cohort / File(s) Summary
Pre-commit & Linting Configuration
\.pre-commit-config\.yaml, pyproject.toml
Bumped pre-commit/mypy/ruff revs and added a ruff-format hook. Adjusted Ruff ignore list: removed ANN101, ANN102; replaced TCH001/TCH003 with TC001/TC003.
Core Configuration
src/config.py
Removed explicit typing.cast to concrete TomlTable and replaced cast type arguments with string-literal forward references ("TomlTable"); runtime behavior unchanged.
Database Typing Refinements
src/database/evaluations.py, src/database/flows.py, src/database/studies.py, src/database/tasks.py
Replaced concrete type arguments in typing.cast(...) with string-literal forward references (e.g., "Sequence[Row]", "int"). No control-flow or runtime changes.
OpenML Tasks Router
src/routers/openml/tasks.py
Added recursive JSON type alias; changed convert_template_xml_to_json, fill_template, and _fill_json_template signatures to use JSON; added casts to dict[str, JSON]; tightened dataset_id handling using isinstance(dataset_id, int); adjusted internal typing for task_inputs/fetched_data.
OpenML Tasktype Router
src/routers/openml/tasktype.py
Switched typing.cast type arguments for creator/contributors fields from concrete str to string-literal "str"; behavior unchanged.
Schema Updates
src/schemas/datasets/mldcat_ap.py
Modernized JsonLDObjectReference generic syntax to PEP-style JsonLDObjectReference[T: JsonLDObject] and added model_config = {"populate_by_name": True, "extra": "forbid"} to JsonLDGraph.
Test Updates
tests/routers/openml/datasets_list_datasets_test.py, tests/users.py
Minor decorator formatting and type: ignore adjustments for Hypothesis in tests; removed explicit : str annotations from ApiKey enum members (kept values).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Repeated, low-risk typing changes lower per-file complexity but occur across many modules.
  • Areas to focus on:
    • src/routers/openml/tasks.py: verify JSON alias covers all runtime shapes expected and that isinstance(dataset_id, int) preserves previous behavior for non-int IDs.
    • src/schemas/datasets/mldcat_ap.py: ensure PEP 695-style generic is compatible with project Python target and Pydantic version.
    • Pre-commit/pyproject updates: confirm CI pre-commit hooks and ruff configuration work as intended after rule renames.

Possibly related PRs

  • Move extra dev commits into main #218 — Modifies the same src/routers/openml/tasks.py functions (convert_template_xml_to_json, fill_template, _fill_json_template); both PRs touch typing and template handling and may conflict or overlap.

Poem

🐰 I hopped through casts and linting lines today,
Swapped strings for types and chased loose errors away,
A JSON trail I neatly mapped, templates tucked in bed,
Schemas spruced in modern style — carrots for my head! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Update pre-commit hooks' directly reflects the primary change: bumping versions in .pre-commit-config.yaml and processing new warnings/errors across the codebase to comply with updated hooks.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pre-commit-autoupdate

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9faa75f and 0efcb71.

📒 Files selected for processing (1)
  • src/routers/openml/tasks.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/routers/openml/tasks.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Sourcery review
  • GitHub Check: compare-php
  • GitHub Check: python

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • In get_task, task_inputs now always stores values as str, but the later isinstance(dataset_id, int) check and database.datasets.get(dataset_id, expdb) expect an int, so this branch will never be taken and the dataset lookup/name formatting logic appears to be broken; consider keeping source_data as an int or explicitly converting it before lookup.
  • Several cast(...) calls were changed to use string literals (e.g. cast("Sequence[Row]", ...), cast("dict[str, JSON]", ...)), which defeats static type checking since cast expects a type, not a string; prefer importing/using the concrete types (or a proper TypeAlias) instead of string literals for these casts.
  • The _fill_json_template signature now constrains fetched_data to dict[str, str], but it participates in templating JSON structures that can contain non-string values; consider widening this to dict[str, JSON] (or similar) to reflect actual usage and avoid needing downstream ignores or unsafe casts.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `get_task`, `task_inputs` now always stores values as `str`, but the later `isinstance(dataset_id, int)` check and `database.datasets.get(dataset_id, expdb)` expect an `int`, so this branch will never be taken and the dataset lookup/name formatting logic appears to be broken; consider keeping `source_data` as an `int` or explicitly converting it before lookup.
- Several `cast(...)` calls were changed to use string literals (e.g. `cast("Sequence[Row]", ...)`, `cast("dict[str, JSON]", ...)`), which defeats static type checking since `cast` expects a type, not a string; prefer importing/using the concrete types (or a proper `TypeAlias`) instead of string literals for these casts.
- The `_fill_json_template` signature now constrains `fetched_data` to `dict[str, str]`, but it participates in templating JSON structures that can contain non-string values; consider widening this to `dict[str, JSON]` (or similar) to reflect actual usage and avoid needing downstream ignores or unsafe casts.

## Individual Comments

### Comment 1
<location> `src/routers/openml/tasks.py:166-169` </location>
<code_context>

     task_inputs = {
-        row.input: int(row.value) if row.value.isdigit() else row.value
+        row.input: str(int(row.value)) if row.value.isdigit() else row.value
         for row in database.tasks.get_input_for_task(task_id, expdb)
     }
</code_context>

<issue_to_address>
**issue (bug_risk):** The `task_inputs` value normalization and the subsequent `dataset_id` int check are now inconsistent and make the dataset lookup unreachable.

With the new normalization, numeric `task_inputs` values are stored as strings (`"123"`), so `dataset_id` is now always a `str`. As a result, `isinstance(dataset_id, int)` is never true and the dataset lookup path is effectively dead code. To fix this, either convert `dataset_id` to `int` before lookup (and drop the `isinstance` check in favor of conversion/validation), or revert to storing numeric values as integers so the existing check continues to work.
</issue_to_address>

### Comment 2
<location> `src/database/tasks.py:22` </location>
<code_context>

 def get_math_functions(function_type: str, connection: Connection) -> Sequence[Row]:
     return cast(
-        Sequence[Row],
+        "Sequence[Row]",
         connection.execute(
             text(
</code_context>

<issue_to_address>
**issue:** Using string literals as the first argument to `typing.cast` undermines static type checking.

`cast` expects a type (e.g. `Sequence[Row]`), not a string. Using `cast("Sequence[Row]", ...)` turns this into a cast to a string literal type (or `str`), which discards the intended type information and can mask real issues. Please switch back to `cast(Sequence[Row], ...)` (or a suitable type alias) throughout these functions.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.


def get_task_types(expdb: Connection) -> Sequence[Row]:
return cast(
Sequence[Row],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Using string literals as the first argument to typing.cast undermines static type checking.

cast expects a type (e.g. Sequence[Row]), not a string. Using cast("Sequence[Row]", ...) turns this into a cast to a string literal type (or str), which discards the intended type information and can mask real issues. Please switch back to cast(Sequence[Row], ...) (or a suitable type alias) throughout these functions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/config.py (1)

29-29: Consider using the direct type reference instead of a string literal.

Since TomlTable is already defined at line 9, the string literal "TomlTable" in the cast is unnecessary. Direct reference would be more conventional:

-    return typing.cast("TomlTable", _load_configuration(file)["routing"])
+    return typing.cast(TomlTable, _load_configuration(file)["routing"])
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ac2991 and 9faa75f.

📒 Files selected for processing (12)
  • .pre-commit-config.yaml (2 hunks)
  • pyproject.toml (1 hunks)
  • src/config.py (1 hunks)
  • src/database/evaluations.py (1 hunks)
  • src/database/flows.py (2 hunks)
  • src/database/studies.py (3 hunks)
  • src/database/tasks.py (4 hunks)
  • src/routers/openml/tasks.py (5 hunks)
  • src/routers/openml/tasktype.py (1 hunks)
  • src/schemas/datasets/mldcat_ap.py (3 hunks)
  • tests/routers/openml/datasets_list_datasets_test.py (2 hunks)
  • tests/users.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/routers/openml/tasks.py (3)
src/database/flows.py (1)
  • get (67-77)
src/database/tasks.py (1)
  • get (7-17)
src/database/datasets.py (1)
  • get (11-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Sourcery review
  • GitHub Check: compare-php
🔇 Additional comments (15)
.pre-commit-config.yaml (2)

33-33: Addition of ruff-format hook looks good.

Adding the ruff-format hook ensures consistent code formatting across the codebase.


6-6: Version bumps are compatible with Python >=3.12.

All updated hook versions support the project's minimum Python version requirement. Mypy's policy bumps the major version for breaking changes, and the v1.11→v1.19 bump contains no major incompatibilities. Ruff uses minor version numbers for breaking changes, with v0.14.0 changing the default target Python from 3.9 to 3.10, which does not conflict with Python 3.12. Pre-commit-hooks v6.0.0 requires Python >= 3.9, well below your 3.12 minimum.

src/routers/openml/tasktype.py (1)

56-56: String-literal type hints in cast() are valid but unusual for built-ins.

The change from cast(str, ...) to cast("str", ...) is valid Python typing syntax. String-literal type hints are typically used for forward references to avoid circular imports, but they can also be used for built-in types. This change appears to be driven by the updated Ruff linting rules (TC001/TC003).

Runtime behavior is unchanged since cast() is a no-op at runtime.

Also applies to: 60-60

src/database/flows.py (1)

9-9: Type casting pattern is consistent across the PR.

The change to string-literal type hints cast("Sequence[Row]", ...) aligns with the broader PR pattern and the updated linting rules. Runtime behavior remains unchanged.

Also applies to: 39-39

src/database/evaluations.py (1)

12-12: LGTM - consistent type casting pattern.

The string-literal type hint in cast() is consistent with the PR-wide refactoring.

src/database/studies.py (1)

46-46: String-literal type casting is consistent throughout.

All three cast() calls have been updated to use string-literal type hints, consistent with the PR-wide pattern. The change on Line 106 correctly casts the LAST_INSERT_ID() result to int.

Also applies to: 59-59, 106-106

src/database/tasks.py (1)

22-22: Type casting updates are uniform and correct.

All four cast() calls consistently use string-literal type hints for Sequence[Row], matching the PR-wide pattern.

Also applies to: 49-49, 65-65, 81-81

tests/routers/openml/datasets_list_datasets_test.py (1)

227-232: Decorator formatting and type ignores added for stricter type checking.

The formatting changes and type: ignore[untyped-decorator] comments are likely driven by the new ruff-format hook and stricter mypy settings (v1.19.1). These changes suppress legitimate warnings about Hypothesis decorators that don't have complete type hints.

The comment reference # 108 appears to reference an issue number.

Also applies to: 246-246

pyproject.toml (1)

64-65: Ruff rule codes TC001 and TC003 are correct.

TC001 and TC003 are correctly remapped from TCH001 and TCH003 in Ruff v0.14.10. However, the description should be clarified: TC001 checks for first-party imports used only for type annotations but not in a type-checking block, and TC003 checks for standard library imports with the same condition. The rules are not specifically about Pydantic type usage—they relate to moving imports into TYPE_CHECKING blocks to reduce runtime overhead.

src/schemas/datasets/mldcat_ap.py (2)

280-280: Good addition of model configuration.

Adding model_config to JsonLDGraph ensures consistency with other model classes in the file and enables proper field aliasing behavior.


44-44: The project's minimum Python version requirement (requires-python = ">=3.12") correctly supports the PEP 695 type parameter syntax used at line 44. No compatibility issues exist.

tests/users.py (1)

12-15: LGTM! Redundant type annotations removed.

The explicit : str annotations are unnecessary for StrEnum members, as the enum already enforces string types.

src/routers/openml/tasks.py (3)

18-18: Well-designed recursive type alias for JSON structures.

The JSON type alias provides clear typing for nested JSON-like data structures, improving type safety throughout the module.


21-27: Type annotations enhanced with JSON alias.

The function signature and return type are now more precise. The cast on line 27 ensures the return type matches the expected structure.


84-93: The cast is safe given that json_template is always a dict[str, JSON]. When _fill_json_template() receives a dict input, it returns a dict with recursively processed values (lines 103-107). The type system could be more explicit, but the cast does not introduce type unsafety in practice.

@codecov
Copy link

codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 89.47368% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@6ac2991). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/routers/openml/tasks.py 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #219   +/-   ##
=======================================
  Coverage        ?   92.54%           
=======================================
  Files           ?       51           
  Lines           ?     1851           
  Branches        ?      144           
=======================================
  Hits            ?     1713           
  Misses          ?       97           
  Partials        ?       41           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@PGijsbers PGijsbers merged commit e1c3e0f into main Dec 19, 2025
7 checks passed
@PGijsbers PGijsbers deleted the pre-commit-autoupdate branch December 19, 2025 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants