Update pre-commit hooks #219
Conversation
Reviewer's Guide
This PR updates pre-commit hooks (mypy, ruff, core hooks) and then fixes or adjusts typing and lint issues across routers, database helpers, schemas, tests, and config, mainly by tightening JSON-related types, switching casts to string-based forms, updating hypothesis and enum annotations, and aligning Ruff configuration codes.
Class diagram for updated OpenML tasks JSON template helpers
```mermaid
classDiagram
    class JSON {
    }
    class OpenMLTasksJSONHelpers {
        +convert_template_xml_to_json(xml_template str) dict~str, JSON~
        +fill_template(template str, task RowMapping, task_inputs dict~str, str~, connection Connection) dict~str, JSON~
        +_fill_json_template(template JSON, task RowMapping, task_inputs dict~str, str~, fetched_data dict~str, str~, connection Connection) JSON
    }
    JSON <|.. OpenMLTasksJSONHelpers : uses
```
Updated class diagram for JsonLDObjectReference generic model
```mermaid
classDiagram
    class BaseModel {
    }
    class JsonLDObject {
        <<abstract>>
        +context_property_map dict~str, str~
    }
    JsonLDObject --|> BaseModel
    class JsonLDObjectReference_T_JsonLDObject_ {
        +id_ str
    }
    JsonLDObjectReference_T_JsonLDObject_ --|> BaseModel
    class JsonLDGraph {
        +context str or dict~str, HttpUrl~
        +graph list~Distribution | DataService | Dataset | Quality | Feature | Agent | MD5Checksum~
    }
    JsonLDGraph --|> BaseModel
    JsonLDGraph "*" o-- JsonLDObject : contains
```
File-Level Changes
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Updates to linting/pre-commit configuration, systematic conversion of typing.cast type arguments to string-literal forward references across database and router code, and introduction of a recursive `JSON` type alias.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
📜 Recent review details. Configuration used: defaults. Review profile: CHILL. Plan: Pro. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
Hey there - I've reviewed your changes - here's some feedback:
- In `get_task`, `task_inputs` now always stores values as `str`, but the later `isinstance(dataset_id, int)` check and `database.datasets.get(dataset_id, expdb)` expect an `int`, so this branch will never be taken and the dataset lookup/name formatting logic appears to be broken; consider keeping `source_data` as an `int` or explicitly converting it before lookup.
- Several `cast(...)` calls were changed to use string literals (e.g. `cast("Sequence[Row]", ...)`, `cast("dict[str, JSON]", ...)`), which defeats static type checking since `cast` expects a type, not a string; prefer importing/using the concrete types (or a proper `TypeAlias`) instead of string literals for these casts.
- The `_fill_json_template` signature now constrains `fetched_data` to `dict[str, str]`, but it participates in templating JSON structures that can contain non-string values; consider widening this to `dict[str, JSON]` (or similar) to reflect actual usage and avoid needing downstream ignores or unsafe casts (see the sketch after this list).
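A minimal sketch of what the third point could look like in practice; the parameter names come from the review text, while the `JSON` alias and the SQLAlchemy imports are assumptions about the surrounding code rather than the project's actual definitions:

```python
from sqlalchemy import Connection
from sqlalchemy.engine import RowMapping

type JSON = dict[str, JSON] | list[JSON] | str | int | float | bool | None


def _fill_json_template(
    template: JSON,
    task: RowMapping,
    task_inputs: dict[str, str],
    fetched_data: dict[str, JSON],  # widened from dict[str, str], as suggested
    connection: Connection,
) -> JSON:
    ...  # body unchanged; only the signature widening is illustrated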
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `get_task`, `task_inputs` now always stores values as `str`, but the later `isinstance(dataset_id, int)` check and `database.datasets.get(dataset_id, expdb)` expect an `int`, so this branch will never be taken and the dataset lookup/name formatting logic appears to be broken; consider keeping `source_data` as an `int` or explicitly converting it before lookup.
- Several `cast(...)` calls were changed to use string literals (e.g. `cast("Sequence[Row]", ...)`, `cast("dict[str, JSON]", ...)`), which defeats static type checking since `cast` expects a type, not a string; prefer importing/using the concrete types (or a proper `TypeAlias`) instead of string literals for these casts.
- The `_fill_json_template` signature now constrains `fetched_data` to `dict[str, str]`, but it participates in templating JSON structures that can contain non-string values; consider widening this to `dict[str, JSON]` (or similar) to reflect actual usage and avoid needing downstream ignores or unsafe casts.
## Individual Comments
### Comment 1
<location> `src/routers/openml/tasks.py:166-169` </location>
<code_context>
task_inputs = {
- row.input: int(row.value) if row.value.isdigit() else row.value
+ row.input: str(int(row.value)) if row.value.isdigit() else row.value
for row in database.tasks.get_input_for_task(task_id, expdb)
}
</code_context>
<issue_to_address>
**issue (bug_risk):** The `task_inputs` value normalization and the subsequent `dataset_id` int check are now inconsistent and make the dataset lookup unreachable.
With the new normalization, numeric `task_inputs` values are stored as strings (`"123"`), so `dataset_id` is now always a `str`. As a result, `isinstance(dataset_id, int)` is never true and the dataset lookup path is effectively dead code. To fix this, either convert `dataset_id` to `int` before lookup (and drop the `isinstance` check in favor of conversion/validation), or revert to storing numeric values as integers so the existing check continues to work.
</issue_to_address>
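An illustrative sketch of the suggested direction, reusing only names that appear in the review (`task_inputs`, `source_data`, `database.datasets.get`, `expdb`); it is a hedged example, not the project's actual code:

```python
def resolve_dataset(task_inputs: dict[str, str], database, expdb):
    # Values in task_inputs stay str; convert the dataset reference back to an
    # int before the lookup, replacing the now-unreachable
    # isinstance(dataset_id, int) branch.
    source_data = task_inputs.get("source_data")
    if source_data is not None and source_data.isdigit():
        return database.datasets.get(int(source_data), expdb)
    return None
```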
### Comment 2
<location> `src/database/tasks.py:22` </location>
<code_context>
def get_math_functions(function_type: str, connection: Connection) -> Sequence[Row]:
return cast(
- Sequence[Row],
+ "Sequence[Row]",
connection.execute(
text(
</code_context>
<issue_to_address>
**issue:** Using string literals as the first argument to `typing.cast` undermines static type checking.
`cast` expects a type (e.g. `Sequence[Row]`), not a string. Using `cast("Sequence[Row]", ...)` turns this into a cast to a string literal type (or `str`), which discards the intended type information and can mask real issues. Please switch back to `cast(Sequence[Row], ...)` (or a suitable type alias) throughout these functions.
</issue_to_address>
```python
def get_task_types(expdb: Connection) -> Sequence[Row]:
    return cast(
        Sequence[Row],
```
issue: Using string literals as the first argument to `typing.cast` undermines static type checking.
`cast` expects a type (e.g. `Sequence[Row]`), not a string. Using `cast("Sequence[Row]", ...)` turns this into a cast to a string literal type (or `str`), which discards the intended type information and can mask real issues. Please switch back to `cast(Sequence[Row], ...)` (or a suitable type alias) throughout these functions.
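For contrast, a minimal sketch of the form this comment asks for, with the types imported so the concrete objects can be passed to `cast`; the query text is a placeholder, not the project's SQL:

```python
from collections.abc import Sequence
from typing import cast

from sqlalchemy import Connection, text
from sqlalchemy.engine import Row


def get_task_types(expdb: Connection) -> Sequence[Row]:
    rows = expdb.execute(text("SELECT ...")).all()  # query elided
    # Concrete type object rather than the string literal "Sequence[Row]".
    return cast(Sequence[Row], rows)
```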
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/config.py (1)
29-29: Consider using the direct type reference instead of a string literal. Since `TomlTable` is already defined at line 9, the string literal `"TomlTable"` in the cast is unnecessary. Direct reference would be more conventional:
```diff
- return typing.cast("TomlTable", _load_configuration(file)["routing"])
+ return typing.cast(TomlTable, _load_configuration(file)["routing"])
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .pre-commit-config.yaml (2 hunks)
- pyproject.toml (1 hunks)
- src/config.py (1 hunks)
- src/database/evaluations.py (1 hunks)
- src/database/flows.py (2 hunks)
- src/database/studies.py (3 hunks)
- src/database/tasks.py (4 hunks)
- src/routers/openml/tasks.py (5 hunks)
- src/routers/openml/tasktype.py (1 hunks)
- src/schemas/datasets/mldcat_ap.py (3 hunks)
- tests/routers/openml/datasets_list_datasets_test.py (2 hunks)
- tests/users.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/routers/openml/tasks.py (3)
- src/database/flows.py (1): get (67-77)
- src/database/tasks.py (1): get (7-17)
- src/database/datasets.py (1): get (11-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Sourcery review
- GitHub Check: compare-php
🔇 Additional comments (15)
.pre-commit-config.yaml (2)
33-33: Addition of ruff-format hook looks good. Adding the ruff-format hook ensures consistent code formatting across the codebase.
6-6: Version bumps are compatible with Python >=3.12. All updated hook versions support the project's minimum Python version requirement. Mypy's policy bumps the major version for breaking changes, and the v1.11→v1.19 bump contains no major incompatibilities. Ruff uses minor version numbers for breaking changes, with v0.14.0 changing the default target Python from 3.9 to 3.10, which does not conflict with Python 3.12. Pre-commit-hooks v6.0.0 requires Python >= 3.9, well below your 3.12 minimum.
src/routers/openml/tasktype.py (1)
56-56: String-literal type hints in `cast()` are valid but unusual for built-ins. The change from `cast(str, ...)` to `cast("str", ...)` is valid Python typing syntax. String-literal type hints are typically used for forward references to avoid circular imports, but they can also be used for built-in types. This change appears to be driven by the updated Ruff linting rules (TC001/TC003). Runtime behavior is unchanged since `cast()` is a no-op at runtime.
Also applies to: 60-60
src/database/flows.py (1)
9-9: Type casting pattern is consistent across the PR. The change to string-literal type hints `cast("Sequence[Row]", ...)` aligns with the broader PR pattern and the updated linting rules. Runtime behavior remains unchanged.
Also applies to: 39-39
src/database/evaluations.py (1)
12-12: LGTM - consistent type casting pattern. The string-literal type hint in `cast()` is consistent with the PR-wide refactoring.
src/database/studies.py (1)
46-46: String-literal type casting is consistent throughout. All three `cast()` calls have been updated to use string-literal type hints, consistent with the PR-wide pattern. The change on Line 106 correctly casts the `LAST_INSERT_ID()` result to int.
Also applies to: 59-59, 106-106
src/database/tasks.py (1)
22-22: Type casting updates are uniform and correct. All four `cast()` calls consistently use string-literal type hints for `Sequence[Row]`, matching the PR-wide pattern.
Also applies to: 49-49, 65-65, 81-81
tests/routers/openml/datasets_list_datasets_test.py (1)
227-232: Decorator formatting and type ignores added for stricter type checking. The formatting changes and `type: ignore[untyped-decorator]` comments are likely driven by the new ruff-format hook and stricter mypy settings (v1.19.1). These changes suppress legitimate warnings about Hypothesis decorators that don't have complete type hints. The comment reference `# 108` appears to reference an issue number.
Also applies to: 246-246
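Roughly what such a suppression looks like on a Hypothesis-decorated test; the test name and strategy are made up, and the error code is simply the one quoted in the comment above:

```python
from hypothesis import given, strategies as st


@given(offset=st.integers(min_value=0))  # type: ignore[untyped-decorator]
def test_list_datasets_accepts_offset(offset: int) -> None:
    # Placeholder body; the real test would call the list-datasets endpoint.
    assert offset >= 0
```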
pyproject.toml (1)
64-65: Ruff rule codes TC001 and TC003 are correct. TC001 and TC003 are correctly remapped from TCH001 and TCH003 in Ruff v0.14.10. However, the description should be clarified: TC001 checks for first-party imports used only for type annotations but not in a type-checking block, and TC003 checks for standard library imports with the same condition. The rules are not specifically about Pydantic type usage; they relate to moving imports into TYPE_CHECKING blocks to reduce runtime overhead.
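A small sketch (not from this repository) of the pattern TC001/TC003 push toward: annotation-only imports move under `TYPE_CHECKING`, which is also why casts end up spelled as string forward references. `myproject.models` is a made-up first-party module:

```python
from __future__ import annotations

from typing import TYPE_CHECKING, cast

if TYPE_CHECKING:
    from collections.abc import Sequence  # stdlib import used only in annotations (TC003)

    from myproject.models import Record  # first-party import used only in annotations (TC001)


def first_record(result: object) -> Record | None:
    # Sequence and Record do not exist at runtime in this module, so the cast
    # target has to be written as a string.
    rows = cast("Sequence[Record]", result)
    return rows[0] if rows else None
```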
src/schemas/datasets/mldcat_ap.py (2)
280-280: Good addition of model configuration. Adding `model_config` to `JsonLDGraph` ensures consistency with other model classes in the file and enables proper field aliasing behavior.
44-44: The project's minimum Python version requirement (`requires-python = ">=3.12"`) correctly supports the PEP 695 type parameter syntax used at line 44. No compatibility issues exist.
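For readers unfamiliar with the syntax, a standalone toy illustrating a PEP 695 generic class with a bound; it is not the actual `JsonLDObjectReference` definition:

```python
from dataclasses import dataclass


class JsonLDObject:
    """Stand-in base class for the illustration only."""


@dataclass
class Reference[T: JsonLDObject]:
    # PEP 695 (Python >= 3.12): the type parameter and its bound are declared
    # inline, without TypeVar or typing.Generic boilerplate.
    id_: str
```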
tests/users.py (1)
12-15: LGTM! Redundant type annotations removed. The explicit `: str` annotations are unnecessary for `StrEnum` members, as the enum already enforces string types.
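For context, a tiny sketch of why the `: str` annotations are redundant; the enum name and members are illustrative, not the ones in `tests/users.py`:

```python
from enum import StrEnum


class ApiKey(StrEnum):
    # No ": str" annotation needed: StrEnum members are already str instances.
    ADMIN = "admin-key"
    REGULAR = "regular-key"


assert isinstance(ApiKey.ADMIN, str)  # members behave like plain strings
```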
src/routers/openml/tasks.py (3)
18-18: Well-designed recursive type alias for JSON structures. The `JSON` type alias provides clear typing for nested JSON-like data structures, improving type safety throughout the module.
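A sketch of what such a recursive alias commonly looks like with the PEP 695 `type` statement; the exact definition at line 18 of `src/routers/openml/tasks.py` may differ:

```python
# Lazily evaluated, so the alias can refer to itself without quoting.
type JSON = dict[str, JSON] | list[JSON] | str | int | float | bool | None


def depth(value: JSON) -> int:
    """Tiny usage example: nesting depth of a JSON-like value."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0
```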
21-27: Type annotations enhanced with JSON alias. The function signature and return type are now more precise. The cast on line 27 ensures the return type matches the expected structure.
84-93: The cast is safe given that `json_template` is always a `dict[str, JSON]`. When `_fill_json_template()` receives a dict input, it returns a dict with recursively processed values (lines 103-107). The type system could be more explicit, but the cast does not introduce type unsafety in practice.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
```
@@           Coverage Diff            @@
##             main     #219    +/-   ##
========================================
  Coverage        ?   92.54%
========================================
  Files           ?       51
  Lines           ?     1851
  Branches        ?      144
========================================
  Hits            ?     1713
  Misses          ?       97
  Partials        ?       41
```
☔ View full report in Codecov by Sentry.
And process the new warnings/errors.
Summary by Sourcery
Update linting and type-checking configuration and adjust code to satisfy new pre-commit hooks and type checker requirements.
Enhancements:
Build:
Tests:
Summary by CodeRabbit
Chores
Refactor
Tests