Skip to content

Conversation

@lucifer4330k
Copy link

@lucifer4330k lucifer4330k commented Nov 19, 2025

Summary

This PR migrates the API authentication from URI-encoded API keys (query parameters) to Authorization headers, following FastAPI security best practices.

Changes

  • Core Implementation (src/routers/dependencies.py):

    • Added APIKeyHeader from fastapi.security
    • Created api_key_header instance with name='Authorization' and auto_error=False
    • Updated fetch_user() to use Depends(api_key_header) instead of direct query parameter
  • Test Updates:

    • Updated all test files to use headers={'Authorization': key} instead of ?api_key= query parameters
    • Modified tests in:
      • tests/routers/openml/dataset_tag_test.py
      • tests/routers/openml/datasets_test.py
      • tests/routers/openml/datasets_list_datasets_test.py
      • tests/routers/openml/study_test.py
      • tests/routers/openml/migration/datasets_migration_test.py

Benefits

Security: API keys are no longer exposed in URLs (logs, caches, browser history, referrer headers)
Standards Compliant: Follows RESTful API best practices and FastAPI security patterns
Better Documentation: FastAPI automatically documents this in OpenAPI/Swagger UI
Backward Compatible: auto_error=False ensures public endpoints continue to work

Migration Guide

Before:

GET /datasets/list?api_key=AD000000000000000000000000000000
POST /datasets/tag?api_key=AD000000000000000000000000000000

After:

GET /datasets/list
Authorization: AD000000000000000000000000000000

POST /datasets/tag
Authorization: AD000000000000000000000000000000

Testing

All test files have been updated to use the new authentication method. The database authentication logic remains unchanged - only the transport mechanism has changed from query parameters to headers.

Summary by Sourcery

Migrate API authentication from URI-encoded query parameters to an Authorization header using FastAPI’s security utilities, updating the core dependency and test suite accordingly

Enhancements:

  • Introduce APIKeyHeader dependency to retrieve API keys from the Authorization header and update fetch_user to use it instead of query parameters

Tests:

  • Update all tests to remove api_key query parameters and supply API keys via the Authorization header

Summary by CodeRabbit

  • New Features

    • API authentication now uses the HTTP Authorization header instead of URL query parameters.
  • Tests

    • Test suite updated to validate header-based authentication across dataset, study, migration, and related endpoints.
  • Chores

    • Updated repository ignore patterns and pre-commit configuration (type-checking tooling adjusted).
  • Style

    • Minor formatting change in a schema declaration with no behavioral impact.

fixes #203 #214
#esoc2025

- Replace query parameter-based API key authentication with Authorization header
- Implement FastAPI's APIKeyHeader security scheme in dependencies.py
- Update fetch_user() to extract API key from Authorization header using Depends(api_key_header)
- Update all test files to use headers={'Authorization': key} instead of ?api_key= query params
- Set auto_error=False to maintain backward compatibility for public endpoints

This change improves security by preventing API keys from appearing in:
- Server logs
- Browser history
- Proxy caches
- Referrer headers

Follows FastAPI security best practices and RESTful API standards.
@sourcery-ai
Copy link

sourcery-ai bot commented Nov 19, 2025

Reviewer's Guide

This PR refactors API authentication to leverage FastAPI’s APIKeyHeader mechanism, migrating from URI-encoded API keys in query parameters to Authorization headers and updating dependant router logic and tests accordingly.

Sequence diagram for API key authentication via Authorization header

sequenceDiagram
participant Client
participant API
participant "fetch_user()"
Client->>API: GET /datasets/list (with Authorization header)
API->>"fetch_user()": Depends(api_key_header)
"fetch_user()"->>API: Returns User or None
API-->>Client: Response
Loading

Class diagram for updated fetch_user dependency injection

classDiagram
class APIKeyHeader {
  +name: str
  +auto_error: bool
}
class fetch_user {
  +api_key: APIKey | None (Injected via Depends(api_key_header))
  +user_data: Connection (Injected via Depends(userdb_connection))
  +return: User | None
}
APIKeyHeader <.. fetch_user: injects api_key
Loading

File-Level Changes

Change Details Files
Migrate security dependency to use APIKeyHeader
  • Imported APIKeyHeader from fastapi.security
  • Instantiated api_key_header with name='Authorization', auto_error=False
  • Replaced query-param fetch_user signature with Depends(api_key_header)
src/routers/dependencies.py
Update tests to send API keys via Authorization header
  • Removed '?api_key=' query parameters from TestClient calls
  • Added headers={"Authorization": key} to requests
  • Ensured backward-compatible endpoints still succeed when header absent
tests/routers/openml/datasets_list_datasets_test.py
tests/routers/openml/dataset_tag_test.py
tests/routers/openml/datasets_test.py
tests/routers/openml/migration/datasets_migration_test.py
tests/routers/openml/study_test.py

Possibly linked issues

  • #N/A: The PR changes API key authentication from URI parameters to Authorization headers, directly resolving the issue's request.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Copy link

coderabbitai bot commented Nov 19, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Migrates API authentication in tests from URL query parameters to the HTTP Authorization header and updates src/routers/dependencies.py to use an APIKeyHeader dependency for extracting the API key; also includes small formatting and tooling config changes and a .gitignore path adjustment.

Changes

Cohort / File(s) Summary
Dependency Injection Setup
src/routers/dependencies.py
Added APIKeyHeader import and a module-level api_key_header instance; updated fetch_user signature to accept `api_key: Annotated[APIKey
Authentication Method Migration in Tests
tests/routers/openml/dataset_tag_test.py, tests/routers/openml/datasets_list_datasets_test.py, tests/routers/openml/datasets_test.py, tests/routers/openml/migration/datasets_migration_test.py, tests/routers/openml/study_test.py
Replaced query-parameter authentication (?api_key=...) with header-based authentication (headers={"Authorization": key}) across multiple test cases; removed corresponding ?api_key= query strings; test assertions and logic remain unchanged.
Schema formatting change
src/schemas/datasets/mldcat_ap.py
Non-functional formatting change: reformatted Field(...) invocation for JsonLDGraph.context into a multi-line style; no type, default factory, or behavior changed.
VCS ignore path update
.gitignore
Replaced relative ignore path docker/mysql/data with root-level /mysql/data to change which directory is ignored.
Pre-commit configuration
.pre-commit-config.yaml
Added pydantic to additional_dependencies for the mypy hook in pre-commit configuration.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Tests
    participant Router
    participant APIKeyHeader as api_key_header
    participant FetchUser as fetch_user
    participant DB as userdb_connection
    participant User as User.fetch

    Client->>Tests: send test request with Authorization header
    Tests->>Router: HTTP request (Authorization header)
    Router->>APIKeyHeader: invoke Depends(api_key_header)
    APIKeyHeader-->>Router: APIKey | None
    Router->>FetchUser: call fetch_user(api_key=APIKey)
    FetchUser->>DB: invoke Depends(userdb_connection)
    DB-->>FetchUser: Connection (user_data)
    FetchUser->>User: User.fetch(APIKey, user_data)
    User-->>FetchUser: User | None
    FetchUser-->>Router: return User | None
    Router-->>Tests: HTTP response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Attention points:
    • src/routers/dependencies.py: confirm APIKeyHeader is correctly configured (optional vs. required behavior) and integrates with existing dependency chain.
    • Tests: scan for any remaining ?api_key= usages and verify all test requests set Authorization consistently.
    • .pre-commit-config.yaml: ensure adding pydantic aligns with CI tooling and mypy settings.

Poem

🐰 I hopped from query to header with cheer,
Authorization carried the key I hold dear.
Dependencies now fetch from a header so bright,
Tests whisper headers through day and night.
Small tidy tweaks make the tree feel light.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: Migrate from URI-encoded API keys to Authorization header' directly and clearly describes the main change: migrating API authentication from query parameters to HTTP Authorization headers.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Consider temporarily supporting the old api_key query parameter alongside the new Authorization header (e.g., via Depends) and logging a deprecation warning to smooth client migration.
  • Enforce or validate a standard Authorization header format (such as 'Bearer ') to align with common API conventions and avoid ambiguous tokens.
  • Clean up leftover query‐param logic (e.g. unused api_key_query variables in migration tests) to keep the codebase concise and clear.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider temporarily supporting the old api_key query parameter alongside the new Authorization header (e.g., via Depends) and logging a deprecation warning to smooth client migration.
- Enforce or validate a standard Authorization header format (such as 'Bearer <token>') to align with common API conventions and avoid ambiguous tokens.
- Clean up leftover query‐param logic (e.g. unused api_key_query variables in migration tests) to keep the codebase concise and clear.

## Individual Comments

### Comment 1
<location> `tests/routers/openml/datasets_list_datasets_test.py:55-63` </location>
<code_context>
 )
 def test_list_accounts_privacy(api_key: ApiKey | None, amount: int, py_api: TestClient) -> None:
-    key = f"?api_key={api_key}" if api_key else ""
+    headers = {"Authorization": api_key} if api_key else {}
     response = py_api.post(
-        f"/datasets/list{key}",
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for malformed or missing Authorization headers.

Tests should also cover cases like empty, incorrectly formatted, or multiple Authorization headers to verify error handling and security.

```suggestion
def test_list_accounts_privacy(api_key: ApiKey | None, amount: int, py_api: TestClient) -> None:
    headers = {"Authorization": api_key} if api_key else {}
    response = py_api.post(
        "/datasets/list",
        json={"status": "all", "pagination": {"limit": 1000}},
        headers=headers,
    )
    assert response.status_code == HTTPStatus.OK, response.json()
    assert len(response.json()) == amount

def test_list_accounts_privacy_missing_authorization(py_api: TestClient) -> None:
    response = py_api.post(
        "/datasets/list",
        json={"status": "all", "pagination": {"limit": 1000}},
    )
    assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)

def test_list_accounts_privacy_empty_authorization(py_api: TestClient) -> None:
    headers = {"Authorization": ""}
    response = py_api.post(
        "/datasets/list",
        json={"status": "all", "pagination": {"limit": 1000}},
        headers=headers,
    )
    assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)

def test_list_accounts_privacy_malformed_authorization(py_api: TestClient) -> None:
    headers = {"Authorization": "Bearer"}  # Missing token
    response = py_api.post(
        "/datasets/list",
        json={"status": "all", "pagination": {"limit": 1000}},
        headers=headers,
    )
    assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)

def test_list_accounts_privacy_incorrect_format_authorization(py_api: TestClient) -> None:
    headers = {"Authorization": "Basic not_a_token"}
    response = py_api.post(
        "/datasets/list",
        json={"status": "all", "pagination": {"limit": 1000}},
        headers=headers,
    )
    assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)

def test_list_accounts_privacy_multiple_authorization_headers(py_api: TestClient) -> None:
    # TestClient does not support multiple headers with the same name directly,
    # but we can simulate by passing a comma-separated value.
    headers = {"Authorization": "Bearer valid_token, Bearer another_token"}
    response = py_api.post(
        "/datasets/list",
        json={"status": "all", "pagination": {"limit": 1000}},
        headers=headers,
    )
    assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)
```
</issue_to_address>

### Comment 2
<location> `tests/routers/openml/dataset_tag_test.py:52` </location>
<code_context>
-        f"/datasets/tag?api_key={ApiKey.ADMIN}",
+        "/datasets/tag",
         json={"data_id": dataset_id, "tag": tag},
+        headers={"Authorization": ApiKey.ADMIN},
     )
     assert response.status_code == HTTPStatus.OK
</code_context>

<issue_to_address>
**suggestion (testing):** Test for missing Authorization header is present, but consider testing for extra/invalid headers.

Consider adding a test where the Authorization header contains an unexpected or malformed value to verify the endpoint correctly rejects such requests.
</issue_to_address>

### Comment 3
<location> `tests/routers/openml/migration/datasets_migration_test.py:113` </location>
<code_context>
 )
 def test_list_accounts_privacy(api_key: ApiKey | None, amount: int, py_api: TestClient) -> None:
-    key = f"?api_key={api_key}" if api_key else ""
+    headers = {"Authorization": api_key} if api_key else {}
     response = py_api.post(
-        f"/datasets/list{key}",
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a test for both query parameter and header present.

Add a test case with both the query parameter and Authorization header to confirm the correct precedence and maintain backward compatibility.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 55 to 63
def test_list_accounts_privacy(api_key: ApiKey | None, amount: int, py_api: TestClient) -> None:
key = f"?api_key={api_key}" if api_key else ""
headers = {"Authorization": api_key} if api_key else {}
response = py_api.post(
f"/datasets/list{key}",
"/datasets/list",
json={"status": "all", "pagination": {"limit": 1000}},
headers=headers,
)
assert response.status_code == HTTPStatus.OK, response.json()
assert len(response.json()) == amount
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding tests for malformed or missing Authorization headers.

Tests should also cover cases like empty, incorrectly formatted, or multiple Authorization headers to verify error handling and security.

Suggested change
def test_list_accounts_privacy(api_key: ApiKey | None, amount: int, py_api: TestClient) -> None:
key = f"?api_key={api_key}" if api_key else ""
headers = {"Authorization": api_key} if api_key else {}
response = py_api.post(
f"/datasets/list{key}",
"/datasets/list",
json={"status": "all", "pagination": {"limit": 1000}},
headers=headers,
)
assert response.status_code == HTTPStatus.OK, response.json()
assert len(response.json()) == amount
def test_list_accounts_privacy(api_key: ApiKey | None, amount: int, py_api: TestClient) -> None:
headers = {"Authorization": api_key} if api_key else {}
response = py_api.post(
"/datasets/list",
json={"status": "all", "pagination": {"limit": 1000}},
headers=headers,
)
assert response.status_code == HTTPStatus.OK, response.json()
assert len(response.json()) == amount
def test_list_accounts_privacy_missing_authorization(py_api: TestClient) -> None:
response = py_api.post(
"/datasets/list",
json={"status": "all", "pagination": {"limit": 1000}},
)
assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)
def test_list_accounts_privacy_empty_authorization(py_api: TestClient) -> None:
headers = {"Authorization": ""}
response = py_api.post(
"/datasets/list",
json={"status": "all", "pagination": {"limit": 1000}},
headers=headers,
)
assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)
def test_list_accounts_privacy_malformed_authorization(py_api: TestClient) -> None:
headers = {"Authorization": "Bearer"} # Missing token
response = py_api.post(
"/datasets/list",
json={"status": "all", "pagination": {"limit": 1000}},
headers=headers,
)
assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)
def test_list_accounts_privacy_incorrect_format_authorization(py_api: TestClient) -> None:
headers = {"Authorization": "Basic not_a_token"}
response = py_api.post(
"/datasets/list",
json={"status": "all", "pagination": {"limit": 1000}},
headers=headers,
)
assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)
def test_list_accounts_privacy_multiple_authorization_headers(py_api: TestClient) -> None:
# TestClient does not support multiple headers with the same name directly,
# but we can simulate by passing a comma-separated value.
headers = {"Authorization": "Bearer valid_token, Bearer another_token"}
response = py_api.post(
"/datasets/list",
json={"status": "all", "pagination": {"limit": 1000}},
headers=headers,
)
assert response.status_code in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN)

f"/datasets/tag?api_key={ApiKey.ADMIN}",
"/datasets/tag",
json={"data_id": dataset_id, "tag": tag},
headers={"Authorization": ApiKey.ADMIN},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Test for missing Authorization header is present, but consider testing for extra/invalid headers.

Consider adding a test where the Authorization header contains an unexpected or malformed value to verify the endpoint correctly rejects such requests.

) -> None:
query = f"?api_key={api_key}" if api_key else ""
response = py_api.get(f"/datasets/130{query}")
headers = {"Authorization": api_key} if api_key else {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding a test for both query parameter and header present.

Add a test case with both the query parameter and Authorization header to confirm the correct precedence and maintain backward compatibility.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 217d62d and 3159bbf.

📒 Files selected for processing (6)
  • src/routers/dependencies.py (2 hunks)
  • tests/routers/openml/dataset_tag_test.py (5 hunks)
  • tests/routers/openml/datasets_list_datasets_test.py (9 hunks)
  • tests/routers/openml/datasets_test.py (3 hunks)
  • tests/routers/openml/migration/datasets_migration_test.py (3 hunks)
  • tests/routers/openml/study_test.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
tests/routers/openml/datasets_test.py (1)
tests/conftest.py (1)
  • py_api (48-53)
tests/routers/openml/dataset_tag_test.py (3)
tests/conftest.py (1)
  • py_api (48-53)
src/database/datasets.py (1)
  • tag (53-66)
tests/users.py (1)
  • ApiKey (11-15)
src/routers/dependencies.py (1)
src/database/users.py (1)
  • User (47-63)
tests/routers/openml/migration/datasets_migration_test.py (1)
tests/conftest.py (1)
  • py_api (48-53)
tests/routers/openml/datasets_list_datasets_test.py (2)
tests/conftest.py (1)
  • py_api (48-53)
tests/users.py (1)
  • ApiKey (11-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (14)
tests/routers/openml/study_test.py (2)

460-471: Test migration to header-based authentication looks good.

The test correctly migrates from query parameter authentication (?api_key=...) to Authorization header (headers={"Authorization": ...}), maintaining the same test logic and assertions.


515-519: Helper function properly updated for header-based authentication.

The _attach_tasks_to_study helper correctly uses headers={"Authorization": api_key} instead of query parameters, ensuring consistent authentication across all study-related tests.

tests/routers/openml/dataset_tag_test.py (2)

17-25: Proper handling of authentication states in tests.

The test correctly handles both authenticated and unauthenticated scenarios by conditionally constructing the headers dictionary. The use of an empty dict for None cases and {"Authorization": key} for authenticated cases is clean and appropriate.


33-44: Comprehensive test coverage maintained with header-based authentication.

All dataset tagging tests have been consistently updated to use Authorization headers while maintaining the same test logic, assertions, and coverage for different authentication scenarios (admin, non-owner, owner, invalid keys).

Also applies to: 47-55, 58-73, 81-92

tests/routers/openml/migration/datasets_migration_test.py (3)

109-118: Proper authentication handling for private dataset access tests.

The test correctly constructs headers conditionally based on whether an API key is provided, testing both authenticated and unauthenticated access scenarios for private datasets.


125-137: Migration parity test correctly compares old and new authentication methods.

The test appropriately demonstrates the API migration by having the Python API use header-based authentication (headers={"Authorization": api_key}) while the PHP API comparison continues using query parameters (?api_key=...), which is expected for parity testing during the migration period.


154-195: Dataset tag parity test properly updated for header-based authentication.

The test maintains comprehensive parity checking between old (PHP with query params) and new (Python with headers) authentication methods, ensuring the migration doesn't break existing functionality.

tests/routers/openml/datasets_test.py (3)

169-171: Private dataset feature access test properly authenticates via headers.

The parameterized test correctly uses header-based authentication for accessing features of private datasets, maintaining proper test coverage for admin and owner access scenarios.


190-205: Helper function cleanly refactored for header-based authentication.

The _assert_status_update_is_successful helper properly encapsulates the authentication logic using headers={"Authorization": apikey}, promoting code reuse across multiple status update test cases.


262-273: Unauthorized status update tests maintain proper authentication verification.

The parameterized test correctly passes API keys via Authorization headers, ensuring proper testing of authorization failures for non-admin users attempting dataset status updates.

tests/routers/openml/datasets_list_datasets_test.py (3)

55-63: Privacy-aware listing properly implements header-based authentication.

The test correctly handles authentication for different user roles (admin, owner, regular user, anonymous) using conditional headers, verifying that private datasets are filtered appropriately based on user permissions.


70-80: Dataset filtering tests comprehensively updated for header authentication.

All dataset list filtering tests (by name, version, uploader, etc.) consistently use headers={"Authorization": api_key} for authenticated requests, maintaining thorough test coverage across different filter combinations.

Also applies to: 87-93, 127-136, 139-145, 156-169


253-305: Property-based parity test properly handles both authentication methods.

The hypothesis-driven test correctly separates authentication methods: Python API uses headers={"Authorization": api_key} while the PHP API comparison uses ?api_key=... in the query string, ensuring comprehensive parity validation during the migration period.

src/routers/dependencies.py (1)

29-32: Query parameter authentication does not exist in the codebase; the review comment's concern is not applicable.

The codebase implements authentication exclusively via the Authorization header using APIKeyHeader(name="Authorization", auto_error=False). A comprehensive search reveals:

  • No Query parameter imports or usage for authentication anywhere
  • All test cases uniformly use header-based authentication (headers={"Authorization": api_key})
  • No commented-out query parameter code or legacy patterns
  • No migration logic or fallback mechanisms

Since query parameter authentication was never implemented, there is no backward compatibility concern to address or migration path to document.

Likely an incorrect or invalid review comment.

from typing import Annotated

from fastapi import Depends
from fastapi.security import APIKeyHeader
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Non-standard Authorization header usage without scheme prefix.

The implementation uses the Authorization header to carry raw API keys without a scheme prefix (e.g., Authorization: <api_key>). This deviates from RFC 7235 and common REST API practices, where the Authorization header should follow the format Authorization: <scheme> <credentials> (e.g., Authorization: Bearer <token> or Authorization: ApiKey <key>).

Recommended alternatives:

  1. Use a custom header (preferred for API keys): X-API-Key or X-Auth-Token
  2. Use a proper scheme: Authorization: Bearer <api_key> or Authorization: ApiKey <api_key>

Using a non-standard format may cause confusion for API consumers and compatibility issues with HTTP clients, proxies, or API gateways that expect standard Authorization header formats.

Also applies to: 11-12


🌐 Web query:

What is the standard format for API key authentication in HTTP Authorization headers according to RFC 7235 and REST API best practices?

💡 Result:

Short answer

  • RFC 7235 defines the Authorization header as: Authorization: (credentials may be a token68 or auth-params). [1]
  • In practice for API keys you should put the key in the Authorization header using a clear auth-scheme token, e.g. Authorization: Bearer (recommended for opaque/JWT tokens per RFC 6750) or Authorization: ApiKey <your_api_key> / Authorization: Api-Key <your_api_key> for a custom scheme. Do not put keys in the URL. [1][2][3][4]

Recommended minimal rules / best practices

  • Use the Authorization header (not query string) and always use HTTPS/TLS. [2][3]
  • Use a well‑known scheme if available (Bearer for OAuth2 tokens); if you define a custom scheme use a token-style credential (no colon/illegal chars) consistent with RFC 7235. [1][2]
  • Make keys short‑lived or rotate them, scope and least‑privilege, log and monitor use, and allow revocation. Avoid storing keys in client-side places that can be leaked. [2][3]

References

  • RFC 7235 — HTTP/1.1: Authentication (Authorization header syntax). [1]
  • RFC 6750 — OAuth 2.0 Bearer Token Usage (Authorization: Bearer , security recommendations). [2]
  • MDN Web Docs — Authorization / WWW-Authenticate header (auth-scheme, token68, auth-param syntax). [3]
  • Common REST practice / libraries showing Authorization: Api-Key / ApiKey usage (example implementations). [4]

🏁 Script executed:

#!/bin/bash
# Examine the dependencies.py file to understand the actual implementation
cat src/routers/dependencies.py

Length of output: 976


🏁 Script executed:

#!/bin/bash
# Also search for APIKeyHeader usage patterns in the codebase
rg -n "APIKeyHeader" -A 5 -B 2

Length of output: 872


🌐 Web query:

FastAPI APIKeyHeader Authorization header scheme format

💡 Result:

Short answer

  • APIKeyHeader just extracts the raw header value (by header name) and returns it as a string — it does NOT parse an "Authorization: Scheme credentials" value. [1]
  • If you need an Authorization header with a scheme (e.g. "Authorization: Bearer ") use HTTPBearer / OAuth2 classes (they split scheme vs credentials) or parse the header yourself with get_authorization_scheme_param. [2][3]

Minimal examples

  1. APIKeyHeader (reads header value directly)
    from fastapi import Depends, FastAPI
    from fastapi.security import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key")

@app.get("/items/")
async def read_items(key: str = Depends(api_key_header)):
return {"key": key}
(returns the header string exactly as sent). [1]

  1. Use HTTPBearer to get scheme + credentials
    from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

bearer = HTTPBearer()

@app.get("/secure/")
async def secure(creds: HTTPAuthorizationCredentials = Depends(bearer)):
# creds.scheme == "Bearer", creds.credentials == ""
return {"scheme": creds.scheme, "token": creds.credentials}
(HTTPBearer splits "Authorization: Bearer " for you). [2]

  1. Manual parsing with get_authorization_scheme_param
    from fastapi.security.utils import get_authorization_scheme_param
    authorization = request.headers.get("Authorization")
    scheme, param = get_authorization_scheme_param(authorization)
    if scheme.lower() == "bearer":
    token = param

References
[1] FastAPI — APIKeyHeader docs.
[2] FastAPI — HTTP auth / HTTPAuthorizationCredentials (HTTPBearer) docs.
[3] Example usage of get_authorization_scheme_param in FastAPI (parsing Authorization header).


🌐 Web query:

FastAPI APIKeyHeader documentation how to use with Authorization header

💡 Result:

Short answer: use APIKeyHeader with name="Authorization" as a Security dependency (not plain Depends), then parse/strip the "Bearer " prefix yourself. Example:

from fastapi import FastAPI, Security, HTTPException, status
from fastapi.security.api_key import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="Authorization", auto_error=False)

async def get_api_key(authorization: str = Security(api_key_header)):
    if not authorization:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Missing auth")
    # if using Bearer tokens, strip the prefix:
    if authorization.startswith("Bearer "):
        token = authorization[len("Bearer "):]
    else:
        token = authorization
    # validate token...
    return token

@app.get("/protected")
async def protected(api_key: str = Security(get_api_key)):
    return {"token": api_key}

Notes:

  • Use Security(...) so FastAPI documents the scheme and shows the Authorize button in /docs. [1]
  • APIKeyHeader simply extracts the header value — it won’t remove “Bearer ” for you. Use auto_error=False if you need optional auth or multiple auth methods. [1][2]

Sources: FastAPI security docs and community examples. [1] FastAPI security/APIKeyHeader reference. [2] StackOverflow discussion showing APIKeyHeader with Authorization.


Use HTTPBearer or custom header to enforce RFC 7235 Authorization header format.

The current implementation uses APIKeyHeader(name="Authorization") which accepts raw API key values without enforcing a scheme prefix, violating RFC 7235. RFC 7235 requires Authorization headers to follow the format Authorization: <auth-scheme> <credentials> (e.g., Bearer <token> or ApiKey <key>). FastAPI's APIKeyHeader simply extracts the raw header value without scheme parsing or validation.

Recommended fix:

  1. For Bearer tokens: Use HTTPBearer() dependency (automatically parses scheme and credentials)
  2. For API keys: Use custom header X-API-Key instead of Authorization
  3. If Authorization header required: Manually parse with get_authorization_scheme_param() to extract and validate the scheme
🤖 Prompt for AI Agents
In src/routers/dependencies.py around line 4, the code currently uses
APIKeyHeader(name="Authorization") which accepts raw header values and does not
enforce RFC 7235 scheme parsing; replace it with an HTTPBearer() dependency
(importing HTTPBearer and HTTPAuthorizationCredentials) to automatically parse
scheme + credentials and then validate that the scheme is "bearer"
(case-insensitive) before returning the token. If you intend to use API keys
instead, switch the client header to a custom header name like "X-API-Key" and
continue using APIKeyHeader for that header. Alternatively, if you must keep
Authorization, use get_authorization_scheme_param() to manually split and
validate the scheme and credentials and raise HTTPException(401) on
missing/invalid scheme.

Replace direct dict type with lambda factory to satisfy mypy type checker.
This fixes the pre-commit.ci mypy check failure.
The mypy hook needs pydantic installed to use the pydantic.mypy plugin,
which properly handles Field(default_factory=dict) in Pydantic models.
- Add pydantic>=2.6,<2.7 to mypy additional_dependencies
- Ensures consistent behavior between local and CI environments
- Allows Field(default_factory=dict) to work with both mypy and ruff-format
…ed JSON

- Remove TODO comment and workaround for single input limitation
- Add robust JSON parsing with error handling for malformed data
- Support both string (JSON-encoded) and already-parsed constraints
- Only extract data_type when constraint is a valid dict with the field
- Resolves openml#214
@lucifer4330k lucifer4330k force-pushed the feat/authorization-header-auth branch from 8ae991c to 7decfe1 Compare November 19, 2025 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Authentication without URIs

1 participant