Skip to content

Move knowledge of how data status works to db layer#324

Merged
PGijsbers merged 2 commits intomainfrom
status-refactor
May 6, 2026
Merged

Move knowledge of how data status works to db layer#324
PGijsbers merged 2 commits intomainfrom
status-refactor

Conversation

@PGijsbers
Copy link
Copy Markdown
Contributor

Knowing that a lack of rows for dataset status implies being 'in preparation' is a database concern.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

Warning

Rate limit exceeded

@PGijsbers has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 30 minutes and 33 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d43161ad-4657-4f42-8840-db4a71b1bd99

📥 Commits

Reviewing files that changed from the base of the PR and between f62af09 and 0971055.

📒 Files selected for processing (1)
  • src/database/datasets.py

Walkthrough

The changes normalize dataset status handling across the database and router layers. The get_status function in src/database/datasets.py now returns a DatasetStatus enum value directly instead of a raw database row, with a default fallback to IN_PREPARATION. The update_dataset_status endpoint in src/routers/openml/datasets.py is refactored to treat current status as a direct enum value, eliminating object property access patterns. Status transition logic and logging are updated to compare and use enum values directly. The get_dataset endpoint removes intermediate status derivation and passes the status value directly to the response metadata.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main refactoring effort: moving status logic from the router layer to the database layer.
Description check ✅ Passed The description explains the key motivation behind the changes: treating missing status rows as a database concern rather than a router concern.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch status-refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@PGijsbers PGijsbers added the maintenance improvements or changes to existing systems label May 6, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The get_status function now returns a DatasetStatus enum rather than a Row | None, so its return type annotation should be updated accordingly to avoid confusing or misleading type hints.
  • Since get_status now encapsulates the status mapping logic, consider renaming it (e.g., to get_dataset_status_value) or documenting its behavior to make clear that it returns a DatasetStatus enum, not a DB row.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `get_status` function now returns a `DatasetStatus` enum rather than a `Row | None`, so its return type annotation should be updated accordingly to avoid confusing or misleading type hints.
- Since `get_status` now encapsulates the status mapping logic, consider renaming it (e.g., to `get_dataset_status_value`) or documenting its behavior to make clear that it returns a `DatasetStatus` enum, not a DB row.

## Individual Comments

### Comment 1
<location path="src/database/datasets.py" line_range="106-115" />
<code_context>

 async def get_status(id_: int, connection: AsyncConnection) -> Row | None:
     """Get most recent status for the dataset."""
-    row = await connection.execute(
-        text(
-            """
+    row = (
+        await connection.execute(
+            text(
+                """
     SELECT *
     FROM dataset_status
     WHERE did = :dataset_id
     ORDER BY status_date DESC
+    LIMIT 1
     """,
-        ),
-        parameters={"dataset_id": id_},
-    )
-    return row.first()
+            ),
+            parameters={"dataset_id": id_},
+        )
+    ).first()
+    return DatasetStatus(row.status) if row else DatasetStatus.IN_PREPARATION


</code_context>
<issue_to_address>
**issue (bug_risk):** Update the return type annotation of `get_status` to reflect it now returns `DatasetStatus` instead of a DB row or `None`.

The function now always returns a `DatasetStatus` (defaulting to `IN_PREPARATION` when no row is found), but the signature still says `Row | None`. This inaccurate type can mislead callers and type checkers, especially where a row with `.status` is still expected. Please update the return annotation to `-> DatasetStatus` (or a more precise union if you intend to support multiple types).
</issue_to_address>

### Comment 2
<location path="src/database/datasets.py" line_range="112" />
<code_context>
+        await connection.execute(
+            text(
+                """
     SELECT *
     FROM dataset_status
     WHERE did = :dataset_id
     ORDER BY status_date DESC
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid `SELECT *` here and fetch only the `status` column that is actually used.

`get_status` only reads `row.status`, so selecting all columns is unnecessary. Restrict the query to `SELECT status` (or the minimal required columns) to better match the function’s intent and avoid extra data transfer.

```suggestion
    SELECT status
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/database/datasets.py Outdated
Comment thread src/database/datasets.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.87%. Comparing base (3e7c4fc) to head (0971055).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/database/datasets.py 83.33% 0 Missing and 1 partial ⚠️
src/routers/openml/datasets.py 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #324      +/-   ##
==========================================
+ Coverage   93.69%   93.87%   +0.18%     
==========================================
  Files          68       69       +1     
  Lines        3154     3248      +94     
  Branches      223      227       +4     
==========================================
+ Hits         2955     3049      +94     
  Misses        139      139              
  Partials       60       60              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/database/datasets.py`:
- Around line 106-122: The get_status function signature and docstring are out
of sync: update the return type annotation of get_status to return DatasetStatus
(not Row | None) and adjust the docstring to state it always returns a
DatasetStatus and that it defaults to DatasetStatus.IN_PREPARATION when no row
is found; also remove or update any now-unused Row | None imports/annotations
surrounding get_status to keep types consistent and help type-checkers identify
the concrete return type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e8d691b6-f9e5-4249-8993-08a4ffa63839

📥 Commits

Reviewing files that changed from the base of the PR and between 3014e44 and f62af09.

📒 Files selected for processing (2)
  • src/database/datasets.py
  • src/routers/openml/datasets.py

Comment thread src/database/datasets.py Outdated
@PGijsbers PGijsbers merged commit 82ab6e9 into main May 6, 2026
8 of 9 checks passed
@PGijsbers PGijsbers deleted the status-refactor branch May 6, 2026 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintenance improvements or changes to existing systems

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant