Skip to content

Conversation

@ParticularlyPythonicBS
Copy link
Member

@ParticularlyPythonicBS ParticularlyPythonicBS commented Oct 14, 2025

Adds optional temoa repository hash tracking to dataset versions, enabling users to track which commit of the temoa repository each database version complies with. This improves data provenance and helps maintain consistency between databases and their source repository versions.

Manifest Structure

  • Added temoaRepoHash field to all version entries in manifest.json
  • Field is optional and defaults to null for backward compatibility

CLI Integration

  • prepare command now prompts for temoa repo hash (optional)
  • Interactive validation with retry logic for invalid hash formats

User Interface

  • list-datasets now displays temoa hash in new "Temoa Hash" column
  • Pull commands show temoa hash in version information
  • Interactive flows display temoa hash in version selection menus
  • Rollback operations preserve original temoa hash from target version

Usage Examples

# Prepare with temoa hash
uv run datamanager prepare energy-data.sqlite ./data.sqlite
# → Prompts: "Enter the temoa repository commit hash (optional, press Enter to skip):"
# → User enters: "abc123defg" 
# → Result: temoaRepoHash: "abc123defg" stored in version

# View temoa hashes
uv run datamanager list-datasets
# → Shows: Dataset Name | Latest Version | Last Updated | SHA256 | Temoa Hash

# Skip hash (optional)
# → User presses Enter
# → Result: temoaRepoHash: null in version

Summary by CodeRabbit

  • New Features

    • Prepare optionally prompts for a Temoa Repository Hash (can be skipped); input is validated.
    • list-datasets and pull now show a Temoa Hash column when available; version selection and logs surface the hash.
    • Manifest history entries now include an optional temoaRepoHash field.
  • Documentation

    • Updated workflow and usage docs with prompt examples, field descriptions, and an "The output includes" section.
  • Tests

    • Added/updated tests for hash validation and prepare flows that skip the hash.

@ParticularlyPythonicBS ParticularlyPythonicBS added the enhancement New feature or request label Oct 14, 2025
@coderabbitai
Copy link

coderabbitai bot commented Oct 14, 2025

Walkthrough

Adds an optional Temoa Repository Hash prompt to interactive prepare, validates and stores it in manifest history entries, surfaces a "Temoa Hash" column and temoa info in CLI messages and version selections, updates docs to document the prompt and output fields, and adds tests for validation and prompt skipping.

Changes

Cohort / File(s) Summary of Changes
Documentation updates
docs/source/usage.md, docs/source/workflow.md
Document optional Temoa Repository Hash prompt during prepare; expand list-datasets output fields and describe each field; add "The output includes" section and update workflow step to mention the prompt.
Manifest entries
manifest.json
Add temoaRepoHash: null to history entries for multiple manifest versions under test_database.sqlite.
CLI logic & UI
src/datamanager/__main__.py
Add _validate_temoa_hash (accepts 4–40 hex chars or empty); prompt for optional temoaRepoHash in interactive prepare with validation and retry; include temoaRepoHash in new/rollback/pulled history entries; display shortened temoa hash in dataset listing, version selection, and pull messages; non-interactive mode skips prompt.
Tests — prompt behavior
tests/test_main.py
Patch questionary.text in several prepare/update/no-change tests to return "" (skip), causing temoaRepoHash to be None while preserving other assertions.
Tests — validation
tests/test_manifest.py
Import and test _validate_temoa_hash with valid and invalid cases, including boundary lengths and whitespace handling.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as datamanager CLI
  participant Validator as _validate_temoa_hash
  participant Manifest as manifest.json

  User->>CLI: prepare (interactive)
  CLI->>User: Prompt: "Temoa Repository Hash (optional)"
  User-->>CLI: Enter hex hash or leave empty
  CLI->>Validator: validate(input)
  Validator-->>CLI: valid / invalid
  alt valid or empty
    CLI->>Manifest: Write history entry { ..., temoaRepoHash }
    Manifest-->>CLI: persisted
    CLI-->>User: Show prepare summary (includes temoa hash if present)
  else invalid
    CLI-->>User: Show validation error and re-prompt
  end
Loading
sequenceDiagram
  autonumber
  actor User
  participant CLI as datamanager CLI
  participant Manifest as manifest.json

  rect rgba(200,230,255,0.25)
    note right of CLI: list-datasets (read)
    CLI->>Manifest: Read datasets
    Manifest-->>CLI: Entries (include temoaRepoHash)
    CLI-->>User: Display table with "Temoa Hash" column (shortened or N/A)
  end

  rect rgba(220,255,220,0.25)
    note right of CLI: pull / version selection
    User->>CLI: pull dataset
    CLI->>Manifest: Read history entries
    Manifest-->>CLI: Versions (include temoaRepoHash)
    CLI-->>User: Show version choices including "temoa: <hash>" when present
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A twitch, a nibble, a hex to find,
I hop through commits, exact and kind.
Shortened traces in the table's light,
Rollbacks remember what the burrow liked.
Docs and tests—my little delight. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title concisely and accurately summarizes the primary enhancement of introducing optional TemOA repository hash tracking to dataset versions, making it clear to reviewers what the main feature is.
Docstring Coverage ✅ Passed Docstring coverage is 93.33% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/temoa_hash_in_manifest

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03c4ac2 and faaa54a.

📒 Files selected for processing (6)
  • docs/source/usage.md (2 hunks)
  • docs/source/workflow.md (1 hunks)
  • manifest.json (3 hunks)
  • src/datamanager/__main__.py (10 hunks)
  • tests/test_main.py (5 hunks)
  • tests/test_manifest.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • docs/source/workflow.md
  • tests/test_manifest.py
  • tests/test_main.py
  • manifest.json
🧰 Additional context used
🪛 LanguageTool
docs/source/usage.md

[grammar] ~60-~60: There might be a mistake here.
Context: ... Name**: The logical name of the dataset - Latest Version: The most recent versio...

(QB_NEW_EN)


[grammar] ~61-~61: There might be a mistake here.
Context: ...t Version**: The most recent version tag - Last Updated: When the latest version ...

(QB_NEW_EN)


[grammar] ~62-~62: There might be a mistake here.
Context: ...d (relative time and absolute timestamp) - SHA256: First 12 characters of the fil...

(QB_NEW_EN)


[grammar] ~63-~63: There might be a mistake here.
Context: ...**: First 12 characters of the file hash - Temoa Hash: First 12 characters of the...

(QB_NEW_EN)

🔇 Additional comments (10)
src/datamanager/__main__.py (8)

5-5: LGTM!

The re import is necessary for the regex validation in _validate_temoa_hash.


117-138: LGTM!

The Temoa Hash column integration is well-implemented with appropriate truncation logic for display and safe handling of missing values.


159-164: LGTM!

The temoa hash information is cleanly appended to pull messages when available.


225-230: LGTM!

The interactive pull version selection correctly displays temoa hash information with safe handling of missing values using .get().


268-308: LGTM!

The temoa hash prompt implementation is excellent:

  • Respects non-interactive mode
  • Provides clear user guidance
  • Implements robust validation with retry logic
  • Handles empty input appropriately for optional field
  • Shows helpful error messages for invalid formats

366-366: LGTM!

The temoaRepoHash field is correctly added to both update entries (line 366) and new dataset entries (line 387).

Also applies to: 387-387


500-500: LGTM!

Correctly preserves the original temoaRepoHash from the target version during rollback. This ensures the rollback maintains the same temoa repository association as the version being restored.


563-568: LGTM!

The interactive rollback version selection correctly displays temoa hash information, consistent with the interactive pull implementation.

docs/source/usage.md (2)

36-40: LGTM!

The documentation clearly explains the optional Temoa Repository Hash prompt with helpful examples and guidance.


52-65: LGTM!

The output field descriptions are comprehensive and clear. The LanguageTool hints appear to be false positives related to the bullet list format.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 61.29032% with 24 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/datamanager/__main__.py 45.45% 21 Missing and 3 partials ⚠️

📢 Thoughts on this report? Let us know!

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (8)
docs/source/workflow.md (1)

24-24: Clarify accepted hash format for consistency with validator.

Add “4–40 hexadecimal characters” to align with _validate_temoa_hash behavior.

tests/test_manifest.py (2)

8-8: Avoid testing private (underscore) symbols from CLI module.

Consider moving the validator to a small utilities/validators module and importing from there to avoid tests depending on a private function in main.


54-75: Good coverage; add boundary-positive cases.

Add exact-boundary positives to match 4–40 hex rule, e.g. “abcd” (4) and “abcdefg” (7).

Apply this diff:

 def test_validate_temoa_hash() -> None:
@@
-    # Valid short hash
+    # Valid short hash
     assert _validate_temoa_hash("abc123")
     assert _validate_temoa_hash("ABCDEF")
+    assert _validate_temoa_hash("abcd")     # min length 4
+    assert _validate_temoa_hash("abcdefg")  # 7 chars (common short)
tests/test_main.py (1)

24-28: Deduplicate temoa prompt patching with a fixture/helper.

Create a pytest fixture (e.g., in conftest.py) to patch questionary.text to return "" and reuse it across tests to reduce repetition.

Example:

# conftest.py
import pytest

@pytest.fixture
def skip_temoa_prompt(mocker):
    return mocker.patch(
        "questionary.text",
        return_value=mocker.Mock(ask=mocker.Mock(return_value="")),
    )

Then pass skip_temoa_prompt into tests that need it.

Also applies to: 54-58, 97-101, 138-142

docs/source/usage.md (1)

36-40: Specify accepted hash format.

Add “4–40 hexadecimal characters” to match the validator and avoid confusion.

src/datamanager/__main__.py (3)

225-230: Show “N/A” when temoaRepoHash is None in interactive choices.

entry.get(...) returns None if key exists with None, producing “temoa: None”. Normalize to “N/A”.

Apply this diff:

-    version_choices = []
-    for entry in dataset["history"]:
-        temoa_info = f", temoa: {entry.get('temoaRepoHash', 'N/A')}"
-        version_choices.append(
-            f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})"
-        )
+    version_choices = []
+    for entry in dataset["history"]:
+        temoa_val = entry.get("temoaRepoHash")
+        temoa_display = temoa_val if temoa_val else "N/A"
+        temoa_info = f", temoa: {temoa_display}"
+        version_choices.append(
+            f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})"
+        )

268-309: Prepare: optional temoa prompt/validation – solid; consider a CLI flag.

Flow and retry UX are good. For automation, consider a --temoa-hash option to pass the value in non-interactive (—yes) mode.


563-568: Rollback choices: normalize “N/A” when missing/None.

Mirror the fix from pull interactive to avoid “temoa: None”.

Apply this diff:

-    version_choices = []
-    for entry in dataset["history"][1:]:  # Start from the second entry
-        temoa_info = f", temoa: {entry.get('temoaRepoHash', 'N/A')}"
-        version_choices.append(
-            f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})"
-        )
+    version_choices = []
+    for entry in dataset["history"][1:]:  # Start from the second entry
+        temoa_val = entry.get("temoaRepoHash")
+        temoa_display = temoa_val if temoa_val else "N/A"
+        temoa_info = f", temoa: {temoa_display}"
+        version_choices.append(
+            f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})"
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ec6e33 and 1c514c1.

📒 Files selected for processing (6)
  • docs/source/usage.md (2 hunks)
  • docs/source/workflow.md (1 hunks)
  • manifest.json (3 hunks)
  • src/datamanager/__main__.py (10 hunks)
  • tests/test_main.py (5 hunks)
  • tests/test_manifest.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_manifest.py (1)
src/datamanager/__main__.py (1)
  • _validate_temoa_hash (45-58)
🪛 LanguageTool
docs/source/usage.md

[grammar] ~60-~60: There might be a mistake here.
Context: ... Name**: The logical name of the dataset - Latest Version: The most recent versio...

(QB_NEW_EN)


[grammar] ~61-~61: There might be a mistake here.
Context: ...t Version**: The most recent version tag - Last Updated: When the latest version ...

(QB_NEW_EN)


[grammar] ~62-~62: There might be a mistake here.
Context: ...d (relative time and absolute timestamp) - SHA256: First 12 characters of the fil...

(QB_NEW_EN)


[grammar] ~63-~63: There might be a mistake here.
Context: ...**: First 12 characters of the file hash - Temoa Hash: First 12 characters of the...

(QB_NEW_EN)

🔇 Additional comments (8)
manifest.json (1)

13-15: temoaRepoHash added with null default – LGTM; verify manifest consistency.

  • Addition is backward compatible and aligns with code paths.
  • Please verify latestVersion ("v1") vs first history version ("v4"); update if this is a real manifest, or confirm if it’s an intentionally inconsistent fixture.

Also applies to: 23-25, 33-35

tests/test_main.py (1)

41-41: Assertion of temoaRepoHash None on create – LGTM.

docs/source/usage.md (1)

52-65: Consistency note: N/A vs None in interactive lists.

Docs say “N/A” when not specified. Ensure interactive menus in code also render “N/A” (not “None”) when temoaRepoHash is None.

src/datamanager/__main__.py (5)

117-129: List view temoa column – LGTM.

Column header and shortening logic are clear and consistent with SHA display.


159-165: Pull message temoa info – LGTM.

Conditional inclusion avoids noise when absent.


366-367: Persisting temoaRepoHash on update – LGTM.


387-388: Persisting temoaRepoHash on create – LGTM.


500-501: Rollback preserves original temoaRepoHash – LGTM.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/datamanager/__main__.py (2)

122-128: Consider extracting hash display logic to a helper function.

The truncation logic for displaying hashes (first 12 chars + ellipsis if longer) is used in multiple places throughout the file. Extracting this to a helper function would reduce duplication and improve maintainability.

Example helper function:

def _format_hash_display(hash_value: Optional[str], max_length: int = 12) -> str:
    """Format a hash for display, truncating if longer than max_length."""
    if not hash_value:
        return "N/A"
    if len(hash_value) > max_length:
        return f"{hash_value[:max_length]}..."
    return hash_value

Then simplify the display logic:

-        temoa_hash_display = "N/A"
-        if latest.get("temoaRepoHash"):
-            temoa_hash_display = (
-                f"{latest['temoaRepoHash'][:12]}..."
-                if len(str(latest["temoaRepoHash"])) > 12
-                else str(latest["temoaRepoHash"])
-            )
+        temoa_hash_display = _format_hash_display(latest.get("temoaRepoHash"))

225-230: Consider extracting version choice string building logic.

The code for building version choice strings with temoa hash info is duplicated in _pull_interactive (lines 225-230) and _rollback_interactive (lines 563-568). Extracting this to a helper function would reduce duplication.

Example helper function:

def _format_version_choice(entry: dict[str, Any]) -> str:
    """Format a version history entry for display in selection menus."""
    temoa_info = f", temoa: {entry.get('temoaRepoHash', 'N/A')}"
    return f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})"

Then simplify both locations:

     version_choices = []
     for entry in dataset["history"]:
-        temoa_info = f", temoa: {entry.get('temoaRepoHash', 'N/A')}"
-        version_choices.append(
-            f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})"
-        )
+        version_choices.append(_format_version_choice(entry))

Also applies to: 563-568

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c514c1 and 03c4ac2.

📒 Files selected for processing (6)
  • docs/source/usage.md (2 hunks)
  • docs/source/workflow.md (1 hunks)
  • manifest.json (3 hunks)
  • src/datamanager/__main__.py (10 hunks)
  • tests/test_main.py (5 hunks)
  • tests/test_manifest.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/test_main.py
  • docs/source/workflow.md
  • manifest.json
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_manifest.py (1)
src/datamanager/__main__.py (1)
  • _validate_temoa_hash (45-58)
🪛 LanguageTool
docs/source/usage.md

[grammar] ~60-~60: There might be a mistake here.
Context: ... Name**: The logical name of the dataset - Latest Version: The most recent versio...

(QB_NEW_EN)


[grammar] ~61-~61: There might be a mistake here.
Context: ...t Version**: The most recent version tag - Last Updated: When the latest version ...

(QB_NEW_EN)


[grammar] ~62-~62: There might be a mistake here.
Context: ...d (relative time and absolute timestamp) - SHA256: First 12 characters of the fil...

(QB_NEW_EN)


[grammar] ~63-~63: There might be a mistake here.
Context: ...**: First 12 characters of the file hash - Temoa Hash: First 12 characters of the...

(QB_NEW_EN)

🪛 Ruff (0.14.0)
src/datamanager/__main__.py

48-48: Docstring contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF002)

🔇 Additional comments (3)
src/datamanager/__main__.py (2)

268-309: LGTM! Well-structured interactive prompt with retry logic.

The temoa hash prompt implementation is thorough:

  • Properly skips in non-interactive mode
  • Validates input with clear error messages
  • Allows optional entry (press Enter to skip)
  • Provides helpful retry mechanism for invalid input

The user experience is excellent and the logic is sound.


366-366: LGTM! Consistent propagation of temoaRepoHash field.

The temoaRepoHash field is correctly added to:

  • Update entries (line 366)
  • New dataset entries (line 387)
  • Rollback entries (line 500), preserving the original hash from the target version

The implementation maintains data integrity and ensures proper provenance tracking across all operations.

Also applies to: 387-387, 500-500

tests/test_manifest.py (1)

54-75: LGTM! Comprehensive test coverage for temoa hash validation.

The test function thoroughly covers:

  • Valid short hashes (4+ chars) and long hashes (40 chars)
  • Case-insensitivity (uppercase and lowercase hex)
  • Invalid characters (non-hex)
  • Empty and whitespace-only input (correctly allowed as optional)
  • Boundary cases (too short < 4 chars, too long > 40 chars)

This ensures the validation function behaves correctly for all expected inputs.

@ParticularlyPythonicBS ParticularlyPythonicBS merged commit e046237 into main Oct 15, 2025
4 of 5 checks passed
@ParticularlyPythonicBS ParticularlyPythonicBS deleted the feat/temoa_hash_in_manifest branch October 15, 2025 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants