-
Notifications
You must be signed in to change notification settings - Fork 0
feat: Add optional temoa repository hash tracking to dataset versions #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds an optional Temoa Repository Hash prompt to interactive prepare, validates and stores it in manifest history entries, surfaces a "Temoa Hash" column and temoa info in CLI messages and version selections, updates docs to document the prompt and output fields, and adds tests for validation and prompt skipping. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant CLI as datamanager CLI
participant Validator as _validate_temoa_hash
participant Manifest as manifest.json
User->>CLI: prepare (interactive)
CLI->>User: Prompt: "Temoa Repository Hash (optional)"
User-->>CLI: Enter hex hash or leave empty
CLI->>Validator: validate(input)
Validator-->>CLI: valid / invalid
alt valid or empty
CLI->>Manifest: Write history entry { ..., temoaRepoHash }
Manifest-->>CLI: persisted
CLI-->>User: Show prepare summary (includes temoa hash if present)
else invalid
CLI-->>User: Show validation error and re-prompt
end
sequenceDiagram
autonumber
actor User
participant CLI as datamanager CLI
participant Manifest as manifest.json
rect rgba(200,230,255,0.25)
note right of CLI: list-datasets (read)
CLI->>Manifest: Read datasets
Manifest-->>CLI: Entries (include temoaRepoHash)
CLI-->>User: Display table with "Temoa Hash" column (shortened or N/A)
end
rect rgba(220,255,220,0.25)
note right of CLI: pull / version selection
User->>CLI: pull dataset
CLI->>Manifest: Read history entries
Manifest-->>CLI: Versions (include temoaRepoHash)
CLI-->>User: Show version choices including "temoa: <hash>" when present
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (4)
🧰 Additional context used🪛 LanguageTooldocs/source/usage.md[grammar] ~60-~60: There might be a mistake here. (QB_NEW_EN) [grammar] ~61-~61: There might be a mistake here. (QB_NEW_EN) [grammar] ~62-~62: There might be a mistake here. (QB_NEW_EN) [grammar] ~63-~63: There might be a mistake here. (QB_NEW_EN) 🔇 Additional comments (10)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (8)
docs/source/workflow.md (1)
24-24: Clarify accepted hash format for consistency with validator.Add “4–40 hexadecimal characters” to align with _validate_temoa_hash behavior.
tests/test_manifest.py (2)
8-8: Avoid testing private (underscore) symbols from CLI module.Consider moving the validator to a small utilities/validators module and importing from there to avoid tests depending on a private function in main.
54-75: Good coverage; add boundary-positive cases.Add exact-boundary positives to match 4–40 hex rule, e.g. “abcd” (4) and “abcdefg” (7).
Apply this diff:
def test_validate_temoa_hash() -> None: @@ - # Valid short hash + # Valid short hash assert _validate_temoa_hash("abc123") assert _validate_temoa_hash("ABCDEF") + assert _validate_temoa_hash("abcd") # min length 4 + assert _validate_temoa_hash("abcdefg") # 7 chars (common short)tests/test_main.py (1)
24-28: Deduplicate temoa prompt patching with a fixture/helper.Create a pytest fixture (e.g., in conftest.py) to patch questionary.text to return "" and reuse it across tests to reduce repetition.
Example:
# conftest.py import pytest @pytest.fixture def skip_temoa_prompt(mocker): return mocker.patch( "questionary.text", return_value=mocker.Mock(ask=mocker.Mock(return_value="")), )Then pass skip_temoa_prompt into tests that need it.
Also applies to: 54-58, 97-101, 138-142
docs/source/usage.md (1)
36-40: Specify accepted hash format.Add “4–40 hexadecimal characters” to match the validator and avoid confusion.
src/datamanager/__main__.py (3)
225-230: Show “N/A” when temoaRepoHash is None in interactive choices.entry.get(...) returns None if key exists with None, producing “temoa: None”. Normalize to “N/A”.
Apply this diff:
- version_choices = [] - for entry in dataset["history"]: - temoa_info = f", temoa: {entry.get('temoaRepoHash', 'N/A')}" - version_choices.append( - f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})" - ) + version_choices = [] + for entry in dataset["history"]: + temoa_val = entry.get("temoaRepoHash") + temoa_display = temoa_val if temoa_val else "N/A" + temoa_info = f", temoa: {temoa_display}" + version_choices.append( + f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})" + )
268-309: Prepare: optional temoa prompt/validation – solid; consider a CLI flag.Flow and retry UX are good. For automation, consider a --temoa-hash option to pass the value in non-interactive (—yes) mode.
563-568: Rollback choices: normalize “N/A” when missing/None.Mirror the fix from pull interactive to avoid “temoa: None”.
Apply this diff:
- version_choices = [] - for entry in dataset["history"][1:]: # Start from the second entry - temoa_info = f", temoa: {entry.get('temoaRepoHash', 'N/A')}" - version_choices.append( - f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})" - ) + version_choices = [] + for entry in dataset["history"][1:]: # Start from the second entry + temoa_val = entry.get("temoaRepoHash") + temoa_display = temoa_val if temoa_val else "N/A" + temoa_info = f", temoa: {temoa_display}" + version_choices.append( + f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
docs/source/usage.md(2 hunks)docs/source/workflow.md(1 hunks)manifest.json(3 hunks)src/datamanager/__main__.py(10 hunks)tests/test_main.py(5 hunks)tests/test_manifest.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_manifest.py (1)
src/datamanager/__main__.py (1)
_validate_temoa_hash(45-58)
🪛 LanguageTool
docs/source/usage.md
[grammar] ~60-~60: There might be a mistake here.
Context: ... Name**: The logical name of the dataset - Latest Version: The most recent versio...
(QB_NEW_EN)
[grammar] ~61-~61: There might be a mistake here.
Context: ...t Version**: The most recent version tag - Last Updated: When the latest version ...
(QB_NEW_EN)
[grammar] ~62-~62: There might be a mistake here.
Context: ...d (relative time and absolute timestamp) - SHA256: First 12 characters of the fil...
(QB_NEW_EN)
[grammar] ~63-~63: There might be a mistake here.
Context: ...**: First 12 characters of the file hash - Temoa Hash: First 12 characters of the...
(QB_NEW_EN)
🔇 Additional comments (8)
manifest.json (1)
13-15: temoaRepoHash added with null default – LGTM; verify manifest consistency.
- Addition is backward compatible and aligns with code paths.
- Please verify latestVersion ("v1") vs first history version ("v4"); update if this is a real manifest, or confirm if it’s an intentionally inconsistent fixture.
Also applies to: 23-25, 33-35
tests/test_main.py (1)
41-41: Assertion of temoaRepoHash None on create – LGTM.docs/source/usage.md (1)
52-65: Consistency note: N/A vs None in interactive lists.Docs say “N/A” when not specified. Ensure interactive menus in code also render “N/A” (not “None”) when temoaRepoHash is None.
src/datamanager/__main__.py (5)
117-129: List view temoa column – LGTM.Column header and shortening logic are clear and consistent with SHA display.
159-165: Pull message temoa info – LGTM.Conditional inclusion avoids noise when absent.
366-367: Persisting temoaRepoHash on update – LGTM.
387-388: Persisting temoaRepoHash on create – LGTM.
500-501: Rollback preserves original temoaRepoHash – LGTM.
1c514c1 to
03c4ac2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/datamanager/__main__.py (2)
122-128: Consider extracting hash display logic to a helper function.The truncation logic for displaying hashes (first 12 chars + ellipsis if longer) is used in multiple places throughout the file. Extracting this to a helper function would reduce duplication and improve maintainability.
Example helper function:
def _format_hash_display(hash_value: Optional[str], max_length: int = 12) -> str: """Format a hash for display, truncating if longer than max_length.""" if not hash_value: return "N/A" if len(hash_value) > max_length: return f"{hash_value[:max_length]}..." return hash_valueThen simplify the display logic:
- temoa_hash_display = "N/A" - if latest.get("temoaRepoHash"): - temoa_hash_display = ( - f"{latest['temoaRepoHash'][:12]}..." - if len(str(latest["temoaRepoHash"])) > 12 - else str(latest["temoaRepoHash"]) - ) + temoa_hash_display = _format_hash_display(latest.get("temoaRepoHash"))
225-230: Consider extracting version choice string building logic.The code for building version choice strings with temoa hash info is duplicated in
_pull_interactive(lines 225-230) and_rollback_interactive(lines 563-568). Extracting this to a helper function would reduce duplication.Example helper function:
def _format_version_choice(entry: dict[str, Any]) -> str: """Format a version history entry for display in selection menus.""" temoa_info = f", temoa: {entry.get('temoaRepoHash', 'N/A')}" return f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})"Then simplify both locations:
version_choices = [] for entry in dataset["history"]: - temoa_info = f", temoa: {entry.get('temoaRepoHash', 'N/A')}" - version_choices.append( - f"{entry['version']} (commit: {entry['commit']}, {_rel(entry['timestamp'])}{temoa_info})" - ) + version_choices.append(_format_version_choice(entry))Also applies to: 563-568
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
docs/source/usage.md(2 hunks)docs/source/workflow.md(1 hunks)manifest.json(3 hunks)src/datamanager/__main__.py(10 hunks)tests/test_main.py(5 hunks)tests/test_manifest.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/test_main.py
- docs/source/workflow.md
- manifest.json
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_manifest.py (1)
src/datamanager/__main__.py (1)
_validate_temoa_hash(45-58)
🪛 LanguageTool
docs/source/usage.md
[grammar] ~60-~60: There might be a mistake here.
Context: ... Name**: The logical name of the dataset - Latest Version: The most recent versio...
(QB_NEW_EN)
[grammar] ~61-~61: There might be a mistake here.
Context: ...t Version**: The most recent version tag - Last Updated: When the latest version ...
(QB_NEW_EN)
[grammar] ~62-~62: There might be a mistake here.
Context: ...d (relative time and absolute timestamp) - SHA256: First 12 characters of the fil...
(QB_NEW_EN)
[grammar] ~63-~63: There might be a mistake here.
Context: ...**: First 12 characters of the file hash - Temoa Hash: First 12 characters of the...
(QB_NEW_EN)
🪛 Ruff (0.14.0)
src/datamanager/__main__.py
48-48: Docstring contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF002)
🔇 Additional comments (3)
src/datamanager/__main__.py (2)
268-309: LGTM! Well-structured interactive prompt with retry logic.The temoa hash prompt implementation is thorough:
- Properly skips in non-interactive mode
- Validates input with clear error messages
- Allows optional entry (press Enter to skip)
- Provides helpful retry mechanism for invalid input
The user experience is excellent and the logic is sound.
366-366: LGTM! Consistent propagation of temoaRepoHash field.The
temoaRepoHashfield is correctly added to:
- Update entries (line 366)
- New dataset entries (line 387)
- Rollback entries (line 500), preserving the original hash from the target version
The implementation maintains data integrity and ensures proper provenance tracking across all operations.
Also applies to: 387-387, 500-500
tests/test_manifest.py (1)
54-75: LGTM! Comprehensive test coverage for temoa hash validation.The test function thoroughly covers:
- Valid short hashes (4+ chars) and long hashes (40 chars)
- Case-insensitivity (uppercase and lowercase hex)
- Invalid characters (non-hex)
- Empty and whitespace-only input (correctly allowed as optional)
- Boundary cases (too short < 4 chars, too long > 40 chars)
This ensures the validation function behaves correctly for all expected inputs.
03c4ac2 to
faaa54a
Compare
Adds optional temoa repository hash tracking to dataset versions, enabling users to track which commit of the temoa repository each database version complies with. This improves data provenance and helps maintain consistency between databases and their source repository versions.
Manifest Structure
temoaRepoHashfield to all version entries inmanifest.jsonnullfor backward compatibilityCLI Integration
preparecommand now prompts for temoa repo hash (optional)User Interface
list-datasetsnow displays temoa hash in new "Temoa Hash" columnUsage Examples
Summary by CodeRabbit
New Features
Documentation
Tests