-
Notifications
You must be signed in to change notification settings - Fork 22
Feature: Time to First Result (TTFR) Estimator (foundation) #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a Time to First Result (TTFR) estimator module that calculates transparent time ranges for reaching a first useful output from neuroscience datasets. The estimator considers access requirements, data preprocessing complexity, and first output generation time based on datasource characteristics and metadata analysis.
Changes:
- Adds core TTFR estimation logic with configurable datasource defaults and intelligent inference from metadata
- Provides comprehensive test coverage with 20 unit tests across modality inference, multimodal detection, documentation quality assessment, and estimation scenarios
- Includes a demonstration script showcasing the estimator with example datasets
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| pyproject.toml | Adds pytest as a project dependency for testing |
| backend/ttfr_estimator.py | Core estimation module with enums, data classes, datasource configuration, and estimation logic |
| backend/test_ttfr_estimator.py | Comprehensive test suite covering helper functions and end-to-end estimation scenarios |
| backend/demo_ttfr.py | Demo script illustrating TTFR estimation for various dataset types |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| MODALITY_KEYWORDS = { | ||
| ModalityComplexity.LOW: ["simulated", "model", "morphology", "ion channel", "database"], | ||
| ModalityComplexity.MEDIUM: ["microscopy", "image", "gene expression", "single cell", "ephys", "electrophysiology"], | ||
| ModalityComplexity.HIGH: ["mri", "fmri", "pet", "meg", "eeg", "bold", "neuroimaging"], | ||
| ModalityComplexity.VERY_HIGH: ["multimodal", "multi-modal", "combined", "integrative"] | ||
| } |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MODALITY_KEYWORDS dictionary on line 170-175 includes overlapping keywords. For example, "multimodal" and "multi-modal" are in VERY_HIGH complexity keywords, but the infer_modality_from_keywords function checks from VERY_HIGH to LOW in order (line 243). This means if a dataset description contains "multimodal imaging", it would be classified as VERY_HIGH, but the function would also match it against keywords in other levels. The current implementation works correctly because of the ordered checking, but this could be documented more clearly.
| min_hours = int(self.min_days * 24) | ||
| max_hours = int(self.max_days * 24) | ||
| return f"{min_hours}–{max_hours} hours" | ||
| elif self.min_days < 1: | ||
| min_hours = int(self.min_days * 24) | ||
| max_days = int(self.max_days) | ||
| return f"{min_hours} hours–{max_days} days" | ||
| else: | ||
| return f"{int(self.min_days)}–{int(self.max_days)} days" |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The str method has several issues: (1) Using int() truncates instead of rounding, which can be misleading (e.g., 0.9 days becomes 0 days instead of 1 day). (2) There's no handling for singular vs plural forms, which will produce grammatically incorrect output like "1 days" when max_days rounds to 1. (3) The mixed units format on line 39 could be confusing. Consider using round() instead of int() and adding conditional logic for singular/plural forms.
| min_hours = int(self.min_days * 24) | |
| max_hours = int(self.max_days * 24) | |
| return f"{min_hours}–{max_hours} hours" | |
| elif self.min_days < 1: | |
| min_hours = int(self.min_days * 24) | |
| max_days = int(self.max_days) | |
| return f"{min_hours} hours–{max_days} days" | |
| else: | |
| return f"{int(self.min_days)}–{int(self.max_days)} days" | |
| min_hours = round(self.min_days * 24) | |
| max_hours = round(self.max_days * 24) | |
| if min_hours == max_hours: | |
| unit = "hour" if min_hours == 1 else "hours" | |
| return f"{min_hours} {unit}" | |
| return f"{min_hours}–{max_hours} hours" | |
| elif self.min_days < 1: | |
| min_hours = round(self.min_days * 24) | |
| max_days = round(self.max_days) | |
| hours_unit = "hour" if min_hours == 1 else "hours" | |
| days_unit = "day" if max_days == 1 else "days" | |
| return f"{min_hours} {hours_unit}–{max_days} {days_unit}" | |
| else: | |
| min_days = round(self.min_days) | |
| max_days = round(self.max_days) | |
| if min_days == max_days: | |
| unit = "day" if min_days == 1 else "days" | |
| return f"{min_days} {unit}" | |
| return f"{min_days}–{max_days} days" |
| "langgraph>=0.6.4", | ||
| "matplotlib>=3.10.3", | ||
| "pandas>=2.3.1", | ||
| "pytest>=9.0.2", |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytest should be a development dependency, not a production dependency. The pytest.main() call in test_ttfr_estimator.py is only used when running the test file directly, which is a development activity. Consider moving this to a dev-dependencies or optional-dependencies section in pyproject.toml instead.
| @@ -0,0 +1,342 @@ | |||
| from dataclasses import dataclass | |||
| from typing import Optional, Dict, Any, Tuple | |||
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Tuple import from typing is unused and should be removed to keep imports clean.
| from typing import Optional, Dict, Any, Tuple | |
| from typing import Optional, Dict, Any |
| DATASOURCE_CONFIG = { | ||
| "scr_005031_openneuro": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.BIDS, | ||
| "typical_modality": ModalityComplexity.HIGH, | ||
| "doc_quality": "high" | ||
| }, | ||
| "scr_017571_dandi": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.NWB, | ||
| "typical_modality": ModalityComplexity.HIGH, | ||
| "doc_quality": "high" | ||
| }, | ||
| "scr_007271_modeldb_models": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.MEDIUM, | ||
| "doc_quality": "medium" | ||
| }, | ||
| "scr_017612_ebrains": { | ||
| "access": AccessType.LOGIN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.VERY_HIGH, | ||
| "doc_quality": "high" | ||
| }, | ||
| "scr_003510_cil_images": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.STANDARD_IMAGE, | ||
| "typical_modality": ModalityComplexity.MEDIUM, | ||
| "doc_quality": "medium" | ||
| }, | ||
| "scr_002145_neuromorpho_modelimage": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.LOW, | ||
| "doc_quality": "high" | ||
| }, | ||
| "scr_017041_sparc": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.HIGH, | ||
| "doc_quality": "high" | ||
| }, | ||
| "scr_002978_aba_expression": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.MEDIUM, | ||
| "doc_quality": "high" | ||
| }, | ||
| "scr_005069_brainminds": { | ||
| "access": AccessType.APPROVAL, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.VERY_HIGH, | ||
| "doc_quality": "medium" | ||
| }, | ||
| "scr_002721_gensat_geneexpression": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.STANDARD_IMAGE, | ||
| "typical_modality": ModalityComplexity.MEDIUM, | ||
| "doc_quality": "medium" | ||
| }, | ||
| "scr_003105_neurondb_currents": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.LOW, | ||
| "doc_quality": "medium" | ||
| }, | ||
| "scr_006131_hba_atlas": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.MEDIUM, | ||
| "doc_quality": "high" | ||
| }, | ||
| "scr_014194_icg_ionchannels": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.LOW, | ||
| "doc_quality": "medium" | ||
| }, | ||
| "scr_013705_neuroml_models": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.MEDIUM, | ||
| "doc_quality": "high" | ||
| }, | ||
| "scr_014306_bbp_cellmorphology": { | ||
| "access": AccessType.LOGIN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.MEDIUM, | ||
| "doc_quality": "high" | ||
| }, | ||
| "scr_016433_conp": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.HIGH, | ||
| "doc_quality": "medium" | ||
| }, | ||
| "scr_006274_neuroelectro_ephys": { | ||
| "access": AccessType.OPEN, | ||
| "format": FormatType.CUSTOM, | ||
| "typical_modality": ModalityComplexity.MEDIUM, | ||
| "doc_quality": "high" | ||
| } | ||
| } |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DATASOURCE_CONFIG dictionary is comprehensive but could benefit from documentation. Consider adding a docstring or comment explaining the structure and purpose of this configuration, especially for maintainers who might need to add new datasources or modify existing ones.
| def detect_multimodal(text: str) -> bool: | ||
| if not text: | ||
| return False | ||
|
|
||
| text_lower = text.lower() | ||
| multimodal_keywords = ["multimodal", "multi-modal", "combined", "integrative", " and "] | ||
|
|
||
| return any(kw in text_lower for kw in multimodal_keywords) |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function has inconsistent handling of None input. Line 252 checks "if not text" which would return True for both None and empty string, but the function signature accepts str type without Optional. If None is a valid input (as suggested by the test on line 46), the type hint should be Optional[str]. Additionally, the test on line 46 passes None which would fail the type check.
|
|
||
| if __name__ == "__main__": | ||
| pytest.main([__file__, "-v"]) |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using pytest.main() in the main block is unnecessary. Tests should be run using the pytest command directly from the command line. This pattern adds pytest as a runtime import requirement when running the file directly, which is unconventional. Consider removing this block entirely.
| if __name__ == "__main__": | |
| pytest.main([__file__, "-v"]) |
| datasource_id: Optional[str] = None, | ||
| metadata: Optional[Dict[str, Any]] = None, | ||
| content: Optional[str] = None | ||
| ) -> TTFREstimate: |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function lacks documentation. Adding a docstring would significantly improve maintainability by explaining the parameters, return value, and the estimation logic. This is especially important for a public API function that will be integrated into other parts of the system.
| ) -> TTFREstimate: | |
| ) -> TTFREstimate: | |
| """ | |
| Estimate the time to first result (TTFR) for working with a datasource. | |
| Parameters | |
| ---------- | |
| datasource_id: | |
| Optional identifier for a known datasource. When provided and present in | |
| ``DATASOURCE_CONFIG``, datasource-specific defaults (access type, format, | |
| typical modality, and documentation quality) are used for the estimate. | |
| metadata: | |
| Optional dictionary of datasource metadata (for example, title, description, | |
| authors, license, documentation links). Certain fields are used to assess | |
| documentation quality and to refine modality inference. | |
| content: | |
| Optional free-form content or description associated with the datasource. | |
| This text is used to infer task complexity (modality) and whether the | |
| scenario is multimodal. | |
| Returns | |
| ------- | |
| TTFREstimate | |
| A structured estimate of time to first result, including time ranges and | |
| any assumptions made during estimation. | |
| Notes | |
| ----- | |
| If ``datasource_id`` is not provided or not found in ``DATASOURCE_CONFIG``, | |
| the function falls back to general heuristics based on the supplied | |
| ``metadata`` and ``content``. Additional assumptions are recorded to explain | |
| how the estimate was derived. | |
| """ |
| print(f" - {assumption}") | ||
|
|
||
| print(f"\nJSON format:") | ||
| import json |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The json module is imported inside the loop on line 55, which means it will be imported multiple times unnecessarily. Move the import statement to the top of the file with other imports for better performance and following Python best practices.
| from ttfr_estimator import ( | ||
| estimate_ttfr, | ||
| AccessType, | ||
| ModalityComplexity, | ||
| FormatType, | ||
| infer_modality_from_keywords, | ||
| detect_multimodal, | ||
| assess_documentation_quality | ||
| ) |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'AccessType' is not used.
Import of 'FormatType' is not used.
Relates to: #26
Summary:
Adds a TTFR estimator module that computes a transparent time range for reaching a first useful output from a dataset (access to basic visualization or analysis). This PR provides the foundation: core logic, tests, and a demo, with proposed integration steps.
Scope of changes:
How it works:
Testing:
cd backend pytest test_ttfr_estimator.py -v python demo_ttfr.pyIntegration plan (follow-up PR):
RetrievedItemin retrieval.pyRationale:
Checklist:
Request for review: