Feature: Time to First Result (TTFR) Estimator (foundation) #28

0rajnishk · 2026-01-11T20:54:35Z

Relates to: #26

Summary:
Adds a TTFR estimator module that computes a transparent time range for reaching a first useful output from a dataset (access to basic visualization or analysis). This PR provides the foundation: core logic, tests, and a demo, with proposed integration steps.

Scope of changes:

backend/ttfr_estimator.py: Core estimation logic (access, preprocessing, first output)
backend/test_ttfr_estimator.py: Unit tests (20 tests, all passing)
backend/demo_ttfr.py: Demo script to preview estimates
Local docs (gitignored): references/ and instructions/ for planning and integration notes

How it works:

Uses datasource defaults (access type, format, typical modality)
Infers complexity and multimodality from metadata/content
Assesses documentation quality from simple signals
Produces a total range with a three-phase breakdown and explicit assumptions

Testing:

cd backend
pytest test_ttfr_estimator.py -v
python demo_ttfr.py

Integration plan (follow-up PR):

Add TTFR to RetrievedItem in retrieval.py
Compute TTFR per result and include in API responses
Format TTFR in agents.py output
Optional: display in the frontend UI

Rationale:

Addresses user need for dataset feasibility and planning
Transparent and explainable estimates; safe fallbacks
Modular, low risk, and fully test-covered

Checklist:

New module and tests
No breaking changes
No new mandatory configuration
Demo for maintainers

Request for review:

Placement of TTFR in responses and desired level of breakdown
Any datasource-specific overrides you would like to add

Copilot

Pull request overview

This PR introduces a Time to First Result (TTFR) estimator module that calculates transparent time ranges for reaching a first useful output from neuroscience datasets. The estimator considers access requirements, data preprocessing complexity, and first output generation time based on datasource characteristics and metadata analysis.

Changes:

Adds core TTFR estimation logic with configurable datasource defaults and intelligent inference from metadata
Provides comprehensive test coverage with 20 unit tests across modality inference, multimodal detection, documentation quality assessment, and estimation scenarios
Includes a demonstration script showcasing the estimator with example datasets

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.

File	Description
pyproject.toml	Adds pytest as a project dependency for testing
backend/ttfr_estimator.py	Core estimation module with enums, data classes, datasource configuration, and estimation logic
backend/test_ttfr_estimator.py	Comprehensive test suite covering helper functions and end-to-end estimation scenarios
backend/demo_ttfr.py	Demo script illustrating TTFR estimation for various dataset types

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-11T21:00:05Z

backend/ttfr_estimator.py

+MODALITY_KEYWORDS = {
+    ModalityComplexity.LOW: ["simulated", "model", "morphology", "ion channel", "database"],
+    ModalityComplexity.MEDIUM: ["microscopy", "image", "gene expression", "single cell", "ephys", "electrophysiology"],
+    ModalityComplexity.HIGH: ["mri", "fmri", "pet", "meg", "eeg", "bold", "neuroimaging"],
+    ModalityComplexity.VERY_HIGH: ["multimodal", "multi-modal", "combined", "integrative"]
+}


The MODALITY_KEYWORDS dictionary on line 170-175 includes overlapping keywords. For example, "multimodal" and "multi-modal" are in VERY_HIGH complexity keywords, but the infer_modality_from_keywords function checks from VERY_HIGH to LOW in order (line 243). This means if a dataset description contains "multimodal imaging", it would be classified as VERY_HIGH, but the function would also match it against keywords in other levels. The current implementation works correctly because of the ordered checking, but this could be documented more clearly.

Copilot · 2026-01-11T21:00:05Z

backend/ttfr_estimator.py

+            min_hours = int(self.min_days * 24)
+            max_hours = int(self.max_days * 24)
+            return f"{min_hours}–{max_hours} hours"
+        elif self.min_days < 1:
+            min_hours = int(self.min_days * 24)
+            max_days = int(self.max_days)
+            return f"{min_hours} hours–{max_days} days"
+        else:
+            return f"{int(self.min_days)}–{int(self.max_days)} days"


The str method has several issues: (1) Using int() truncates instead of rounding, which can be misleading (e.g., 0.9 days becomes 0 days instead of 1 day). (2) There's no handling for singular vs plural forms, which will produce grammatically incorrect output like "1 days" when max_days rounds to 1. (3) The mixed units format on line 39 could be confusing. Consider using round() instead of int() and adding conditional logic for singular/plural forms.

Suggested change

min_hours = int(self.min_days * 24)

max_hours = int(self.max_days * 24)

return f"{min_hours}–{max_hours} hours"

elif self.min_days < 1:

min_hours = int(self.min_days * 24)

max_days = int(self.max_days)

return f"{min_hours} hours–{max_days} days"

else:

return f"{int(self.min_days)}–{int(self.max_days)} days"

min_hours = round(self.min_days * 24)

max_hours = round(self.max_days * 24)

if min_hours == max_hours:

unit = "hour" if min_hours == 1 else "hours"

return f"{min_hours} {unit}"

return f"{min_hours}–{max_hours} hours"

elif self.min_days < 1:

min_hours = round(self.min_days * 24)

max_days = round(self.max_days)

hours_unit = "hour" if min_hours == 1 else "hours"

days_unit = "day" if max_days == 1 else "days"

return f"{min_hours} {hours_unit}–{max_days} {days_unit}"

else:

min_days = round(self.min_days)

max_days = round(self.max_days)

if min_days == max_days:

unit = "day" if min_days == 1 else "days"

return f"{min_days} {unit}"

return f"{min_days}–{max_days} days"

Copilot · 2026-01-11T21:00:05Z

pyproject.toml

    "langgraph>=0.6.4",
    "matplotlib>=3.10.3",
    "pandas>=2.3.1",
+    "pytest>=9.0.2",


pytest should be a development dependency, not a production dependency. The pytest.main() call in test_ttfr_estimator.py is only used when running the test file directly, which is a development activity. Consider moving this to a dev-dependencies or optional-dependencies section in pyproject.toml instead.

Copilot · 2026-01-11T21:00:05Z

backend/ttfr_estimator.py

@@ -0,0 +1,342 @@
+from dataclasses import dataclass
+from typing import Optional, Dict, Any, Tuple


The Tuple import from typing is unused and should be removed to keep imports clean.

Suggested change

from typing import Optional, Dict, Any, Tuple

from typing import Optional, Dict, Any

Copilot · 2026-01-11T21:00:06Z

backend/ttfr_estimator.py

+DATASOURCE_CONFIG = {
+    "scr_005031_openneuro": {
+        "access": AccessType.OPEN,
+        "format": FormatType.BIDS,
+        "typical_modality": ModalityComplexity.HIGH,
+        "doc_quality": "high"
+    },
+    "scr_017571_dandi": {
+        "access": AccessType.OPEN,
+        "format": FormatType.NWB,
+        "typical_modality": ModalityComplexity.HIGH,
+        "doc_quality": "high"
+    },
+    "scr_007271_modeldb_models": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.MEDIUM,
+        "doc_quality": "medium"
+    },
+    "scr_017612_ebrains": {
+        "access": AccessType.LOGIN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.VERY_HIGH,
+        "doc_quality": "high"
+    },
+    "scr_003510_cil_images": {
+        "access": AccessType.OPEN,
+        "format": FormatType.STANDARD_IMAGE,
+        "typical_modality": ModalityComplexity.MEDIUM,
+        "doc_quality": "medium"
+    },
+    "scr_002145_neuromorpho_modelimage": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.LOW,
+        "doc_quality": "high"
+    },
+    "scr_017041_sparc": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.HIGH,
+        "doc_quality": "high"
+    },
+    "scr_002978_aba_expression": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.MEDIUM,
+        "doc_quality": "high"
+    },
+    "scr_005069_brainminds": {
+        "access": AccessType.APPROVAL,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.VERY_HIGH,
+        "doc_quality": "medium"
+    },
+    "scr_002721_gensat_geneexpression": {
+        "access": AccessType.OPEN,
+        "format": FormatType.STANDARD_IMAGE,
+        "typical_modality": ModalityComplexity.MEDIUM,
+        "doc_quality": "medium"
+    },
+    "scr_003105_neurondb_currents": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.LOW,
+        "doc_quality": "medium"
+    },
+    "scr_006131_hba_atlas": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.MEDIUM,
+        "doc_quality": "high"
+    },
+    "scr_014194_icg_ionchannels": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.LOW,
+        "doc_quality": "medium"
+    },
+    "scr_013705_neuroml_models": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.MEDIUM,
+        "doc_quality": "high"
+    },
+    "scr_014306_bbp_cellmorphology": {
+        "access": AccessType.LOGIN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.MEDIUM,
+        "doc_quality": "high"
+    },
+    "scr_016433_conp": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.HIGH,
+        "doc_quality": "medium"
+    },
+    "scr_006274_neuroelectro_ephys": {
+        "access": AccessType.OPEN,
+        "format": FormatType.CUSTOM,
+        "typical_modality": ModalityComplexity.MEDIUM,
+        "doc_quality": "high"
+    }
+}


The DATASOURCE_CONFIG dictionary is comprehensive but could benefit from documentation. Consider adding a docstring or comment explaining the structure and purpose of this configuration, especially for maintainers who might need to add new datasources or modify existing ones.

Copilot · 2026-01-11T21:00:06Z

backend/ttfr_estimator.py

+def detect_multimodal(text: str) -> bool:
+    if not text:
+        return False
+
+    text_lower = text.lower()
+    multimodal_keywords = ["multimodal", "multi-modal", "combined", "integrative", " and "]
+
+    return any(kw in text_lower for kw in multimodal_keywords)


The function has inconsistent handling of None input. Line 252 checks "if not text" which would return True for both None and empty string, but the function signature accepts str type without Optional. If None is a valid input (as suggested by the test on line 46), the type hint should be Optional[str]. Additionally, the test on line 46 passes None which would fail the type check.

Copilot · 2026-01-11T21:00:07Z

backend/test_ttfr_estimator.py

+
+if __name__ == "__main__":
+    pytest.main([__file__, "-v"])


Using pytest.main() in the main block is unnecessary. Tests should be run using the pytest command directly from the command line. This pattern adds pytest as a runtime import requirement when running the file directly, which is unconventional. Consider removing this block entirely.

Suggested change

if __name__ == "__main__":

pytest.main([__file__, "-v"])

Copilot · 2026-01-11T21:00:07Z

backend/ttfr_estimator.py

+    datasource_id: Optional[str] = None,
+    metadata: Optional[Dict[str, Any]] = None,
+    content: Optional[str] = None
+) -> TTFREstimate:


The function lacks documentation. Adding a docstring would significantly improve maintainability by explaining the parameters, return value, and the estimation logic. This is especially important for a public API function that will be integrated into other parts of the system.

Suggested change

) -> TTFREstimate:

) -> TTFREstimate:

"""

Estimate the time to first result (TTFR) for working with a datasource.

Parameters

----------

datasource_id:

Optional identifier for a known datasource. When provided and present in

``DATASOURCE_CONFIG``, datasource-specific defaults (access type, format,

typical modality, and documentation quality) are used for the estimate.

metadata:

Optional dictionary of datasource metadata (for example, title, description,

authors, license, documentation links). Certain fields are used to assess

documentation quality and to refine modality inference.

content:

Optional free-form content or description associated with the datasource.

This text is used to infer task complexity (modality) and whether the

scenario is multimodal.

Returns

-------

TTFREstimate

A structured estimate of time to first result, including time ranges and

any assumptions made during estimation.

Notes

-----

If ``datasource_id`` is not provided or not found in ``DATASOURCE_CONFIG``,

the function falls back to general heuristics based on the supplied

``metadata`` and ``content``. Additional assumptions are recorded to explain

how the estimate was derived.

"""

Copilot · 2026-01-11T21:00:07Z

backend/demo_ttfr.py

+        print(f" - {assumption}")
+
+    print(f"\nJSON format:")
+    import json


The json module is imported inside the loop on line 55, which means it will be imported multiple times unnecessarily. Move the import statement to the top of the file with other imports for better performance and following Python best practices.

Copilot · 2026-01-11T21:00:07Z

backend/test_ttfr_estimator.py

+from ttfr_estimator import (
+    estimate_ttfr,
+    AccessType,
+    ModalityComplexity,
+    FormatType,
+    infer_modality_from_keywords,
+    detect_multimodal,
+    assess_documentation_quality
+)


Import of 'AccessType' is not used.
Import of 'FormatType' is not used.

0rajnishk added 2 commits January 12, 2026 01:55

Add TTFR estimator module, tests, and demo

45225ff

Add pytest>=9.0.2 to dependencies

d56852f

Copilot AI review requested due to automatic review settings January 11, 2026 20:54

Copilot started reviewing on behalf of 0rajnishk January 11, 2026 20:54 View session

Copilot AI reviewed Jan 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature: Time to First Result (TTFR) Estimator (foundation) #28

Feature: Time to First Result (TTFR) Estimator (foundation) #28

0rajnishk commented Jan 11, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Copilot AI Jan 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-            min_hours = int(self.min_days * 24)
-            max_hours = int(self.max_days * 24)
-            return f"{min_hours}–{max_hours} hours"
-        elif self.min_days < 1:
-            min_hours = int(self.min_days * 24)
-            max_days = int(self.max_days)
-            return f"{min_hours} hours–{max_days} days"
-        else:
-            return f"{int(self.min_days)}–{int(self.max_days)} days"
+            min_hours = round(self.min_days * 24)
+            max_hours = round(self.max_days * 24)
+            if min_hours == max_hours:
+                unit = "hour" if min_hours == 1 else "hours"
+                return f"{min_hours} {unit}"
+            return f"{min_hours}–{max_hours} hours"
+        elif self.min_days < 1:
+            min_hours = round(self.min_days * 24)
+            max_days = round(self.max_days)
+            hours_unit = "hour" if min_hours == 1 else "hours"
+            days_unit = "day" if max_days == 1 else "days"
+            return f"{min_hours} {hours_unit}–{max_days} {days_unit}"
+        else:
+            min_days = round(self.min_days)
+            max_days = round(self.max_days)
+            if min_days == max_days:
+                unit = "day" if min_days == 1 else "days"
+                return f"{min_days} {unit}"
+            return f"{min_days}–{max_days} days"

		@@ -0,0 +1,342 @@
		from dataclasses import dataclass
		from typing import Optional, Dict, Any, Tuple

	from typing import Optional, Dict, Any, Tuple
	from typing import Optional, Dict, Any

-) -> TTFREstimate:
+) -> TTFREstimate:
+    """
+    Estimate the time to first result (TTFR) for working with a datasource.
+    Parameters
+    ----------
+    datasource_id:
+        Optional identifier for a known datasource. When provided and present in
+        ``DATASOURCE_CONFIG``, datasource-specific defaults (access type, format,
+        typical modality, and documentation quality) are used for the estimate.
+    metadata:
+        Optional dictionary of datasource metadata (for example, title, description,
+        authors, license, documentation links). Certain fields are used to assess
+        documentation quality and to refine modality inference.
+    content:
+        Optional free-form content or description associated with the datasource.
+        This text is used to infer task complexity (modality) and whether the
+        scenario is multimodal.
+    Returns
+    -------
+    TTFREstimate
+        A structured estimate of time to first result, including time ranges and
+        any assumptions made during estimation.
+    Notes
+    -----
+    If ``datasource_id`` is not provided or not found in ``DATASOURCE_CONFIG``,
+    the function falls back to general heuristics based on the supplied
+    ``metadata`` and ``content``. Additional assumptions are recorded to explain
+    how the estimate was derived.
+    """

Feature: Time to First Result (TTFR) Estimator (foundation) #28

Are you sure you want to change the base?

Feature: Time to First Result (TTFR) Estimator (foundation) #28

Conversation

0rajnishk commented Jan 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

0rajnishk commented Jan 11, 2026 •

edited

Loading