GH1170: improve error when haplotypes_frequencies_advanced has no coh… by karthik642006 · Pull Request #1184 · malariagen/malariagen-data-python

karthik642006 · 2026-03-21T16:03:31Z

Fixes #1170

Problem

haplotype_frequencies_advanced does not guard against the case where
cohort grouping produces zero cohorts — which can happen when:

min_cohort_size is set too high
sample_query is too restrictive
area_by / period_by / taxon_by combination is too narrow

In these cases, the function continues execution and fails later with
cryptic downstream errors, making it hard for users to diagnose the
root cause.

Fix

Added an explicit guard immediately after cohort construction:

if df_cohorts.empty:
    raise ValueError(
        "No cohorts found after applying filters. Try lowering "
        "min_cohort_size, adjusting area_by/period_by/taxon_by, "
        "or relaxing sample_query."
    )

This ensures the function fails fast with a clear, actionable message
close to the source of the problem.

Tests Added

Added test_hap_frequencies_advanced_no_cohorts covering two scenarios:

@parametrize_with_cases("fixture,api", cases=".")
def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):
    all_sample_sets = api.sample_sets()["sample_set"].to_list()
    sample_sets = all_sample_sets[0]
    region = fixture.random_region_str()

    # Case 1: min_cohort_size too high
    with pytest.raises(ValueError, match="No cohorts found"):
        api.haplotypes_frequencies_advanced(
            region=region,
            area_by="admin1_iso",
            period_by="year",
            min_cohort_size=10_000,
            sample_sets=sample_sets,
        )

    # Case 2: restrictive sample_query
    with pytest.raises(ValueError, match="No cohorts found"):
        api.haplotypes_frequencies_advanced(
            region=region,
            area_by="admin1_iso",
            period_by="year",
            min_cohort_size=1,
            sample_sets=sample_sets,
            sample_query="country == 'NonExistentCountry'",
        )

Testing Done

Unit tests pass: poetry run pytest -v tests --ignore tests/integration ✅
Pre-commit hooks pass: pre-commit run --all-files ✅
Manually verified ValueError is raised when filters produce zero cohorts ✅

Breaking Changes

None — purely additive change. Existing behaviour is unchanged when
cohorts are non-empty.

…orts

Copilot

Pull request overview

This PR aims to improve user-facing error messaging in haplotypes_frequencies_advanced() when cohort construction yields zero cohorts, and adds a unit test to validate the behavior.

Changes:

Add an explicit “no cohorts found” guard in haplotypes_frequencies_advanced().
Add a unit test expecting a ValueError when min_cohort_size is too high.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`malariagen_data/anoph/hap_frq.py`	Adds a new guard intended to raise a clearer error when `df_cohorts` is empty.
`tests/anoph/test_hap_frq.py`	Adds a test asserting a `ValueError` is raised when no cohorts are produced.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-21T16:06:22Z

malariagen_data/anoph/hap_frq.py

+        if df_cohorts.empty:
+            raise ValueError(
+                "No cohorts found after applying filters. "
+                "Try lowering min_cohort_size, adjusting area_by/period_by/taxon_by, "
+                "or relaxing sample_query."
+            )
+


df_cohorts.empty check here is redundant/unreachable because _build_cohorts_from_sample_grouping() already raises ValueError when the cohort dataframe is empty after applying min_cohort_size (see malariagen_data/anoph/frq_base.py). As written, this custom error message will likely never be seen; callers will still get the helper's existing message. If the goal is to improve the message for this API, consider catching the helper's ValueError and re-raising with the more actionable text (or update the helper message in a shared place) and remove this dead check.

Suggested change

if df_cohorts.empty:

raise ValueError(

"No cohorts found after applying filters. "

"Try lowering min_cohort_size, adjusting area_by/period_by/taxon_by, "

"or relaxing sample_query."

)

Copilot · 2026-03-21T16:06:23Z

tests/anoph/test_hap_frq.py

+@parametrize_with_cases("fixture,api", cases=".")
+def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):
+    all_sample_sets = api.sample_sets()["sample_set"].to_list()
+    sample_sets = random.choice(all_sample_sets)
+    region = fixture.random_region_str()
+
+    params_advanced = dict(
+        region=region,
+        area_by="admin1_iso",
+        period_by="year",
+        min_cohort_size=10_000,
+        sample_sets=sample_sets,
+    )
+
+    with pytest.raises(ValueError, match="No cohorts found"):
+        api.haplotypes_frequencies_advanced(**params_advanced)


This new test expects an exception message matching "No cohorts found", but the cohort-building helper currently raises "No cohorts available for the given sample selection parameters and minimum cohort size." when zero cohorts are produced. Unless the production code is updated to re-raise with the new wording, this assertion will fail. Also, the PR description mentions covering a restrictive sample_query scenario, but this test currently only covers the high min_cohort_size case.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-22T10:40:57Z

tests/anoph/test_hap_frq.py

+
+
+@parametrize_with_cases("fixture,api", cases=".")
+def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):


This function definition line exceeds the repository’s configured Ruff line length (88) and may fail ruff-format in pre-commit/CI. Please run the formatter or wrap the signature similarly to nearby tests in this file.

Suggested change

def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):

def test_hap_frequencies_advanced_no_cohorts(

fixture, api: AnophelesHapFrequencyAnalysis

):

Copilot · 2026-03-22T10:40:57Z

tests/anoph/test_hap_frq.py

+@parametrize_with_cases("fixture,api", cases=".")
+def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):
+    all_sample_sets = api.sample_sets()["sample_set"].to_list()
+    sample_sets = random.choice(all_sample_sets)
+    region = fixture.random_region_str()
+
+    params_advanced = dict(
+        region=region,
+        area_by="admin1_iso",
+        period_by="year",
+        min_cohort_size=10_000,
+        sample_sets=sample_sets,
+    )
+
+    with pytest.raises(ValueError, match="No cohorts found"):
+        api.haplotypes_frequencies_advanced(**params_advanced)


The PR description says this test covers two scenarios (high min_cohort_size and an overly restrictive sample_query), but the added test currently only exercises the min_cohort_size path. Either add the sample_query case to the test (as described) or update the PR description to match what’s implemented.

jonbrenas · 2026-03-22T11:37:51Z

Hi @karthik642006, it sounds like Copilot makes some good points here. Is what it says incorrect?

GH1170: improve error when haplotypes_frequencies_advanced has no coh…

eea5d96

…orts

Copilot AI review requested due to automatic review settings March 21, 2026 16:03

Copilot started reviewing on behalf of karthik642006 March 21, 2026 16:03 View session

Copilot AI reviewed Mar 21, 2026

View reviewed changes

Merge branch 'master' into GH1170-improve-hap-frq-empty-cohorts

56daa9b

karthik642006 requested a review from Copilot March 22, 2026 10:37

Copilot started reviewing on behalf of karthik642006 March 22, 2026 10:38 View session

Copilot AI reviewed Mar 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GH1170: improve error when haplotypes_frequencies_advanced has no coh…#1184

GH1170: improve error when haplotypes_frequencies_advanced has no coh…#1184
karthik642006 wants to merge 2 commits intomalariagen:masterfrom
karthik642006:GH1170-improve-hap-frq-empty-cohorts

karthik642006 commented Mar 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 21, 2026

Uh oh!

Copilot AI Mar 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 22, 2026

Uh oh!

Copilot AI Mar 22, 2026

Uh oh!

jonbrenas commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants



		@parametrize_with_cases("fixture,api", cases=".")
		def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):

Conversation

karthik642006 commented Mar 21, 2026

Fixes #1170

Problem

Fix

Tests Added

Testing Done

Breaking Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

jonbrenas commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants