Skip to content

GH1170: improve error when haplotypes_frequencies_advanced has no coh…#1184

Open
karthik642006 wants to merge 2 commits intomalariagen:masterfrom
karthik642006:GH1170-improve-hap-frq-empty-cohorts
Open

GH1170: improve error when haplotypes_frequencies_advanced has no coh…#1184
karthik642006 wants to merge 2 commits intomalariagen:masterfrom
karthik642006:GH1170-improve-hap-frq-empty-cohorts

Conversation

@karthik642006
Copy link

Fixes #1170

Problem

haplotype_frequencies_advanced does not guard against the case where
cohort grouping produces zero cohorts — which can happen when:

  • min_cohort_size is set too high
  • sample_query is too restrictive
  • area_by / period_by / taxon_by combination is too narrow

In these cases, the function continues execution and fails later with
cryptic downstream errors, making it hard for users to diagnose the
root cause.

Fix

Added an explicit guard immediately after cohort construction:

if df_cohorts.empty:
    raise ValueError(
        "No cohorts found after applying filters. Try lowering "
        "min_cohort_size, adjusting area_by/period_by/taxon_by, "
        "or relaxing sample_query."
    )

This ensures the function fails fast with a clear, actionable message
close to the source of the problem.

Tests Added

Added test_hap_frequencies_advanced_no_cohorts covering two scenarios:

@parametrize_with_cases("fixture,api", cases=".")
def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):
    all_sample_sets = api.sample_sets()["sample_set"].to_list()
    sample_sets = all_sample_sets[0]
    region = fixture.random_region_str()

    # Case 1: min_cohort_size too high
    with pytest.raises(ValueError, match="No cohorts found"):
        api.haplotypes_frequencies_advanced(
            region=region,
            area_by="admin1_iso",
            period_by="year",
            min_cohort_size=10_000,
            sample_sets=sample_sets,
        )

    # Case 2: restrictive sample_query
    with pytest.raises(ValueError, match="No cohorts found"):
        api.haplotypes_frequencies_advanced(
            region=region,
            area_by="admin1_iso",
            period_by="year",
            min_cohort_size=1,
            sample_sets=sample_sets,
            sample_query="country == 'NonExistentCountry'",
        )

Testing Done

  • Unit tests pass: poetry run pytest -v tests --ignore tests/integration
  • Pre-commit hooks pass: pre-commit run --all-files
  • Manually verified ValueError is raised when filters produce zero cohorts ✅

Breaking Changes

None — purely additive change. Existing behaviour is unchanged when
cohorts are non-empty.

Copilot AI review requested due to automatic review settings March 21, 2026 16:03
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to improve user-facing error messaging in haplotypes_frequencies_advanced() when cohort construction yields zero cohorts, and adds a unit test to validate the behavior.

Changes:

  • Add an explicit “no cohorts found” guard in haplotypes_frequencies_advanced().
  • Add a unit test expecting a ValueError when min_cohort_size is too high.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
malariagen_data/anoph/hap_frq.py Adds a new guard intended to raise a clearer error when df_cohorts is empty.
tests/anoph/test_hap_frq.py Adds a test asserting a ValueError is raised when no cohorts are produced.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +185 to +191
if df_cohorts.empty:
raise ValueError(
"No cohorts found after applying filters. "
"Try lowering min_cohort_size, adjusting area_by/period_by/taxon_by, "
"or relaxing sample_query."
)

Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df_cohorts.empty check here is redundant/unreachable because _build_cohorts_from_sample_grouping() already raises ValueError when the cohort dataframe is empty after applying min_cohort_size (see malariagen_data/anoph/frq_base.py). As written, this custom error message will likely never be seen; callers will still get the helper's existing message. If the goal is to improve the message for this API, consider catching the helper's ValueError and re-raising with the more actionable text (or update the helper message in a shared place) and remove this dead check.

Suggested change
if df_cohorts.empty:
raise ValueError(
"No cohorts found after applying filters. "
"Try lowering min_cohort_size, adjusting area_by/period_by/taxon_by, "
"or relaxing sample_query."
)

Copilot uses AI. Check for mistakes.
Comment on lines +242 to +257
@parametrize_with_cases("fixture,api", cases=".")
def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):
all_sample_sets = api.sample_sets()["sample_set"].to_list()
sample_sets = random.choice(all_sample_sets)
region = fixture.random_region_str()

params_advanced = dict(
region=region,
area_by="admin1_iso",
period_by="year",
min_cohort_size=10_000,
sample_sets=sample_sets,
)

with pytest.raises(ValueError, match="No cohorts found"):
api.haplotypes_frequencies_advanced(**params_advanced)
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new test expects an exception message matching "No cohorts found", but the cohort-building helper currently raises "No cohorts available for the given sample selection parameters and minimum cohort size." when zero cohorts are produced. Unless the production code is updated to re-raise with the new wording, this assertion will fail. Also, the PR description mentions covering a restrictive sample_query scenario, but this test currently only covers the high min_cohort_size case.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.



@parametrize_with_cases("fixture,api", cases=".")
def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function definition line exceeds the repository’s configured Ruff line length (88) and may fail ruff-format in pre-commit/CI. Please run the formatter or wrap the signature similarly to nearby tests in this file.

Suggested change
def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):
def test_hap_frequencies_advanced_no_cohorts(
fixture, api: AnophelesHapFrequencyAnalysis
):

Copilot uses AI. Check for mistakes.
Comment on lines +242 to +257
@parametrize_with_cases("fixture,api", cases=".")
def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis):
all_sample_sets = api.sample_sets()["sample_set"].to_list()
sample_sets = random.choice(all_sample_sets)
region = fixture.random_region_str()

params_advanced = dict(
region=region,
area_by="admin1_iso",
period_by="year",
min_cohort_size=10_000,
sample_sets=sample_sets,
)

with pytest.raises(ValueError, match="No cohorts found"):
api.haplotypes_frequencies_advanced(**params_advanced)
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says this test covers two scenarios (high min_cohort_size and an overly restrictive sample_query), but the added test currently only exercises the min_cohort_size path. Either add the sample_query case to the test (as described) or update the PR description to match what’s implemented.

Copilot uses AI. Check for mistakes.
@jonbrenas
Copy link
Collaborator

Hi @karthik642006, it sounds like Copilot makes some good points here. Is what it says incorrect?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve error messaging when haplotypes_frequencies_advanced yields no cohorts

4 participants