GH1170: improve error when haplotypes_frequencies_advanced has no coh…#1184
GH1170: improve error when haplotypes_frequencies_advanced has no coh…#1184karthik642006 wants to merge 2 commits intomalariagen:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to improve user-facing error messaging in haplotypes_frequencies_advanced() when cohort construction yields zero cohorts, and adds a unit test to validate the behavior.
Changes:
- Add an explicit “no cohorts found” guard in
haplotypes_frequencies_advanced(). - Add a unit test expecting a
ValueErrorwhenmin_cohort_sizeis too high.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
malariagen_data/anoph/hap_frq.py |
Adds a new guard intended to raise a clearer error when df_cohorts is empty. |
tests/anoph/test_hap_frq.py |
Adds a test asserting a ValueError is raised when no cohorts are produced. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if df_cohorts.empty: | ||
| raise ValueError( | ||
| "No cohorts found after applying filters. " | ||
| "Try lowering min_cohort_size, adjusting area_by/period_by/taxon_by, " | ||
| "or relaxing sample_query." | ||
| ) | ||
|
|
There was a problem hiding this comment.
df_cohorts.empty check here is redundant/unreachable because _build_cohorts_from_sample_grouping() already raises ValueError when the cohort dataframe is empty after applying min_cohort_size (see malariagen_data/anoph/frq_base.py). As written, this custom error message will likely never be seen; callers will still get the helper's existing message. If the goal is to improve the message for this API, consider catching the helper's ValueError and re-raising with the more actionable text (or update the helper message in a shared place) and remove this dead check.
| if df_cohorts.empty: | |
| raise ValueError( | |
| "No cohorts found after applying filters. " | |
| "Try lowering min_cohort_size, adjusting area_by/period_by/taxon_by, " | |
| "or relaxing sample_query." | |
| ) |
| @parametrize_with_cases("fixture,api", cases=".") | ||
| def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis): | ||
| all_sample_sets = api.sample_sets()["sample_set"].to_list() | ||
| sample_sets = random.choice(all_sample_sets) | ||
| region = fixture.random_region_str() | ||
|
|
||
| params_advanced = dict( | ||
| region=region, | ||
| area_by="admin1_iso", | ||
| period_by="year", | ||
| min_cohort_size=10_000, | ||
| sample_sets=sample_sets, | ||
| ) | ||
|
|
||
| with pytest.raises(ValueError, match="No cohorts found"): | ||
| api.haplotypes_frequencies_advanced(**params_advanced) |
There was a problem hiding this comment.
This new test expects an exception message matching "No cohorts found", but the cohort-building helper currently raises "No cohorts available for the given sample selection parameters and minimum cohort size." when zero cohorts are produced. Unless the production code is updated to re-raise with the new wording, this assertion will fail. Also, the PR description mentions covering a restrictive sample_query scenario, but this test currently only covers the high min_cohort_size case.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
|
|
||
| @parametrize_with_cases("fixture,api", cases=".") | ||
| def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis): |
There was a problem hiding this comment.
This function definition line exceeds the repository’s configured Ruff line length (88) and may fail ruff-format in pre-commit/CI. Please run the formatter or wrap the signature similarly to nearby tests in this file.
| def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis): | |
| def test_hap_frequencies_advanced_no_cohorts( | |
| fixture, api: AnophelesHapFrequencyAnalysis | |
| ): |
| @parametrize_with_cases("fixture,api", cases=".") | ||
| def test_hap_frequencies_advanced_no_cohorts(fixture, api: AnophelesHapFrequencyAnalysis): | ||
| all_sample_sets = api.sample_sets()["sample_set"].to_list() | ||
| sample_sets = random.choice(all_sample_sets) | ||
| region = fixture.random_region_str() | ||
|
|
||
| params_advanced = dict( | ||
| region=region, | ||
| area_by="admin1_iso", | ||
| period_by="year", | ||
| min_cohort_size=10_000, | ||
| sample_sets=sample_sets, | ||
| ) | ||
|
|
||
| with pytest.raises(ValueError, match="No cohorts found"): | ||
| api.haplotypes_frequencies_advanced(**params_advanced) |
There was a problem hiding this comment.
The PR description says this test covers two scenarios (high min_cohort_size and an overly restrictive sample_query), but the added test currently only exercises the min_cohort_size path. Either add the sample_query case to the test (as described) or update the PR description to match what’s implemented.
|
Hi @karthik642006, it sounds like Copilot makes some good points here. Is what it says incorrect? |
Fixes #1170
Problem
haplotype_frequencies_advanceddoes not guard against the case wherecohort grouping produces zero cohorts — which can happen when:
min_cohort_sizeis set too highsample_queryis too restrictivearea_by/period_by/taxon_bycombination is too narrowIn these cases, the function continues execution and fails later with
cryptic downstream errors, making it hard for users to diagnose the
root cause.
Fix
Added an explicit guard immediately after cohort construction:
This ensures the function fails fast with a clear, actionable message
close to the source of the problem.
Tests Added
Added
test_hap_frequencies_advanced_no_cohortscovering two scenarios:Testing Done
poetry run pytest -v tests --ignore tests/integration✅pre-commit run --all-files✅ValueErroris raised when filters produce zero cohorts ✅Breaking Changes
None — purely additive change. Existing behaviour is unchanged when
cohorts are non-empty.