-
Notifications
You must be signed in to change notification settings - Fork 39
Add cohort downsampling support to PCA and update tests #803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -288,3 +288,82 @@ def test_pca_fit_exclude_samples(fixture, api: AnophelesPca): | |||||||
| len(pca_df.query(f"sample_id in {exclude_samples} and not pca_fit")) | ||||||||
| == n_samples_excluded | ||||||||
| ) | ||||||||
|
|
||||||||
|
|
||||||||
| @parametrize_with_cases("fixture,api", cases=".") | ||||||||
| def test_pca_cohort_downsampling(fixture, api: AnophelesPca): | ||||||||
| # Parameters for selecting input data. | ||||||||
| all_sample_sets = api.sample_sets()["sample_set"].to_list() | ||||||||
|
||||||||
| all_sample_sets = api.sample_sets()["sample_set"].to_list() | |
| all_sample_sets = api.sample_sets()["sample_set"].to_list() | |
| random.seed(random_seed) # Seed the random module for deterministic sampling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note that these "flaky failures" seem to be a design decision, rather than an oversight, in order to cover a greater range of variables than would be covered by fixed test inputs.
For what it's worth, I do find it a little annoying when these random tests failures occur on PRs that aren't modifying anything related to the code involved in the random test failures. The natural response is often to log the unrelated issue separately and then to try re-running the tests in hope of green light, which doesn't seem ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The
cohort_sizeandmin_cohort_sizeparameters are accepted but not implemented beyond raising errors; consider either implementing their logic or removing them to avoid confusion.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note that Copilot's nitpick above does not appear to be true because
cohort_sizeandmin_cohort_sizeare either used to retrieve the cached results (which are based on certain params, including these) or they are passed to the_pcafunction, i.e.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks @leehart. We decided on Friday that Copilot was less nitpicking and more plain old wrong ;).