-
Notifications
You must be signed in to change notification settings - Fork 39
add advanced haplotype clustering #817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
We now use the snp_effects() function... didnt know that existed, but makes sense |
|
I also add grey lines to the amino acid plot of the diplotype clustering functions - it looks a lot neater. And the haplotype function can now support multiple transcripts. |
|
This is super cool, @sanjaynagi. Thanks very much. I am struggling with bandwidth this week but have asked @leehart and @jonbrenas to take a look so we can merge asap. :) |
|
Awesome, thanks @ahernank |
|
Just adding a corresponding issue for this #819 |
|
Many thanks @sanjaynagi - This is great! Pylint is telling me there's a potential glitch around the following code in the # Get SNP genotype allele counts for the transcript, applying snp_query
df_eff = (
self.snp_effects(
transcript=transcript,
)
.query(snp_query)
.reset_index(drop=True)
)Raises problem:
I gather the issue is that class AnophelesSnpFrequencyAnalysis(AnophelesSnpData, AnophelesFrequencyAnalysis):Obviously all the tests pass, and the examples in the notebooks appear to work, so I don't quite understand why something like an We should probably check to see whether this warning from Pylint is right, anyhow. We might need to modify the inheritance chain, e.g. adding |
|
|
||
| # Get SNP genotype allele counts for the transcript, applying snp_query | ||
| df_eff = ( | ||
| self.snp_effects( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pylint reports:
Instance of 'AnophelesHapClustAnalysis' has no 'snp_effects' member
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. also not clear why the code was working - but i have added the Frequency class to hapclust class now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Lee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sanjaynagi . The snp_effects function is in AnophelesSnpFrequencyAnalysis rather than AnophelesFrequencyAnalysis, and AnophelesFrequencyAnalysis only inherits from AnophelesBase, so probably won't make snp_effects available to AnophelesHapClustAnalysis, if I understand correctly.
Also, since AnophelesSnpFrequencyAnalysis inherits from AnophelesSnpData, I suspect we might need to put it before AnophelesSnpData in the AnophelesHapClustAnalysis base classes list. That made we wonder whether we need AnophelesSnpData there at all, but when I tried removing it locally, I got a method resolution error.
Unfortunately, I'm starting to suspect that there might be a larger architectural issue here, around inheritance, namely https://en.wikipedia.org/wiki/Multiple_inheritance#The_diamond_problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, my bad. I will replace the import of AnophelesFrequencyAnalysis and see what happens
…alariagen/malariagen-data-python into advanced-haplotype-clustering-05-09-25
|
Also, here is a notebook with a demo of the function https://colab.research.google.com/drive/1HqDJL0cz9gAKaeXdwYjcGXx-X1GYJznL?usp=sharing |
|
Looks like we're still waiting to try using I believe class AnophelesHapClustAnalysis(AnophelesSnpFrequencyAnalysis, AnophelesHapData, AnophelesSnpData):We might be able to address the potential diamond inheritance problem in a separate issue. Otherwise, I reckon this looks good to go. |
|
Hi @sanjaynagi. Is it OK if I bring this PR's branch up-to-date with the base branch, and take a closer look at these CI test failures? |
|
Yes please do!
…On Thu, 8 Jan 2026, 09:34 Lee, ***@***.***> wrote:
*leehart* left a comment (malariagen/malariagen-data-python#817)
<#817 (comment)>
Hi @sanjaynagi <https://github.com/sanjaynagi>. Is it OK if I bring this
PR's branch up-to-date with the base branch, and take a closer look at
these CI test failures?
—
Reply to this email directly, view it on GitHub
<#817 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AIKN6HOLZ4BKSNBYT7SI63D4FYQCXAVCNFSM6AAAAACFYIVBYOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTOMRTGAYDQMRXHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
Thanks @sanjaynagi . Looks like we have a couple of mypy linting errors in the CI checks https://github.com/malariagen/malariagen-data-python/actions/runs/20814001998/job/59784955756?pr=817 I also noticed that we have: class AnophelesHapClustAnalysis(AnophelesHapData, AnophelesSnpData, AnophelesFrequencyAnalysis)instead of class AnophelesHapClustAnalysis(AnophelesSnpFrequencyAnalysis, AnophelesHapData, AnophelesSnpData):In any case, we are still running into issue #835 at https://github.com/malariagen/malariagen-data-python/actions/runs/20814001969/job/59784955727?pr=817 ...which is preventing us from the CI checks & tests completing. This segfault issue exists outside the changes in this PR. |
|
Noting linting failures:
These can be addressed in similar ways to pending PRs #842 and #844 However, the segfault error is still sporadic and unresolved, via issue #835 and PR #842 |
This PR adds advanced haplotype clustering, allowing users to cut the tree and assign clusters, and to plot haplotype data (amino acid mutations by default) below the tree.
TODO