-
Notifications
You must be signed in to change notification settings - Fork 39
Adir1.x #795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @tristanpwdennis . Thanks for your work on this. I hope we can figure out the necessary adjustments to make this fit. I see you've marked this PR ready for review but we'll still need the existing code in Similarly, there is currently an assumption that sequence QC metadata will be available (in a certain place and format), which we will need to resolve. If those data aren't available in this case, then we will need to re-engineer the code (in general) to allow for this, without affecting the other existing classes, functions and data support. The existing Af1.x support does not include aim_analysis=None,
aim_metadata_dtype=None,
aim_ids=None,
aim_palettes=None,I'm not sure I understand the changes to the Ultimately, in order to add anything to the master branch, we would also require all of the CI checks to pass. I've created an issue for this work #797 and assigned you, just to help us track the work. |
|
Hi Lee - thanks for taking a look at this. Requesting a review must have been an accident - I was just trying to add you back to the PR but probably misunderstood. I've turned it back into a draft (think that's the right thing to do?) and will re-request when I think it's actually ready. The code, data and metadata in vobs-asia are a few years behind the structure currently used for Af and Ag - so things like the ToU inclusion, inclusion of the QC metadata, broke when I tried to run the current API version with the data. All of these data exist, and can be included, I just need to adapt the data and GCS filesystem structure to be compatible with the API. Working on this now. The changes to anopheles.py add caching to the roh_hmm function - on reflection, would this have been better as its own PR? If so, can do that I think. Apologies - a lot of my edits are so that I can get some analysis done first, then make the data and alterations compatible with the API second - but I don't think there is any reason why we can't get it to work? Not sure about the iykernel version - will take a look. Is discussion here or on the issue best? HTH |
|
Thanks @tristanpwdennis, that's helped to clarify a few things for me. Yeah, I expect the work to add caching to the Thanks again for explaining the situation. I'll be on hand to review this again once everything's ready and aligned. Or I can take a look at things along the way, if needs be. Regarding where to post discussion, it is often a grey area, but generally I reckon comments relating to the issue itself (e.g. describing a bug or the requirements for a new feature) seem to belong with the issue, whereas comments relating to a particular ongoing attempt to resolve an issue (e.g. a particular approach or set of code changes) usually belong with the PR itself. The comments will be seen either way, but having them in the most relevant place can help us when we're looking back later. Having said that, discussion often starts under the issue until a PR is drafted, and then naturally continues within the PR until the PR is approved, and then the issue is closed, on a good day. |
Had commented this out, back now!
…s hap/cnv/aim data)
ahernank
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging shortly, unless any concerns.
|
Hi @ahernank - One concern is that this PR is failing still tests, which would cause the master branch to also fail tests when merged in (as well as the obvious problems of safeguards failing). |
|
I agree with @leehart. The tests fail on an error which actually exists (as opposed to the more transient "we chose an empty region somehow") so I don't think we can merge it yet. |
|
Hi guys I had been meaning to discuss the reasons the tests are failing - but was a little slow. Apologies. If you examine the tracebacks, the tests fail in three areas:
Depending on what you guys think, we have three options:
Please let me know what you think. I am inclined to go for #1 immediately and then add new or modified tests as soon as possible - but this is because we are hoping to submit a paper. Cheers all, thank you for your eyes on this. |
|
Thank you, @tristanpwdennis . I agree that removing these tests for dirus and minimus is probably the most expedient way to progress as long as we remember that it is still in a beta state. I think some of the issues can be addressed fairly easily, though. For g123, it looks like the issue is that For the metadata, the tests are quite important so I would rather add dummy data (NaN or 0) in the missing columns. There is also an errant column 'Unnamed: 0' that needs to be removed. For the diplotype clustering, it is possible to use a different query, at least for Adir. |
|
Thanks @jonbrenas! I have removed the checks for the g123 and dip clustering. The G123 still looks for haplotype data - unless I am missing something - and I couldn't work out how to modify the query in the dip_clustering_advanced - which I don't think is included in Adir1.x anyway as we don't have CNV calls yet. Let me know what you think |
jonbrenas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We need to make some updates to deal with the new errors caused by the merge with master.
|
The integration tests failed because 'unrestricted_use_only' and 'surveillance_use_only' are not instantiated. That was to be expected and can easily be corrected. |
|
It's hard to think of a value in the metadata that may apply in the same way as sex_call (e.g. by country or sample set may break?, whereas in all datasets we know the result will be M or F). I've added a tag to the cases so that the tests with those queries only run on ag3 and af1. Linting fixed. Surveillance and restriction instantiated. The tests are failing, but I am unable to reproduce this error locally - thoughts? |
|
The tests returned a SEGFAULT (which is not typically an error that you should be able to see in Python) so it seemed likely that it was a transient error with the script. Re-running the tests solved it. An option for the query would be to build it as "country == C" where C is the country of the first sample in the metadata. Just skipping the test works, at least until we find a better solution, imho. |
jonbrenas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I will merge later today if no one objects.
The UI decided to change my review from "Approve" to "Request changes"!
jonbrenas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I will merge later today if no one objects.
This replaces PR #793.
Continuing previous discussion....
Thanks @leehart - I'll keep working on this and adding features (e.g. cohorts analysis and fixing the metadata so it is compatible with Ag3/Af1), before I request a review.
In the meantime, I'm happy to contribute to some of the re-engineering to allow more optional implementations of functionality to new classes (e.g. As1, Adir1, updated Amin1 and potentially others). If that would be helpful, please point me at whatever I can do.