Skip to content

Conversation

@rtraborn
Copy link
Collaborator

@rtraborn rtraborn commented Dec 8, 2025

Hi @dkoslicki and team!
I created a sylph coverage model from Shaw and Yu, 2024 and added it to yacht, in a branch I named superyacht just for fun.
This is a draft that I'm still testing, so that and other caveats still apply. A few notes:

  • The coverage code is in a function called cov_calc, which calculates lambda and ani according as specified by the sylph paper.
  • For engineering reasons, I decided to put cov_calc inside get_exclusive_hashes, given that that function provides us with the signature objects needed to make the calculations.
  • Because of this, I am passing the output of cov_calc, a pandas dataframe, along with hypothesis_recovery. There are probably good ways to integrate this, and I'll give this some more thought.
  • Also, I decided to not incorporate the output of cov_calc more deeply into hypothesis_recovery for now. I have some ideas on what might be the best approach that we could discuss if you'd like. I thought it would be best to share this new branch while I look into this more deeply.
  • The script internal_superyacht_test.py is just a script that I have been using to test the new branch, and this can be ignored; I'll remove it once we move towards publication.
  • I plan to update the way I instantiated the AdjustStatusLambda enum in a more idiomatic python way this week. It should be a relatively quick fix.
  • I did not incorporate the taxonomic reassignment/winner_map routine from sylph, but it's something I would like to add.

I'm going to do more testing this week on additional datasets. Happy to discuss here or via email/video!

@rtraborn
Copy link
Collaborator Author

rtraborn commented Jan 2, 2026

After some more testing, I just pushed some additional updates to this branch.

  • Fixed a typo: corrected to logger.warning in cov_calc.py
  • Added missing scipy.special.gamma import to utils.py
  • Fixed a few bugs I discovered in binary_search_lambda()
  • Replaced print statements with logger.info() for consistency
  • Consolidated duplicate constants into utils.py
  • Removed a few unused local variables from hypothesis_recovery_src.py

@rtraborn
Copy link
Collaborator Author

rtraborn commented Jan 7, 2026

A small update, but with my most recent commit from last night I made the promised change to the AdjustStatusLambda enum in cov_calc to make it more idiomatically python-like. I think it looks cleaner- thanks to @standage for the flagging this!

@rtraborn
Copy link
Collaborator Author

rtraborn commented Jan 14, 2026

Hi All!

I've made some more changes over the past week. Here's an overview of my most recent updates (Part 1 of 2):

  • Formally adds the Winner takes all strategy from sylph, which performs abundance calculation and k-mer reassignment. This helps prevent double-counting of k-mers, assigning shared k-mers to the taxon with the highest calculated ANI. I'll note that in the interest of performance this procedure is only being done once per instance, rather than twice (as in sylph). We can discuss this in the future.
  • This procedure tracks k-mers lost to reassignment (kmers_lost column) and filters orgs with final_est_ani < 0.90 (90% ANI threshold)
  • Coverage results are now merged into the overall Excel output of yacht run. Previously the cov_calc output was passed along as a separate dataframe. I tried to do this without touching too much of the original yacht code; let me know what you think and if we need to make any tweaks.
  • The columns added to this new output are naive_ani, final_est_ani, final_est_cov, mean_cov, median_cov, lambda_status, ani_ci, lambda_ci, rel_abund, kmers_lost
  • Fixed a bug created by the refactoring described above 🙃 (MIN_ANI_THRESHOLD is defined).
  • I fixed a bug that I encountered after more extensive testing. What happened was that ani_from_lambda function had a ZeroDivisionError when lambda_val was 0 or very close to 0. It took my testing on a pretty diverse metagenomic sample (more on this test later) for this to crop up, but I'm glad it did.

@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants