Adding the sylph coverage model to yacht #141

rtraborn · 2025-12-08T15:23:53Z

Hi @dkoslicki and team!
I implemented the sylph coverage model from Shaw and Yu, 2024 and added it to yacht, in a branch I named superyacht just for fun.
This is a draft that I'm still testing, so the usual caveats apply. A few notes:

The coverage code is in a function called cov_calc, which calculates lambda and ANI as specified by the sylph paper.
For engineering reasons, I decided to put cov_calc inside get_exclusive_hashes, given that that function provides us with the signature objects needed to make the calculations.
Because of this, I am passing the output of cov_calc, a pandas dataframe, along with hypothesis_recovery. There are probably good ways to integrate this, and I'll give this some more thought.
Also, I decided to not incorporate the output of cov_calc more deeply into hypothesis_recovery for now. I have some ideas on what might be the best approach that we could discuss if you'd like. I thought it would be best to share this new branch while I look into this more deeply.
The script internal_superyacht_test.py is just something I have been using to test the new branch and can be ignored; I'll remove it once we move towards publication.
I plan to rewrite the AdjustStatusLambda enum in a more idiomatic Python style this week. It should be a relatively quick fix.
I did not incorporate the taxonomic reassignment/winner_map routine from sylph, but it's something I would like to add.
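
For context, here is a minimal sketch of the kind of calculation cov_calc performs. It assumes the coverage-adjusted containment form described for sylph: the naive containment is divided by 1 - exp(-lambda), the probability that a k-mer is sampled at least once under Poisson coverage, and ANI is the k-th root of the result. The function name `adjusted_ani`, the default `k = 31`, the `eps` guard, and the clamp to 1.0 are illustrative assumptions, not the code in this branch.

```python
import math
from typing import Optional


def adjusted_ani(naive_containment: float, lambda_val: float,
                 k: int = 31, eps: float = 1e-12) -> Optional[float]:
    """Coverage-adjusted ANI estimate (illustrative sketch, hypothetical names)."""
    # Probability that a k-mer is observed at least once under Poisson(lambda)
    # coverage. -expm1(-x) equals 1 - exp(-x) and is stable for small x.
    p_seen = -math.expm1(-lambda_val)
    if p_seen <= eps:
        # Coverage is effectively zero, so the correction is undefined.
        return None
    # Clamp the adjusted containment so the estimate never exceeds ANI = 1.
    adjusted = min(naive_containment / p_seen, 1.0)
    return adjusted ** (1.0 / k)
```

Returning `None` for near-zero lambda leaves the decision to the caller instead of dividing by zero.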

I'm going to do more testing this week on additional datasets. Happy to discuss here or via email/video!

rtraborn · 2026-01-02T22:02:16Z

After some more testing, I just pushed some additional updates to this branch.

Fixed a typo: corrected to logger.warning in cov_calc.py
Added missing scipy.special.gamma import to utils.py
Fixed a few bugs I discovered in binary_search_lambda()
Replaced print statements with logger.info() for consistency
Consolidated duplicate constants into utils.py
Removed a few unused local variables from hypothesis_recovery_src.py

rtraborn · 2026-01-07T18:12:11Z

A small update: with last night's commit I made the promised change to the AdjustStatusLambda enum in cov_calc so that it is more idiomatic Python. I think it looks cleaner. Thanks to @standage for flagging this!
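
For readers following along, a status enum of this kind can be sketched with the standard library `enum` module. Only the class name AdjustStatusLambda comes from this PR; the member names and the `needs_adjustment` helper below are hypothetical placeholders, not the definitions in cov_calc.

```python
from enum import Enum, auto


class AdjustStatusLambda(Enum):
    """Outcome of the lambda (effective coverage) estimate.

    Member names are hypothetical and chosen only for illustration.
    """
    OK = auto()        # lambda was estimated and can be used to adjust ANI
    TOO_LOW = auto()   # coverage too low to estimate lambda reliably
    TOO_HIGH = auto()  # coverage high enough that no adjustment is needed


def needs_adjustment(status: AdjustStatusLambda) -> bool:
    """Return True only when the lambda estimate should be applied."""
    return status is AdjustStatusLambda.OK
```

Comparing members with `is` and looking them up by name (`AdjustStatusLambda["OK"]`) avoids passing bare strings or integers between functions.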

rtraborn · 2026-01-14T19:14:02Z

Hi All!

I've made some more changes over the past week. Here's an overview of my most recent updates (Part 1 of 2):

Formally added the winner-takes-all strategy from sylph, which performs abundance calculation and k-mer reassignment. It helps prevent double-counting of k-mers by assigning shared k-mers to the taxon with the highest calculated ANI. Note that, in the interest of performance, this procedure runs only once per instance rather than twice (as in sylph). We can discuss this in the future.
This procedure tracks k-mers lost to reassignment (kmers_lost column) and filters out organisms with final_est_ani < 0.90 (90% ANI threshold).
Coverage results are now merged into the overall Excel output of yacht run. Previously the cov_calc output was passed along as a separate dataframe. I tried to do this without touching too much of the original yacht code; let me know what you think and if we need to make any tweaks.
The columns added to this new output are naive_ani, final_est_ani, final_est_cov, mean_cov, median_cov, lambda_status, ani_ci, lambda_ci, rel_abund, kmers_lost
Fixed a bug created by the refactoring described above 🙃 (MIN_ANI_THRESHOLD is now defined).
Fixed a bug that I encountered after more extensive testing: the ani_from_lambda function raised a ZeroDivisionError when lambda_val was 0 or very close to 0. It took testing on a pretty diverse metagenomic sample (more on this test later) for this to crop up, but I'm glad it did.
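
To make the reassignment step concrete, here is a minimal single-pass sketch of a winner-takes-all assignment. It assumes each genome is represented as a set of integer k-mer hashes and that ties in ANI are broken by genome name; the function name, argument names, and tie-break rule are hypothetical and do not reflect the code in hypothesis_recovery_src.py.

```python
from typing import Dict, Set, Tuple


def winner_takes_all(
    genome_kmers: Dict[str, Set[int]],
    genome_ani: Dict[str, float],
) -> Tuple[Dict[str, Set[int]], Dict[str, int]]:
    """Assign each shared k-mer to the genome with the highest ANI (sketch)."""
    # Map every k-mer hash to its current winning genome.
    winner: Dict[int, str] = {}
    for name, kmers in genome_kmers.items():
        for h in kmers:
            current = winner.get(h)
            # Higher ANI wins; ties fall back to the genome name (assumption).
            if current is None or (genome_ani[name], name) > (genome_ani[current], current):
                winner[h] = name

    # Each genome keeps only the k-mers it won.
    retained = {
        name: {h for h in kmers if winner[h] == name}
        for name, kmers in genome_kmers.items()
    }
    # Number of k-mers each genome lost to a higher-ANI competitor.
    kmers_lost = {
        name: len(kmers) - len(retained[name])
        for name, kmers in genome_kmers.items()
    }
    return retained, kmers_lost
```

With genomes A, B, and C at ANI 0.99, 0.95, and 0.97, a k-mer shared by A and B goes to A, and one shared by B and C goes to C, so B's lost count reflects both.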

sonarqubecloud · 2026-01-16T16:22:30Z

Quality Gate passed

Issues
20 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

R. Taylor Raborn and others added 5 commits December 5, 2025 09:53

First commit on superyacht branch, including files required to calculate effective coverage, etc according to sylph (Shaw and Yu, 2024).

00e2db0

Move files [skip ci]

21dc0cb

Removed remaining print statements throughout.

3b5a370

Moved test script [skip ci]

6f0d8ff

Corrected incorrect indent. [skip ci]

e09ec7b

dkoslicki mentioned this pull request Dec 17, 2025

Update workflow to trigger on pull requests #144

Merged

rtraborn added 2 commits December 29, 2025 22:37

Various code improvements to improve legibility and remove stray bits that aren't necessary.

50ff040

Various bug fixes and improvements to code quality. Replaced relevant print statements with logger. Moved all constants to utils.py.

4dd7972

Changed to a more typically python enum for cov_calc.

92e25cc

rtraborn and others added 5 commits January 10, 2026 17:11

Minor update to README and .gitignore.

80bedab

Added winner map functionality to hypothesis_recovery_src.py.

b1bd726

Minor update to .gitignore.

bab404f

Updated utils.py to define MIN_ANI_THRESHOLD.

b8b476a

Added parallelization to the coverage calculation; should increase speed.

1f81fcc

R. Taylor Raborn added 7 commits January 14, 2026 14:31

Cleaning up documentation.

12fcde9

Added sample_sig as a pool variable to remove big performance overheads.

94cb13a

Improvements to parallel processing and new arguments for the winner-takes-all k-mer reassignment.

1f643c1

Small tweaks to help statement for new winner-map related arguments.

ef9077d

Fixed a small typo.

45faa74

Renamed MEDIAN_ANI_THRESHOLD to avoid confusion.

6997b5c

Fixed critical bug due to umap_unordered- now matching on organism name.

3333649
