
Conversation

@noamteyssier

Hey @bede,

This PR adds support for bq and vbq.

I saw your thread-scaling post on Bluesky and I was super curious to see how BINSEQ files would do.

The current implementation just checks whether the input file ends in {bq,vbq} to determine if it's a BINSEQ input. I then implemented the binseq::ParallelReader trait, which handles either single or paired records and follows the same logic as the paraseq impl.

I tried to minimize the amount of code changed, but unfortunately should_keep_sequence was a little difficult to work around because I needed to borrow immutably and mutably at the same time. My solution was to make it an associated function of the struct, but I think a better solution exists. It shouldn't change much, but it requires more arguments at the call site, which is less ergonomic.

I've just tried this with 100M random human sequences from wgsim and got the following at 16 threads (though roughly 5 s of each run was spent loading the index). For the single-end version I ran just the R1 of the FASTQ through, converting R1 into either bq or vbq using bqtools. For the paired version I passed in both R1 and R2 and created a paired bq or vbq with bqtools.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| fastq-single | 17.767 ± 0.060 | 17.730 | 17.836 | 1.67 ± 0.01 |
| bq-single | 10.668 ± 0.040 | 10.627 | 10.706 | 1.00 |
| vbq-single | 11.298 ± 0.064 | 11.225 | 11.345 | 1.06 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| fastq-paired | 33.552 ± 0.270 | 33.382 | 33.864 | 1.57 ± 0.02 |
| bq-paired | 21.313 ± 0.270 | 21.123 | 21.622 | 1.00 |
| vbq-paired | 22.549 ± 0.069 | 22.477 | 22.615 | 1.06 ± 0.01 |

I've found with other programs that bq and vbq scale linearly with the number of threads well into 128+ threads. Would love to see that same benchmark you ran with BINSEQ inputs as well!

Cheers,
Noam

@bede
Owner

bede commented Aug 17, 2025

Hi Noam, nice PR! I will look more closely at this when I have time, but for now here are results from the same machine I posted Bluesky benchmarks for, with plain FASTQ added for good measure. Results look great. As with uncompressed FASTA/FASTQ, scaling falls off above ~2 Gbp/s at 16 threads, where Deacon may be saturating memory bandwidth (cc @RagnarGrootKoerkamp). Great to hear that you've had good results with hundreds of threads for other applications; is this involving Paraseq, out of interest?

[Figure: deacon-thread-scaling-vbq (throughput vs. thread count)]

```json
[
    {"threads": 1, "mbps": 139.1, "format": "vbq (PR #31)"},
    {"threads": 2, "mbps": 275.2, "format": "vbq (PR #31)"},
    {"threads": 4, "mbps": 537.9, "format": "vbq (PR #31)"},
    {"threads": 8, "mbps": 1053.1, "format": "vbq (PR #31)"},
    {"threads": 16, "mbps": 1956.2, "format": "vbq (PR #31)"},
    {"threads": 32, "mbps": 2500.1, "format": "vbq (PR #31)"}
]
```

I used rsviruses17900.fastq.gz encoded as vbq like so:

```shell
gzip -dc data/rsviruses17900/rsviruses17900.fastq.gz | bqtools encode -p r -f a -o data/rsviruses17900/rsviruses17900.vbq
```

and e.g.

```shell
cargo run -r -- filter -t 32 data/panhuman-1.k31w15.idx data/rsviruses17900/rsviruses17900.vbq > /dev/null
```

I appreciate this test is far from ideal. I measured average sustained throughput, excluding index loading time. These PBSIM3-simulated long reads occasionally contain ambiguous bases, and it's interesting to see the impact that substituting these with random nucleotides during encoding has on classification accuracy. I've only skimmed the BINSEQ paper; can the vbq format accommodate Ns somehow? Ideally we want to skip ambiguous minimizers.

@noamteyssier
Author

Ah, that's fascinating that memory bandwidth saturates at 16 threads. It probably doesn't help that bq and vbq are sharing that memory bandwidth while decoding from binary to ASCII and then back to binary.

> Great to hear that you've good results with hundreds of threads for other applications, is this involving Paraseq out of interest?

Paraseq oftentimes can't scale to hundreds of threads. It does scale very well, though, when the per-sequence task is complex enough that it doesn't completely saturate the reader threads (like in mmr).

But the largest advantage BINSEQ has over compressed FASTQ is in very fast per-sequence tasks, where decompression becomes the largest bottleneck. I'm working on a project now, which I hope will come out soon, where paraseq can scale up to maybe 8 or so threads but binseq can get up past 128.

> can the vbq format accommodate Ns somehow?

Not as it is now, but it will be coming soon! Interesting that it's making a large difference in classification accuracy. I'd actually be curious to see what would happen if you changed the ambiguous-nucleotide policy (default: random) to some fixed nucleotide, to see if that would change the results.

@bede
Owner

bede commented Aug 18, 2025

> But the largest advantage binseq has over compressed fastq is in very fast per-sequence tasks where decompression becomes the largest bottleneck

We certainly agree on this 🙂. Deacon is now heavily rate-limited by gzip decompression.

> not as it is now, but will be coming soon!

From my perspective, N support in vbq and bqtools would be a really nice feature, allowing Deacon to generate identical results for FASTQ and vbq even in the presence of ambiguity. I would be happy to merge this PR once vbq and bqtools have N support.

Another change (to Paraseq) that would be very impactful for Deacon is parallel paired FASTQ reading from separate files. This would almost double the throughput of decompression-limited paired-read processing. Ragnar has opened an issue and I think is also looking into it.

@bede force-pushed the main branch 2 times, most recently from 08ec7de to 97868a0 on November 20, 2025.