optimal_accuracy time -> stats; pre-allocate null_two scratch buffers by traviswheeler · Pull Request #32 · TravisWheelerLab/nail

traviswheeler · 2026-04-06T22:06:21Z

optimal_accuracy was timed in AlignStageStats but never propagated to the global ThreadedTimed stats, silently inflating the "misc" bucket (~14% of alignment time on a 1000-HMM run). Add ThreadedTimed::OptimalAccuracy and wire it up in Stats::add_sample.
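To see why an unrecorded stage ends up in "misc", here is a minimal sketch of the accounting, assuming "misc" is computed as total time minus the sum of the named buckets. Only `ThreadedTimed::OptimalAccuracy`, `AlignStageStats`, and `Stats::add_sample` are named in this PR; the other variants, the field names, the `wall` parameter, and the `misc` helper are hypothetical and do not reflect nail's actual definitions.

```rust
use std::time::Duration;

// Hypothetical stand-in for nail's `ThreadedTimed`. Only `OptimalAccuracy`
// is named in the PR; `Forward` and `Backward` are invented for illustration.
#[derive(Clone, Copy)]
enum ThreadedTimed {
    Forward,
    Backward,
    OptimalAccuracy,
}

// Hypothetical per-alignment timings (field names are assumptions).
struct AlignStageStats {
    forward: Duration,
    backward: Duration,
    optimal_accuracy: Duration,
}

// Hypothetical global stats: one bucket per `ThreadedTimed` variant.
#[derive(Default)]
struct Stats {
    buckets: [Duration; 3],
    total: Duration,
}

impl Stats {
    fn add(&mut self, which: ThreadedTimed, d: Duration) {
        self.buckets[which as usize] += d;
    }

    // Propagate every timed stage. Dropping the `OptimalAccuracy` line
    // would leave that time unaccounted for, so it would surface in `misc`.
    fn add_sample(&mut self, s: &AlignStageStats, wall: Duration) {
        self.total += wall;
        self.add(ThreadedTimed::Forward, s.forward);
        self.add(ThreadedTimed::Backward, s.backward);
        self.add(ThreadedTimed::OptimalAccuracy, s.optimal_accuracy);
    }

    // Assumed definition of the "misc" bucket: whatever no named bucket claims.
    fn misc(&self) -> Duration {
        let named: Duration = self.buckets.iter().sum();
        self.total.saturating_sub(named)
    }
}
```

Under this assumption, any stage that is timed locally but never passed to `add` is still part of `total`, which is how it would inflate the remainder.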

Introduce NullTwoScratch to hold the four Vec allocations that null_two_score previously made on every call. The scratch is held on DefaultAlignStage and reused across calls.

Buffer sizing: the largest buffers are core_posteriors (target_length + 1 floats) and match_sums/insert_sums (profile_length + 1 floats each). In the worst case — say, titin at ~34k AA against a large Pfam model — these reach roughly 160 KB total per thread. In practice they'll stabilize quickly at the high-water mark for the run.
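The sizing estimate can be checked with a few lines of arithmetic. This is a rough sketch that assumes the "floats" are 4-byte `f32` values and picks a profile length of 2,000 purely for illustration; neither figure is stated in the PR, and the function name is hypothetical.

```rust
use std::mem::size_of;

/// Bytes used by the three largest scratch buffers described above,
/// assuming `f32` elements (an assumption; the PR only says "floats").
fn scratch_bytes(target_length: usize, profile_length: usize) -> usize {
    let core_posteriors = target_length + 1;
    let match_sums = profile_length + 1;
    let insert_sums = profile_length + 1;
    (core_posteriors + match_sums + insert_sums) * size_of::<f32>()
}
```

With `target_length = 34_000` and the assumed `profile_length = 2_000`, this comes to 152,012 bytes, the same order as the roughly 160 KB quoted above. Almost all of it is `core_posteriors`, so target length dominates the footprint.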

Two properties are maintained on every call via NullTwoScratch::zero_prefix:

- Allocate only when a new high-water mark is reached; otherwise reuse the existing allocation with no heap traffic.
- Zero only the prefix of each buffer that the current call will touch, not the entire capacity. A 400-AA sequence after a 34k-AA sequence writes 401 zeros, not 34,000.
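The two properties above can be sketched as follows. This is an illustrative stand-in, not the code from this PR: the struct holds only the three buffers named in the description (the fourth is not named), the element type is assumed to be `f32`, and the `prepare` method is hypothetical.

```rust
/// Hypothetical stand-in for `NullTwoScratch`; field types are assumptions.
#[derive(Default)]
struct NullTwoScratch {
    core_posteriors: Vec<f32>,
    match_sums: Vec<f32>,
    insert_sums: Vec<f32>,
}

impl NullTwoScratch {
    /// Ensure `buf[..n]` exists and is all zeros.
    fn zero_prefix(buf: &mut Vec<f32>, n: usize) {
        if buf.len() < n {
            // New high-water mark: zero what is already there, then grow.
            // `resize` fills only the newly added tail with zeros.
            buf.fill(0.0);
            buf.resize(n, 0.0);
        } else {
            // Buffer is already long enough: touch only the first n entries
            // and leave the stale tail as-is.
            buf[..n].fill(0.0);
        }
    }

    /// Hypothetical per-call setup using the sizes from the description.
    fn prepare(&mut self, target_length: usize, profile_length: usize) {
        Self::zero_prefix(&mut self.core_posteriors, target_length + 1);
        Self::zero_prefix(&mut self.match_sums, profile_length + 1);
        Self::zero_prefix(&mut self.insert_sums, profile_length + 1);
    }
}
```

In this sketch the vector length stays at the high-water mark, so entries past the zeroed prefix may hold stale values from an earlier call. A caller would need to read and write only the first `n` entries.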

Changes implemented with Claude


I forgot to add this file.

This replaces clear()+resize() with an explicit zero_prefix helper that makes the memory strategy unambiguous:

- If the buffer is smaller than needed, resize (allocating only at a new high-water mark).
- Otherwise, fill only the prefix the current call will touch.

The clear()+resize() idiom was already correct (it wrote exactly n zeros regardless of capacity), but the intent wasn't obvious from reading it.
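For comparison, the older idiom can be sketched as a small free function. The function name and the `f32` element type are assumptions made for illustration; the behavior of `Vec::clear` and `Vec::resize` is as documented in the standard library.

```rust
/// The clear()+resize() idiom, wrapped in a hypothetical helper.
fn clear_and_resize(buf: &mut Vec<f32>, n: usize) {
    // `clear` sets the length to 0 but keeps the existing allocation.
    buf.clear();
    // `resize` then writes exactly n zeros, allocating only if n exceeds
    // the current capacity.
    buf.resize(n, 0.0);
}
```

The result is correct, but a reader has to know that `clear` preserves capacity to see that no reallocation happens on the common path, which is the readability concern the commit message describes.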

traviswheeler requested a review from a team April 6, 2026 22:06

traviswheeler wants to merge 2 commits into main from small_wins
