
optimal_accuracy time -> stats; pre-allocate null_two scratch buffers#32

Open
traviswheeler wants to merge 2 commits into main from small_wins



@traviswheeler
Member

optimal_accuracy was timed in AlignStageStats but never propagated to the global ThreadedTimed stats, silently inflating the "misc" bucket (~14% of alignment time on a 1000-HMM run). Add ThreadedTimed::OptimalAccuracy and wire it up in Stats::add_sample.
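A minimal sketch of the stats wiring, for readers without the diff open. Only `ThreadedTimed::OptimalAccuracy`, `AlignStageStats`, and `Stats::add_sample` are named in this PR; the other variants, the field names, and the definition of "misc" as total minus accounted time are assumptions made for illustration.

```rust
use std::time::Duration;

// Only OptimalAccuracy is named in this PR; the other variants are placeholders.
#[derive(Clone, Copy)]
enum ThreadedTimed {
    Forward = 0,
    Backward = 1,
    OptimalAccuracy = 2,
}

// Per-alignment timings (field names are assumptions).
struct AlignStageStats {
    forward: Duration,
    backward: Duration,
    optimal_accuracy: Duration,
}

// Global accumulator (layout is an assumption).
#[derive(Default)]
struct Stats {
    total: Duration,
    timed: [Duration; 3],
}

impl Stats {
    fn add_sample(&mut self, wall: Duration, sample: &AlignStageStats) {
        self.total += wall;
        self.timed[ThreadedTimed::Forward as usize] += sample.forward;
        self.timed[ThreadedTimed::Backward as usize] += sample.backward;
        // The previously missing step: without this line the time is still
        // part of `total`, so it ends up in the remainder reported as "misc".
        self.timed[ThreadedTimed::OptimalAccuracy as usize] += sample.optimal_accuracy;
    }

    // Assumed definition: whatever no named bucket accounts for.
    fn misc(&self) -> Duration {
        let accounted: Duration = self.timed.iter().sum();
        self.total.saturating_sub(accounted)
    }
}
```

Under these assumptions, dropping the `OptimalAccuracy` line leaves every other bucket unchanged and moves that stage's time into `misc()`, which is the inflation described above.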

Introduce NullTwoScratch to hold the four Vec allocations that null_two_score previously made on every call. The scratch is held on DefaultAlignStage and reused across calls.

Buffer sizing: the largest buffers are core_posteriors (target_length + 1 floats) and match_sums/insert_sums (profile_length + 1 floats each). In the worst case — say, titin at ~34k AA against a large Pfam model — these reach roughly 160 KB total per thread. In practice they'll stabilize quickly at the high-water mark for the run.
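The sizing estimate can be checked with a short calculation. This sketch assumes the buffers hold `f32` values and uses a profile length of 2,500 purely as an illustrative stand-in for "a large Pfam model"; it covers only the three buffers named above, so the fourth allocation is not counted.

```rust
use std::mem::size_of;

/// Bytes of storage for the three named buffers, assuming f32 elements.
/// `scratch_bytes` is a hypothetical helper, not part of the PR.
fn scratch_bytes(target_length: usize, profile_length: usize) -> usize {
    let core_posteriors = target_length + 1;
    let match_sums = profile_length + 1;
    let insert_sums = profile_length + 1;
    (core_posteriors + match_sums + insert_sums) * size_of::<f32>()
}
```

With a 34,000-residue target and the assumed 2,500-position profile, `scratch_bytes(34_000, 2_500)` gives 156,012 bytes, in line with the rough 160 KB figure.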

Two properties are maintained on every call via NullTwoScratch::zero_prefix:

  • Allocate only when a new high-water mark is reached; otherwise reuse the existing allocation with no heap traffic.
  • Zero only the prefix of each buffer that the current call will touch, not the entire capacity. A 400-AA sequence after a 34k-AA sequence writes 401 zeros, not 34,000.
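The two properties above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the diff: the element type (`f32`), the `prepare` method, and the exact shape of `zero_prefix` are hypothetical, and only the three buffers named in this description are shown.

```rust
/// Hypothetical stand-in for the PR's NullTwoScratch.
#[derive(Default)]
struct NullTwoScratch {
    core_posteriors: Vec<f32>,
    match_sums: Vec<f32>,
    insert_sums: Vec<f32>,
}

impl NullTwoScratch {
    /// Guarantee that `buf[..n]` exists and is all zeros.
    fn zero_prefix(buf: &mut Vec<f32>, n: usize) {
        if buf.len() < n {
            // New high-water mark: clear stale values, then grow.
            // `resize` only writes the newly added tail and reallocates
            // only if `n` exceeds the current capacity.
            buf.fill(0.0);
            buf.resize(n, 0.0);
        } else {
            // Reuse the existing allocation; touch only the first n slots.
            buf[..n].fill(0.0);
        }
    }

    /// Hypothetical entry point called once per null_two_score invocation.
    fn prepare(&mut self, target_length: usize, profile_length: usize) {
        Self::zero_prefix(&mut self.core_posteriors, target_length + 1);
        Self::zero_prefix(&mut self.match_sums, profile_length + 1);
        Self::zero_prefix(&mut self.insert_sums, profile_length + 1);
    }
}
```

In this sketch the buffer length never shrinks, so it doubles as the high-water mark. Callers must index only `buf[..n]`, since anything past the prefix may hold values from an earlier, longer sequence.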

Changes implemented with Claude

@traviswheeler requested a review from a team on April 6, 2026 at 22:06
I forgot to add this file.
This replaces clear()+resize() with an explicit zero_prefix helper that
makes the memory strategy unambiguous:
  - If the buffer is smaller than needed, resize (allocating only at a
    new high-water mark).
  - Otherwise, fill only the prefix the current call will touch.

The clear()+resize() idiom was already correct (it wrote exactly n zeros
regardless of capacity), but the intent wasn't obvious from reading it.
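For reference, the earlier idiom behaves as described. A small sketch, with `reset_with_clear_resize` as a hypothetical name and `f32` as an assumed element type:

```rust
/// The earlier idiom: reset a reusable buffer to n zeros.
fn reset_with_clear_resize(buf: &mut Vec<f32>, n: usize) {
    // Sets len to 0; capacity is kept and nothing is written.
    buf.clear();
    // Writes exactly n zeros; reallocates only if n exceeds capacity.
    buf.resize(n, 0.0);
}
```

Calling this with n = 401 on a buffer that previously held 34,001 elements leaves the capacity unchanged and a length of 401, all zeros. The difference from the explicit helper is readability, not the amount of memory written.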