optimal_accuracy time -> stats; pre-allocate null_two scratch buffers #32
Open
traviswheeler wants to merge 2 commits into main
optimal_accuracy was timed in AlignStageStats but never propagated to the
global ThreadedTimed stats, silently inflating the "misc" bucket (~14% of
alignment time on a 1000-HMM run). Add ThreadedTimed::OptimalAccuracy and
wire it up in Stats::add_sample.
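
For readers without the diff open, the wiring might look roughly like the sketch below. Only the names ThreadedTimed, OptimalAccuracy, AlignStageStats, and Stats::add_sample come from the description; the Forward variant, the field layout, and the misc() helper are hypothetical stand-ins, and "misc" is assumed to mean whatever part of the total no named bucket claims.

```rust
use std::time::Duration;

// Hypothetical stand-ins for the real types; see the assumptions above.
#[derive(Clone, Copy)]
enum ThreadedTimed {
    Forward, // placeholder for the stages that were already wired up
    OptimalAccuracy,
}

// Per-stage timings collected during one alignment.
struct AlignStageStats {
    forward: Duration,
    optimal_accuracy: Duration,
}

#[derive(Default)]
struct Stats {
    timed: [Duration; 2], // indexed by ThreadedTimed
}

impl Stats {
    fn add_sample(&mut self, sample: &AlignStageStats) {
        self.timed[ThreadedTimed::Forward as usize] += sample.forward;
        // Without this line the optimal_accuracy time is measured but
        // never attributed, so it ends up in "misc".
        self.timed[ThreadedTimed::OptimalAccuracy as usize] += sample.optimal_accuracy;
    }

    fn get(&self, which: ThreadedTimed) -> Duration {
        self.timed[which as usize]
    }

    // Assumption: "misc" is the total minus everything that was attributed.
    fn misc(&self, total: Duration) -> Duration {
        total.saturating_sub(self.timed.iter().sum::<Duration>())
    }
}
```

Under these assumptions, dropping the OptimalAccuracy line from add_sample leaves get(ThreadedTimed::OptimalAccuracy) at zero and grows misc() by the same amount, which is the symptom described above.
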
Introduce NullTwoScratch to hold the four Vec allocations that null_two_score
previously made on every call. The scratch is held on DefaultAlignStage and
reused across calls.
Buffer sizing: the largest buffers are core_posteriors (target_length + 1
floats) and match_sums/insert_sums (profile_length + 1 floats each). In the
worst case — say, titin at ~34k AA against a large Pfam model — these reach
roughly 160 KB total per thread. In practice they'll stabilize quickly at
the high-water mark for the run.
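
The 160 KB figure can be checked with a few lines of arithmetic. This is a sketch that assumes "floats" means f32 and takes a 3,000-position profile as a stand-in for "a large Pfam model"; the fourth buffer is not sized in the description, so it is left out.

```rust
/// Bytes used by the three buffers named above, assuming f32 elements.
fn scratch_bytes(target_length: usize, profile_length: usize) -> usize {
    let elem = std::mem::size_of::<f32>();
    let core_posteriors = (target_length + 1) * elem;
    let match_sums = (profile_length + 1) * elem;
    let insert_sums = (profile_length + 1) * elem;
    core_posteriors + match_sums + insert_sums
}
```

With those assumed sizes, scratch_bytes(34_000, 3_000) returns 160,012 bytes, of which about 136 KB is core_posteriors.
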
Two properties are maintained on every call via NullTwoScratch::zero_prefix:
- Allocate only when a new high-water mark is reached; otherwise reuse
the existing allocation with no heap traffic.
- Zero only the prefix of each buffer that the current call will touch,
not the entire capacity. A 400-AA sequence after a 34k-AA sequence
writes 401 zeros, not 34,000.
Changes implemented with Claude
I forgot to add this file.
This replaces clear()+resize() with an explicit zero_prefix helper that
makes the memory strategy unambiguous:
- If the buffer is smaller than needed, resize (allocating only at a
new high-water mark).
- Otherwise, fill only the prefix the current call will touch.
The clear()+resize() idiom was already correct (it wrote exactly n zeros
regardless of capacity), but the intent wasn't obvious from reading it.
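
A minimal sketch of such a helper, written as a free function over a single Vec<f32> rather than as the actual NullTwoScratch method (the signature and element type are assumptions):

```rust
/// Hypothetical helper: after the call, `buf[..n]` is all zeros.
fn zero_prefix(buf: &mut Vec<f32>, n: usize) {
    if buf.len() < n {
        // Buffer too short: clear and regrow with zeros. This reallocates
        // only when n exceeds the current capacity, i.e. at a new
        // high-water mark.
        buf.clear();
        buf.resize(n, 0.0);
    } else {
        // Buffer already long enough: zero only the prefix this call
        // will touch and leave the tail alone.
        buf[..n].fill(0.0);
    }
}
```

In this sketch the length never shrinks, so after a 34,001-element call a following 401-element call writes 401 zeros, keeps the capacity unchanged, and leaves stale values beyond index 400 that the caller must not read.
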