Optimize positional lookups with cached prefix-sum tuple #249
Open
EliMunkey wants to merge 1 commit into grantjenks:master from
Add a cached prefix-sum tuple (_cumsum) that accelerates the two most expensive internal methods: _pos (flat index to sublist position) and _loc (sublist position to flat index).

The _cumsum is a tuple of cumulative sublist lengths with a leading zero, built during _build_index and invalidated on structural changes. When available, _loc becomes O(1) via direct tuple indexing, and _pos becomes O(log n) via C-level bisect_right, replacing O(log n) Python-level tree traversal in both cases.

Additional optimizations:
- Add __slots__ to SortedList and SortedKeyList
- Simplify __getitem__ for integer indices by removing redundant checks already handled by _pos
- Use __class__ is int dispatch in __getitem__/__delitem__ for faster type checking than isinstance(index, slice)
- Defer _maxes and _load attribute lookups to rare split/merge paths
- Skip _expand call in add() when index is empty and no split needed
- Add lazy _maxes update in _delete (skip when deleted element is not max)
- Add _cumsum validation to _check() invariant method

Benchmark results (Python 3.14, 1M elements, A/B tested):
    getitem  +67%  (cumsum _pos + simplified dispatch)
    bisect   +52%  (cumsum _loc)
    index    +42%  (cumsum _loc)
    add       +9%  (inline _expand guard)
    delitem  +10%  (deferred attr lookups)
    remove   +13%  (deferred attr lookups)
    Overall  +4-6% across all 18 benchmark operations

All changes applied consistently to both SortedList and SortedKeyList. 299 existing tests pass. Zero API changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
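For readers who want to see the two lookups in isolation, here is a minimal standalone sketch of the prefix-sum idea. It is not the code from this commit: the module-level `lists` and `cumsum` names and the free functions `loc` and `pos` are hypothetical stand-ins for the private attributes and methods, and the sketch assumes every flat index satisfies `0 <= idx < cumsum[-1]`.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical stand-in for the list-of-sublists storage (not library code).
lists = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# Cumulative sublist lengths with a leading zero: (0, 3, 5, 9).
cumsum = tuple(accumulate((len(sub) for sub in lists), initial=0))


def loc(pos_, idx):
    """Map (sublist position, offset within sublist) to a flat index."""
    # One tuple index plus one addition.
    return cumsum[pos_] + idx


def pos(idx):
    """Map a flat index to (sublist position, offset within sublist).

    Assumes 0 <= idx < cumsum[-1]; no bounds checking is done here.
    """
    # bisect_right finds the first prefix sum greater than idx; the
    # sublist containing idx is the one just before that boundary.
    p = bisect_right(cumsum, idx) - 1
    return p, idx - cumsum[p]
```

Because of the leading zero, `cumsum[p]` is always the flat index of the first element of sublist `p`, so both functions work for `p == 0` without a special case.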
## Summary

This PR introduces a cached prefix-sum tuple (`_cumsum`) that accelerates the two most expensive internal methods:

- `_loc` (sublist position → flat index): O(log n) Python tree traversal → O(1) tuple lookup
- `_pos` (flat index → sublist position): O(log n) Python tree traversal → O(log n) C-level `bisect_right`

The `_cumsum` is built during `_build_index` and invalidated on structural changes. When `_cumsum` is unavailable (e.g., during `__delitem__` loops where each delete invalidates it), the original tree traversal is used as a fallback, so write-heavy operations are unaffected.

## Additional optimizations
- `__slots__` on `SortedList` and `SortedKeyList`
- Simplified `__getitem__` for integers: removed redundant special-case checks already handled by `_pos`
- `__class__ is int` dispatch in `__getitem__`/`__delitem__` (faster than `isinstance(x, slice)`)
- Deferred attribute lookups in `_delete`/`_expand`: `_maxes` and `_load` are only loaded in rare split/merge paths
- Inline `_expand` guard in `add()`: skip the function call when the index is empty and no split is needed
- Lazy `_maxes` update in `_delete`: skip when the deleted element is not the sublist maximum
- `_cumsum` validation in the `_check()` invariant method for both classes

All optimizations are applied consistently to both `SortedList` and `SortedKeyList`.

## Benchmark results
Tested on Python 3.14.3 (Apple Silicon), 1M elements. Results verified with interleaved A/B testing (3 trials, best-of-3 per operation) using the project's own `benchmark_sortedlist.py`:

### Per-operation (at 1,000,000 elements)

Overall: +4.2% across all 18 operations at 1M elements

The mixed workloads (priorityqueue, multiset, ranking) also benefit because they interleave `bisect`, `index`, and `getitem` with `add`/`remove`.

## How it works
The existing `_index` tree stores sublist lengths in a dense binary tree, supporting O(log n) traversal for both `_pos` (downward, root-to-leaf) and `_loc` (upward, leaf-to-root). These traversals use Python `while` loops with ~10 iterations each.

The `_cumsum` tuple stores `(0, len₀, len₀+len₁, len₀+len₁+len₂, ...)`, a prefix sum with a leading zero. This enables:

- `_loc(pos, idx)`: simply `_cumsum[pos] + idx` (O(1), one tuple index + one add)
- `_pos(idx)`: `bisect_right(_cumsum, idx) - 1` to find the sublist, then `idx -= _cumsum[pos]` for the offset (O(log n) in C via `bisect`)

The leading zero eliminates the need for `pos - 1` indexing or `if pos:` branching.

The `_cumsum` is invalidated (set to an empty tuple) whenever the tree is modified, both on structural changes (splits/merges that clear the tree) and on incremental updates (a single add/delete that updates tree nodes). The fallback tree traversal handles the write-heavy case (e.g., `__delitem__` loops) without regression.

## Test plan
- `ruff check`: no new lint warnings (5 pre-existing B905 warnings unchanged)
- `ruff format`: formatting passes
- `_cumsum` validation added to `_check()` for both `SortedList` and `SortedKeyList`
- `bool` indexing, `int` subclasses, `pickle`, `copy`, `_reset()` with custom load factors

🤖 Generated with Claude Code
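As an illustration of what the `_cumsum` invariant check could look like, here is a rough sketch. It is an assumption, not the code added to `_check()` in this PR: the helper name `check_cumsum` and its arguments are hypothetical. It treats an empty tuple as "invalidated" (always acceptable) and otherwise requires the tuple to equal the prefix sums of the sublist lengths with a leading zero.

```python
from itertools import accumulate


def check_cumsum(lists, cumsum):
    """Hypothetical invariant check for a cached prefix-sum tuple.

    An empty tuple means the cache was invalidated, which is always valid.
    A non-empty tuple must match the sublist lengths exactly.
    """
    if not cumsum:
        return
    expected = tuple(accumulate((len(sub) for sub in lists), initial=0))
    assert cumsum == expected, (cumsum, expected)


lists = [[1, 2, 3], [4, 5]]
check_cumsum(lists, (0, 3, 5))  # fresh cache: passes
check_cumsum(lists, ())         # invalidated cache: passes
```

A stale cache, for example `(0, 3, 5)` left in place after an element was added to the first sublist, would fail the assertion. That is the kind of missed invalidation this check is meant to catch.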