Optimize positional lookups with cached prefix-sum tuple#249

Open
EliMunkey wants to merge 1 commit into grantjenks:master from EliMunkey:pr/cumsum-optimization

Conversation

@EliMunkey

Summary

This PR introduces a cached prefix-sum tuple (_cumsum) that accelerates the two most expensive internal methods:

  • _loc (sublist position → flat index): O(log n) Python tree traversal → O(1) tuple lookup
  • _pos (flat index → sublist position): O(log n) Python tree traversal → O(log n) C-level bisect_right

The _cumsum is built during _build_index and invalidated on structural changes. When cumsum is unavailable (e.g., during __delitem__ loops where each delete invalidates it), the original tree traversal is used as a fallback — so write-heavy operations are unaffected.

Additional optimizations

  • __slots__ on SortedList and SortedKeyList
  • Simplified __getitem__ for integers — removed redundant special-case checks already handled by _pos
  • __class__ is int dispatch in __getitem__/__delitem__ (faster than isinstance(x, slice))
  • Deferred attribute lookups in _delete/_expand: _maxes and _load are only loaded in the rare split/merge paths
  • Inline _expand guard in add() — skip function call when index is empty and no split needed
  • Lazy _maxes update in _delete — skip when deleted element is not the sublist maximum
  • _cumsum validation in _check() invariant method for both classes

All optimizations applied consistently to both SortedList and SortedKeyList.
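The `__class__ is int` dispatch deserves a note: an exact-type identity check skips `isinstance`'s subclass machinery, while subclasses (including `bool`) still work through the generic path. A minimal sketch of the idea (`getitem_dispatch` is a hypothetical stand-in, not the library's actual code):

```python
def getitem_dispatch(seq, index):
    """Illustrative dispatch: fast path for exact int, as in the PR."""
    if index.__class__ is int:
        # Exact-type check: cheaper than isinstance, hit on the common case.
        return seq[index]
    if isinstance(index, slice):
        # Slices take the (unchanged) slow path.
        return seq[index]
    # int subclasses such as bool still resolve correctly here.
    return seq[int(index)]
```

Note that `True.__class__ is int` is `False`, so `bool` indices fall through to the generic path rather than the fast path, preserving correctness.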

Benchmark results

Tested on Python 3.14.3 (Apple Silicon), 1M elements. Results verified with interleaved A/B testing (3 trials, best-of-3 per operation) using the project's own benchmark_sortedlist.py:

Per-operation (at 1,000,000 elements)

| Operation | Original | Optimized | Speedup |
| --- | --- | --- | --- |
| getitem | 21.2ms | 6.9ms | +67.5% |
| bisect | 22.3ms | 10.6ms | +52.3% |
| index | 23.7ms | 13.8ms | +41.9% |
| update_large | 338.3ms | 295.2ms | +12.8% |
| remove | 17.8ms | 15.5ms | +12.9% |
| delitem | 37.5ms | 33.6ms | +10.5% |
| update_small | 114.3ms | 103.5ms | +9.4% |
| add | 17.5ms | 16.0ms | +8.9% |
| contains | 10.5ms | 9.6ms | +8.9% |
| count | 15.3ms | 14.0ms | +8.3% |
| priorityqueue | 152.1ms | 141.4ms | +7.1% |
| iter | 56.9ms | 52.9ms | +7.0% |
| multiset | 171.3ms | 162.4ms | +5.2% |
| pop | 5.1ms | 5.1ms | +1.3% |
| ranking | 251.8ms | 249.9ms | +0.8% |
| neighbor | 250.0ms | 270.1ms | -8.1% |
| intervals | 282.9ms | 293.0ms | -3.6% |
| init | 258.7ms | 268.9ms | -3.9% |

Overall: +4.2% across all 18 operations at 1M elements

The mixed workloads (priorityqueue, multiset, ranking) also benefit because they interleave bisect, index, and getitem with add/remove.

How it works

The existing _index tree stores sublist lengths in a dense binary tree, supporting O(log n) traversal for both _pos (downward, root-to-leaf) and _loc (upward, leaf-to-root). These traversals use Python while loops with ~10 iterations each.

The _cumsum tuple stores (0, len₀, len₀+len₁, len₀+len₁+len₂, ...) — a prefix sum with a leading zero. This enables:

  • _loc(pos, idx): simply _cumsum[pos] + idx (O(1), one tuple index + one add)
  • _pos(idx): bisect_right(_cumsum, idx) - 1 to find the sublist, then idx -= _cumsum[pos] for the offset (O(log n) in C via bisect)

The leading zero eliminates the need for pos - 1 indexing or if pos: branching.
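The two lookups can be sketched in a few lines. This is a standalone illustration of the scheme described above, not the PR's code; the sublist lengths are made up, and `itertools.accumulate(..., initial=0)` (Python 3.8+) produces the leading zero:

```python
from bisect import bisect_right
from itertools import accumulate

lengths = [4, 3, 5]  # illustrative sublist lengths
cumsum = tuple(accumulate(lengths, initial=0))  # (0, 4, 7, 12)

def loc(pos, idx):
    """Sublist position -> flat index: O(1), one tuple index + one add."""
    return cumsum[pos] + idx

def pos(idx):
    """Flat index -> (sublist, offset): O(log n) via C-level bisect."""
    p = bisect_right(cumsum, idx) - 1  # rightmost boundary <= idx
    return p, idx - cumsum[p]
```

Because `cumsum[0] == 0`, `bisect_right(...) - 1` is never negative for valid indices, so no `if pos:` branch or `pos - 1` adjustment is needed.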

The _cumsum is invalidated (set to empty tuple) whenever the tree is modified — both on structural changes (splits/merges that clear the tree) and on incremental updates (single add/delete that update tree nodes). The fallback tree traversal handles the write-heavy case (e.g., __delitem__ loops) without regression.
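The cache/invalidate/fallback pattern can be summarized with a toy class. This is a hedged sketch under the assumptions stated in its comments; names echo the PR, but the fallback here is a naive linear scan standing in for the real tree traversal:

```python
class PrefixSumCache:
    """Toy model of the _cumsum lifecycle: build, use, invalidate, fall back."""

    def __init__(self, lists):
        self._lists = lists
        self._cumsum = ()  # empty tuple marks the cache as invalid

    def _build(self):
        # Rebuilt alongside the index: prefix sums with a leading zero.
        total, out = 0, [0]
        for sub in self._lists:
            total += len(sub)
            out.append(total)
        self._cumsum = tuple(out)

    def _loc(self, pos, idx):
        if not self._cumsum:
            # Fallback path (stands in for the tree traversal): used in
            # write-heavy loops where each mutation invalidates the cache.
            return sum(len(s) for s in self._lists[:pos]) + idx
        return self._cumsum[pos] + idx  # O(1) fast path

    def _delete(self, pos, idx):
        del self._lists[pos][idx]
        self._cumsum = ()  # any structural change invalidates the cache
```

The key property is that correctness never depends on the cache: every reader checks for the empty tuple first, so a mutation only has to clear it, never patch it.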

Test plan

  • All 299 existing tests pass (unit tests, coverage tests, stress tests)
  • All 37 doctests pass
  • ruff check — no new lint warnings (5 pre-existing B905 warnings unchanged)
  • ruff format — formatting passes
  • _cumsum validation added to _check() for both SortedList and SortedKeyList
  • Edge cases verified: empty list, single element, single sublist, negative indices, bool indexing, int subclasses, pickle, copy, _reset() with custom load factors
  • A/B benchmarked on Python 3.14 at 100k and 1M elements

🤖 Generated with Claude Code

Add a cached prefix-sum tuple (_cumsum) that accelerates the two most
expensive internal methods: _pos (flat index to sublist position) and
_loc (sublist position to flat index).

The _cumsum is a tuple of cumulative sublist lengths with a leading zero,
built during _build_index and invalidated on structural changes. When
available, _loc becomes O(1) via direct tuple indexing, and _pos becomes
O(log n) via C-level bisect_right — replacing O(log n) Python-level tree
traversal in both cases.

Additional optimizations:
- Add __slots__ to SortedList and SortedKeyList
- Simplify __getitem__ for integer indices by removing redundant checks
  already handled by _pos
- Use __class__ is int dispatch in __getitem__/__delitem__ for faster
  type checking than isinstance(index, slice)
- Defer _maxes and _load attribute lookups to rare split/merge paths
- Skip _expand call in add() when index is empty and no split needed
- Add lazy _maxes update in _delete (skip when deleted element is not max)
- Add _cumsum validation to _check() invariant method

Benchmark results (Python 3.14, 1M elements, A/B tested):

  getitem   +67%    (cumsum _pos + simplified dispatch)
  bisect    +52%    (cumsum _loc)
  index     +42%    (cumsum _loc)
  add       +9%     (inline _expand guard)
  delitem   +10%    (deferred attr lookups)
  remove    +13%    (deferred attr lookups)
  Overall   +4-6%   across all 18 benchmark operations

All changes applied consistently to both SortedList and SortedKeyList.
299 existing tests pass. Zero API changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
