@bernaferrari

Summary

Significant performance improvements to the diff algorithm through multiple optimizations:

  • Interning: Maps items to integer IDs for faster areItemsTheSame() comparisons
  • IntIntMap: Primitive int→int hashmap avoiding boxing overhead
  • Patience anchors: For lists >1000 items, uses unique elements as anchor points to split the diff problem
  • Inline pragmas: VM hints for hot path methods
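To illustrate the interning idea from the list above, here is a minimal sketch. It assumes items can be compared with `==` and `hashCode`, and it uses a plain `Map` rather than the `IntIntMap` this PR adds. The `Interner` class and its `intern` method are hypothetical names for illustration, not the API introduced here.

```dart
import 'dart:typed_data';

/// Hypothetical helper: gives every distinct item a small integer ID.
/// Items that are equal under `==` share the same ID.
class Interner<T> {
  final Map<T, int> _ids = <T, int>{};

  /// Returns the IDs of [items], allocating a fresh ID for each unseen item.
  Int32List intern(List<T> items) {
    final ids = Int32List(items.length);
    for (var i = 0; i < items.length; i++) {
      // The next free ID is simply the number of items seen so far.
      ids[i] = _ids.putIfAbsent(items[i], () => _ids.length);
    }
    return ids;
  }
}
```

Interning both lists with the same `Interner` means the inner loop of the diff only compares ints: `oldIds[x] == newIds[y]` stands in for a call to `areItemsTheSame()`.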

Benchmarks

| Benchmark | Before | After | Speedup |
| --- | --- | --- | --- |
| RandomDiff(1000) | 193 ms | 116 ms | 1.7x |
| RandomDiff(5000) | 4,957 ms | 422 ms | 11.7x |
| RandomDiff(10000) | 20,661 ms | 1,704 ms | 12.1x |
| RealWorldCode(3000 lines) | 82 ms | 13 ms | 6.4x |
| LargeFile(10000 lines) | 59 ms | 25 ms | 2.4x |

Breaking Changes

None. All existing tests pass without modification.

- Interning: map items to int IDs for O(1) comparisons
- Prefix/Suffix trimming: skip common head/tail before diff
- IntIntMap: primitive int->int hash map (no boxing)
- Int32List: reduced memory for ID arrays
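For readers unfamiliar with primitive hash maps, the following is a rough sketch of an int->int map with open addressing and linear probing, backed by typed lists so nothing is boxed. It is an illustration under assumptions, not the `IntIntMap` from this PR: the method names, the hash mix, the 50% load factor, and the lack of removal are all choices made for the example, and keys and values are assumed to fit in 32 bits.

```dart
import 'dart:typed_data';

/// Illustrative int -> int map: open addressing with linear probing.
/// Keys and values must fit in 32 bits; removal is not supported.
class IntIntMap {
  Int32List _keys = Int32List(16);
  Int32List _values = Int32List(16);
  Uint8List _used = Uint8List(16); // 1 where a slot holds an entry
  int _size = 0;

  int get length => _size;

  /// Returns the value stored for [key], or [missing] if it is absent.
  int get(int key, {int missing = -1}) {
    final mask = _keys.length - 1; // capacity is always a power of two
    var slot = _hash(key) & mask;
    while (_used[slot] != 0) {
      if (_keys[slot] == key) return _values[slot];
      slot = (slot + 1) & mask; // probe the next slot
    }
    return missing;
  }

  /// Stores [value] under [key], overwriting any previous value.
  void put(int key, int value) {
    // Keep the table at most half full so probing always finds a free slot.
    if ((_size + 1) * 2 > _keys.length) _grow();
    _insert(key, value);
  }

  void _insert(int key, int value) {
    final mask = _keys.length - 1;
    var slot = _hash(key) & mask;
    while (_used[slot] != 0) {
      if (_keys[slot] == key) {
        _values[slot] = value;
        return;
      }
      slot = (slot + 1) & mask;
    }
    _used[slot] = 1;
    _keys[slot] = key;
    _values[slot] = value;
    _size++;
  }

  void _grow() {
    final oldKeys = _keys;
    final oldValues = _values;
    final oldUsed = _used;
    final capacity = oldKeys.length * 2;
    _keys = Int32List(capacity);
    _values = Int32List(capacity);
    _used = Uint8List(capacity);
    _size = 0;
    for (var i = 0; i < oldKeys.length; i++) {
      if (oldUsed[i] != 0) _insert(oldKeys[i], oldValues[i]);
    }
  }

  // Cheap bit mix so nearby keys do not all land in neighbouring slots.
  static int _hash(int key) => key ^ (key >> 15);
}
```

Compared with `Map<int, int>`, a structure like this stores keys and values in flat `Int32List` buffers, which is the "no boxing" property the list above refers to.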

Benchmarks (1000 items):
- RandomDiff: ~36% faster (202ms -> 130ms)
- PrefixSuffix: ~4x faster (377µs -> 88µs)
- InsertDelete: ~40% faster (3.8ms -> 2.3ms)

- IntIntMap: primitive int->int hash map with open addressing
- Int32List: memory optimization for ID arrays
- anchors.dart: patience-style anchor finding with LIS algorithm
  (prepared for future integration)

- Patience anchors now mark unique matches with negative IDs
- Use Uint8List bitmask for O(1) anchor membership check
- IntIntMap for fast hash-based interning
- Maintain correct collision handling
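The patience-style anchor step can be sketched roughly as follows. The sketch assumes both lists are already interned to IDs in the range `0 <= id < idCount`. The function names `findAnchors` and `anchorMask` are hypothetical, and the negative-ID marking mentioned above is not reproduced; a one-byte-per-position `Uint8List` mask stands in for the membership check.

```dart
import 'dart:typed_data';

/// Returns [oldIndex, newIndex] pairs for IDs that occur exactly once in both
/// lists, keeping only a longest run whose new indices strictly increase.
/// Assumes every ID satisfies 0 <= id < idCount.
List<List<int>> findAnchors(Int32List oldIds, Int32List newIds, int idCount) {
  final oldCount = Int32List(idCount);
  final newCount = Int32List(idCount);
  final newPos = Int32List(idCount);
  for (final id in oldIds) {
    oldCount[id]++;
  }
  for (var j = 0; j < newIds.length; j++) {
    newCount[newIds[j]]++;
    newPos[newIds[j]] = j;
  }

  // Candidates: unique-in-both IDs, visited in old-list order.
  final candOld = <int>[];
  final candNew = <int>[];
  for (var i = 0; i < oldIds.length; i++) {
    final id = oldIds[i];
    if (oldCount[id] == 1 && newCount[id] == 1) {
      candOld.add(i);
      candNew.add(newPos[id]);
    }
  }

  // Longest increasing subsequence over candNew (patience sorting).
  // tails[k] is the candidate ending the best known run of length k + 1.
  final tails = <int>[];
  final prev = Int32List(candNew.length);
  for (var c = 0; c < candNew.length; c++) {
    var lo = 0;
    var hi = tails.length;
    while (lo < hi) {
      final mid = (lo + hi) >> 1;
      if (candNew[tails[mid]] < candNew[c]) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    prev[c] = lo > 0 ? tails[lo - 1] : -1;
    if (lo == tails.length) {
      tails.add(c);
    } else {
      tails[lo] = c;
    }
  }

  // Walk the predecessor links back from the end of the longest run.
  final anchors = <List<int>>[];
  var cur = tails.isEmpty ? -1 : tails.last;
  while (cur >= 0) {
    anchors.add([candOld[cur], candNew[cur]]);
    cur = prev[cur];
  }
  return anchors.reversed.toList();
}

/// Builds a mask with 1 at every old index that is an anchor, so that
/// membership can be checked in O(1).
Uint8List anchorMask(List<List<int>> anchors, int oldLength) {
  final mask = Uint8List(oldLength);
  for (final anchor in anchors) {
    mask[anchor[0]] = 1;
  }
  return mask;
}
```

Each pair of consecutive anchors bounds an independent sub-range of the two lists, so the expensive diff only has to run on the gaps between anchors.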

Results:
- RandomDiff(10000): 15.5s -> 13.0s (16% faster)
- RandomDiff(1000): 202ms -> 124ms (39% faster)
- PrefixSuffix(1000): 378µs -> 86µs (4.4x faster)

Added @pragma('vm:prefer-inline') to:
- _Snake.hasAdditionOrRemoval(), isAddition(), diagonalSize()
- _Range.oldSize(), newSize()
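
For reference, this is roughly what the annotation looks like on a small accessor. The `Range` class below is a hypothetical stand-in with assumed field names, not the library's private `_Range`. The pragma is a hint to the Dart VM and has no effect on behaviour.

```dart
/// Hypothetical stand-in for a small range type used on the diff hot path.
class Range {
  Range(this.oldStart, this.oldEnd, this.newStart, this.newEnd);

  final int oldStart;
  final int oldEnd;
  final int newStart;
  final int newEnd;

  // Ask the VM to inline these tiny methods at their call sites.
  @pragma('vm:prefer-inline')
  int oldSize() => oldEnd - oldStart;

  @pragma('vm:prefer-inline')
  int newSize() => newEnd - newStart;
}
```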

Results:
- RandomDiff(1000): 124ms -> 116ms (6% faster)
- RandomDiff(10000): 13.0s -> 12.6s (3% faster)
…avior

The prefix/suffix optimization was greedily locking in matches for
duplicate elements, changing which duplicate gets preserved. This
caused regression test failures for issue knaeckeKami#15.

Removed prefix/suffix trimming from both interner.dart and calculateDiff().
Other optimizations (interning, IntIntMap, anchors, inline pragmas) remain.
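
To make the duplicate problem concrete, here is a sketch of the kind of trimming that was removed. The helper names are hypothetical and the sketch compares items with `==`.

```dart
/// Length of the common head of [a] and [b].
int commonPrefixLength<T>(List<T> a, List<T> b) {
  final limit = a.length < b.length ? a.length : b.length;
  var n = 0;
  while (n < limit && a[n] == b[n]) {
    n++;
  }
  return n;
}

/// Length of the common tail of [a] and [b], not overlapping the first
/// [prefix] items that were already matched.
int commonSuffixLength<T>(List<T> a, List<T> b, int prefix) {
  final limit = (a.length < b.length ? a.length : b.length) - prefix;
  var n = 0;
  while (n < limit && a[a.length - 1 - n] == b[b.length - 1 - n]) {
    n++;
  }
  return n;
}
```

With `old = ['a', 'b']` and `new = ['a', 'a', 'b']`, trimming matches `old[0]` to `new[0]` before the diff runs, so the only possible result is an insertion at index 1. A diff that keeps `old[0]` as `new[1]` and inserts at index 0 is equally valid, and callers may already depend on whichever of the two the untrimmed algorithm picks.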