Cache source line starts for span formatting#877
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves span formatting performance by caching source line-start offsets in SrcFile and updating parser source construction to use the new constructor.
Changes:
- Added
SrcFile::newand cached line-start computation viaOnceLock. - Reimplemented
SrcFileclone/equality/hash behavior to exclude the cache. - Updated the S-expression parser to construct
SrcFilethrough the new API.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
egglog-ast/src/span.rs |
Adds cached line-start offsets and updates SrcFile trait implementations. |
src/ast/parse.rs |
Routes parser source creation through SrcFile::new. |
Comments suppressed due to low confidence (3)
egglog-ast/src/span.rs:61
- This slices
contentsatoffset, but callers can pass byte offsets that are not UTF-8 character boundaries.Span::Displaydoes this for spans ending in a multibyte character because it callsget_location(span.j - 1), so formatting an atom likeécan panic instead of reporting a location. Clamp/adjust the offset to a valid character boundary before slicing.
let col = self.contents[line_start..offset].chars().count() + 1;
egglog-ast/src/span.rs:30
contentsremains a public mutable field, butline_startsis cached after the first location lookup. If a caller mutatescontentsafterward, the cache is stale and laterget_locationcalls can return incorrect line/column values or panic when cached offsets no longer match the string. Make the contents immutable/private or provide mutation APIs that invalidate the cache.
pub contents: String,
line_starts: OnceLock<Vec<usize>>,
egglog-ast/src/span.rs:30
- Adding a private field to this public struct makes
SrcFile { name, contents }literals in downstreamegglog-astusers stop compiling. The new constructor helps new code, but it does not preserve source compatibility for an existing public API; consider a compatibility path or call out/coordinate the breaking API change.
line_starts: OnceLock<Vec<usize>>,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #[derive(Debug)] | ||
| pub struct SrcFile { | ||
| pub name: Option<String>, | ||
| pub contents: String, | ||
| line_starts: OnceLock<Vec<usize>>, |
Merging this PR will improve performance by 7.85%
Performance Changes
Tip Curious why this is faster? Comment Comparing Footnotes
|
|
What if we just change the |
Summary
Motivation
Profiling --term-encoding on the reduced eggcc fixture showed startup time dominated by formatting generated rules: Span::Display repeatedly called SrcFile::get_location, which rescanned the whole file prefix for each span.
Testing