Cache source line starts for span formatting #877

Open
oflatt wants to merge 1 commit into main from codex-span-location-cache-main

Member

@oflatt oflatt commented May 16, 2026

Summary

  • Cache line-start offsets in SrcFile so repeated Span formatting no longer scans from the start of large source files.
  • Route parser source construction through SrcFile::new.

Motivation

Profiling --term-encoding on the reduced eggcc fixture showed startup time dominated by formatting generated rules: Span::Display repeatedly called SrcFile::get_location, which rescanned the whole file prefix for each span.
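
The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the private `contents` field, the `line_starts()` helper, and the `(line, column)` return type of `get_location` are assumptions made for the example. The line-start table is built once on first use, and each lookup is then a binary search instead of a scan of the file prefix.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for `SrcFile`; the layout here is an assumption.
#[derive(Debug)]
pub struct SrcFile {
    pub name: Option<String>,
    contents: String,
    /// Byte offset of the first character of every line, filled lazily.
    line_starts: OnceLock<Vec<usize>>,
}

impl SrcFile {
    pub fn new(name: Option<String>, contents: String) -> Self {
        Self {
            name,
            contents,
            line_starts: OnceLock::new(),
        }
    }

    /// Computes the line-start table on first call and reuses it afterwards.
    fn line_starts(&self) -> &[usize] {
        self.line_starts.get_or_init(|| {
            std::iter::once(0)
                .chain(self.contents.match_indices('\n').map(|(i, _)| i + 1))
                .collect()
        })
    }

    /// Returns a 1-based (line, column) pair for a byte offset.
    /// Assumes `offset` is on a char boundary and at most `contents.len()`.
    pub fn get_location(&self, offset: usize) -> (usize, usize) {
        let starts = self.line_starts();
        // Number of line starts that are <= offset, i.e. the 1-based line.
        let line = starts.partition_point(|&s| s <= offset);
        let line_start = starts[line - 1];
        // Columns count characters, not bytes.
        let col = self.contents[line_start..offset].chars().count() + 1;
        (line, col)
    }
}
```

With this shape, the first lookup costs one pass over the file and every later lookup costs a binary search plus a scan of a single line.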

Testing

  • cargo test -p egglog-ast

Copilot AI review requested due to automatic review settings May 16, 2026 00:25
@oflatt oflatt requested a review from a team as a code owner May 16, 2026 00:25
@oflatt oflatt requested review from saulshanabrook and removed request for a team May 16, 2026 00:25
Contributor

Copilot AI left a comment

Pull request overview

This PR improves span formatting performance by caching source line-start offsets in SrcFile and updating parser source construction to use the new constructor.

Changes:

  • Added SrcFile::new and cached line-start computation via OnceLock.
  • Reimplemented SrcFile clone/equality/hash behavior to exclude the cache.
  • Updated the S-expression parser to construct SrcFile through the new API.
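
One way the second bullet could look is sketched below. It is an assumption-laden illustration rather than the PR's code: the `line_count` method exists only to populate the cache in the example. The point is that `Clone`, `PartialEq`, and `Hash` look only at `name` and `contents`, so two files with the same text compare and hash the same whether or not either has computed its line starts.

```rust
use std::hash::{Hash, Hasher};
use std::sync::OnceLock;

/// Hypothetical stand-in for `SrcFile`.
#[derive(Debug)]
pub struct SrcFile {
    pub name: Option<String>,
    pub contents: String,
    line_starts: OnceLock<Vec<usize>>,
}

impl SrcFile {
    pub fn new(name: Option<String>, contents: String) -> Self {
        Self { name, contents, line_starts: OnceLock::new() }
    }

    /// Illustrative helper (an assumption): forces the cache to be filled.
    pub fn line_count(&self) -> usize {
        self.line_starts
            .get_or_init(|| {
                std::iter::once(0)
                    .chain(self.contents.match_indices('\n').map(|(i, _)| i + 1))
                    .collect()
            })
            .len()
    }
}

impl Clone for SrcFile {
    fn clone(&self) -> Self {
        // The clone starts with an empty cache and recomputes it on demand.
        Self::new(self.name.clone(), self.contents.clone())
    }
}

impl PartialEq for SrcFile {
    fn eq(&self, other: &Self) -> bool {
        // The cache is derived data, so it is not part of identity.
        self.name == other.name && self.contents == other.contents
    }
}

impl Eq for SrcFile {}

impl Hash for SrcFile {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash exactly the fields that `eq` compares.
        self.name.hash(state);
        self.contents.hash(state);
    }
}
```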

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| egglog-ast/src/span.rs | Adds cached line-start offsets and updates SrcFile trait implementations. |
| src/ast/parse.rs | Routes parser source creation through SrcFile::new. |

Comments suppressed due to low confidence (3)

egglog-ast/src/span.rs:61

  • This slices contents at offset, but callers can pass byte offsets that are not UTF-8 character boundaries. Span::Display does this for spans ending in a multibyte character because it calls get_location(span.j - 1), so formatting an atom like é can panic instead of reporting a location. Clamp/adjust the offset to a valid character boundary before slicing.
        let col = self.contents[line_start..offset].chars().count() + 1;
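
A possible shape for the suggested adjustment is sketched here. The helper names `clamp_to_char_boundary` and `column` are hypothetical and do not appear in the PR. The sketch moves an offset that lands inside a multibyte character back to the start of that character before slicing, and it assumes `line_start` is itself a valid boundary.

```rust
/// Hypothetical helper: move `offset` back to the nearest char boundary.
fn clamp_to_char_boundary(s: &str, offset: usize) -> usize {
    let mut i = offset.min(s.len());
    // Offset 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// 1-based column of `offset` within the line that begins at `line_start`.
/// Assumes `line_start` is a char boundary within `contents`.
fn column(contents: &str, line_start: usize, offset: usize) -> usize {
    let end = clamp_to_char_boundary(contents, offset).max(line_start);
    contents[line_start..end].chars().count() + 1
}
```

For the atom `é` (two bytes in UTF-8), an offset of `span.j - 1` falls inside the character; the clamp moves it back to the character's first byte, so the slice is valid.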

egglog-ast/src/span.rs:30

  • contents remains a public mutable field, but line_starts is cached after the first location lookup. If a caller mutates contents afterward, the cache is stale and later get_location calls can return incorrect line/column values or panic when cached offsets no longer match the string. Make the contents immutable/private or provide mutation APIs that invalidate the cache.
    pub contents: String,
    line_starts: OnceLock<Vec<usize>>,

egglog-ast/src/span.rs:30

  • Adding a private field to this public struct makes SrcFile { name, contents } literals in downstream egglog-ast users stop compiling. The new constructor helps new code, but it does not preserve source compatibility for an existing public API; consider a compatibility path or call out/coordinate the breaking API change.
    line_starts: OnceLock<Vec<usize>>,

Comment thread egglog-ast/src/span.rs
Comment on lines +26 to +30
#[derive(Debug)]
pub struct SrcFile {
    pub name: Option<String>,
    pub contents: String,
    line_starts: OnceLock<Vec<usize>>,
@codspeed-hq

codspeed-hq Bot commented May 16, 2026

Merging this PR will improve performance by 7.85%

⚡ 1 improved benchmark
✅ 49 untouched benchmarks
⏩ 190 skipped benchmarks [1]

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| WallTime | tests[proof_testing_eqsat-basic] | 28.6 ms | 26.6 ms | +7.85% |


Comparing codex-span-location-cache-main (280cbcf) with main (629b0b2)

Footnotes

  1. 190 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@saulshanabrook
Member

What if we just change the EgglogSpan to store the line and column number for start and stop values, instead of storing the offset into the file regardless of line number? Then we don't have to recompute this each time to display?
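
The alternative raised in this comment might look something like the sketch below. All type names (`LineCol`, `LineColSpan`, `PositionTracker`) are hypothetical and not taken from egglog. The idea is that the parser records line and column as it consumes text, so `Display` needs no lookup at all.

```rust
use std::fmt;

/// Hypothetical 1-based source position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LineCol {
    pub line: u32,
    pub col: u32,
}

/// Hypothetical span that stores positions instead of byte offsets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LineColSpan {
    pub start: LineCol,
    pub end: LineCol,
}

/// Tracks the current position while a parser consumes input.
pub struct PositionTracker {
    pos: LineCol,
}

impl PositionTracker {
    pub fn new() -> Self {
        Self { pos: LineCol { line: 1, col: 1 } }
    }

    pub fn position(&self) -> LineCol {
        self.pos
    }

    /// Advances past `text`, counting columns in characters.
    pub fn advance(&mut self, text: &str) {
        for ch in text.chars() {
            if ch == '\n' {
                self.pos.line += 1;
                self.pos.col = 1;
            } else {
                self.pos.col += 1;
            }
        }
    }
}

impl fmt::Display for LineColSpan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No source lookup is needed; the positions are already stored.
        write!(
            f,
            "{}:{}-{}:{}",
            self.start.line, self.start.col, self.end.line, self.end.col
        )
    }
}
```

A likely tradeoff, stated as an assumption: any code that slices the source text by byte offset would still need the offsets, so a span in this style may have to carry both.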
