perf(sessions): sub-second /chat loading for large session histories #27028
N1kky-wed wants to merge 23 commits
Conversation
InShot_20260514_115500194.mp4
Real-world comparison: 1 minute 7 seconds → 1.24 seconds (98% reduction)
/gemini review
Code Review
This pull request introduces significant performance optimizations for session loading and sorting. Key improvements include a secondary sort by startTime for sessions with identical lastUpdated timestamps, and a new high-performance path for retrieving message counts using wc -l on POSIX systems or optimized buffer scanning on Windows. Furthermore, loadConversationRecord now includes an ultra-fast preview mode that extracts metadata using regex from raw file buffers, significantly reducing the overhead of JSON parsing for large files. A review comment identifies a potential issue where the current logic for parsing command-line output would fail if file paths contain spaces, and it suggests a more robust regex-based extraction and path resolution strategy.
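The secondary-sort idea from the summary can be sketched as a comparator that falls back to `startTime` when `lastUpdated` ties. This is a minimal illustration, not the PR's actual code; the `SessionMeta` shape and field types are assumptions.

```typescript
interface SessionMeta {
  id: string;
  lastUpdated: number; // epoch millis (assumed representation)
  startTime: number;   // epoch millis (assumed representation)
}

// Sort newest-first by lastUpdated; break ties with startTime so the
// ordering stays stable across reloads.
function sortSessions(sessions: SessionMeta[]): SessionMeta[] {
  return [...sessions].sort(
    (a, b) => (b.lastUpdated - a.lastUpdated) || (b.startTime - a.startTime),
  );
}

const sorted = sortSessions([
  { id: 'a', lastUpdated: 100, startTime: 10 },
  { id: 'b', lastUpdated: 100, startTime: 20 },
  { id: 'c', lastUpdated: 200, startTime: 5 },
]);
// 'c' sorts first (newest), then 'b' beats 'a' on the startTime tiebreak
```

Because numeric subtraction yields a total order here, the sort is deterministic even when many sessions share a `lastUpdated` value.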
In packages/cli/src/utils/sessionUtils.ts <#27028 (comment)>:
```ts
const parts = line.trim().split(/\s+/);
if (parts.length >= 2 && parts[1] !== 'total') {
  lineCounts.set(path.basename(parts[1]), parseInt(parts[0], 10));
}
```
Severity: high
The current logic for parsing wc -l output using split(/\s+/) will fail
if any part of the file path contains spaces. Using a regular expression to
capture the count and the full path is more robust. Additionally, ensure
that file paths extracted from command output are sanitized to prevent path
traversal and other vulnerabilities (Rule 6), and resolved consistently
using the repository's standard resolveToRealPath function (Rule 9).
```ts
const match = line.trim().match(/^(\d+)\s+(.+)$/);
if (match && match[2] !== 'total') {
  const resolvedPath = resolveToRealPath(match[2]);
  lineCounts.set(path.basename(resolvedPath), parseInt(match[1], 10));
}
```
References
1. Sanitize file paths extracted from untrusted sources, such as
command output, to prevent path traversal (..), null byte injection (\0),
and other vulnerabilities.
2. Ensure consistent path resolution by using a single, robust
function (e.g., resolveToRealPath) for all related path validations,
including internal validations in components like WorkspaceContext.
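The reviewer's suggestion can be expanded into a small self-contained parser for `wc -l` output. This is a sketch: `parseWcOutput` is a hypothetical helper name, and the real code should additionally sanitize and resolve each captured path through the repository's `resolveToRealPath`, which is omitted here because it is repo-specific.

```typescript
import * as path from 'node:path';

// Parse `wc -l` output into a filename → line-count map. A regex capture
// tolerates spaces in file paths, unlike splitting on whitespace.
function parseWcOutput(output: string): Map<string, number> {
  const lineCounts = new Map<string, number>();
  for (const line of output.split('\n')) {
    const match = line.trim().match(/^(\d+)\s+(.+)$/);
    if (match && match[2] !== 'total') {
      lineCounts.set(path.basename(match[2]), parseInt(match[1], 10));
    }
  }
  return lineCounts;
}

const counts = parseWcOutput(
  '  12 /tmp/my sessions/chat one.jsonl\n  30 /tmp/plain.jsonl\n  42 total\n',
);
// 'chat one.jsonl' → 12, 'plain.jsonl' → 30; the trailing 'total' row is skipped
```

Note how the path containing spaces survives intact because the regex captures everything after the first numeric field, which is exactly the failure mode of the `split(/\s+/)` version.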
/gemini review
Code Review
This pull request implements performance optimizations for session loading, including a regex-based fast path for large files, bulk line counting using native system tools on POSIX, and parallelized metadata processing. It also adds secondary sorting by session start time. Reviewer feedback highlights the need for a helper to extract the latest metadata occurrences in JSONL files, anchoring regex patterns to prevent false positives during parsing, and sanitizing AI-generated summaries to prevent prompt injection.
/gemini review
Code Review
This pull request introduces significant performance optimizations for session browsing by implementing a fast-path for loading conversation metadata. Key changes include a regex-based 'ultra-fast' preview for large session files, bulk line counting using native wc -l on POSIX systems, and optimized metadata-only parsing that avoids full JSON deserialization of message content. Additionally, session sorting now includes a stable secondary sort by startTime. Feedback was provided regarding a fragile regex pattern used for summary extraction in the fast-path, which could fail if nested objects appear before the summary key.
/gemini review
Code Review
This pull request significantly optimizes session loading performance by introducing a fast preview path that uses regex-based extraction from file buffers and a bulk line-counting mechanism (using wc -l on POSIX). It also refactors the metadata-only loading path to reduce unnecessary JSON parsing. Feedback highlights several critical areas for improvement: the sanitizeSummary function incorrectly uses HTML encoding which breaks terminal rendering and violates prompt injection rules; the regex for metadata extraction is fragile regarding escaped quotes; the optimized line reader relies on a brittle startsWith check; and the Windows fallback for line counting could be more efficient by using Buffer.indexOf instead of byte-by-byte iteration.
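The `Buffer.indexOf` suggestion from this review can be sketched as follows. This is an illustrative fallback-path fragment under assumed names, not the PR's implementation: `Buffer.indexOf` delegates the byte search to optimized native code, so it is typically much faster than iterating the buffer byte-by-byte in JavaScript.

```typescript
// Count newlines in a buffer using Buffer.indexOf instead of a
// byte-by-byte loop. A chunked file reader would feed buffers into this
// on the Windows fallback path described in the review.
function countNewlines(buf: Buffer): number {
  let count = 0;
  let offset = buf.indexOf(0x0a); // 0x0a === '\n'
  while (offset !== -1) {
    count++;
    offset = buf.indexOf(0x0a, offset + 1);
  }
  return count;
}

const sample = Buffer.from('{"a":1}\n{"b":2}\n{"c":3}\n');
// countNewlines(sample) === 3
```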
/gemini review
Code Review
This pull request introduces significant performance optimizations for session management and browsing. Key improvements include a secondary sort by startTime for stable session ordering, a platform-optimized bulk line-counting mechanism using wc -l or stream buffers, and an 'ultra-fast' metadata extraction path in loadConversationRecord that utilizes regex on raw file buffers to avoid expensive JSON parsing. Review feedback identified a brittle regex used for message ID extraction in the optimized metadata path, which could fail if property ordering in the JSON lines changes, potentially leading to incorrect message counts.
/gemini review
Code Review
This pull request introduces significant performance optimizations for loading and displaying session history. Key changes include a secondary sort by startTime for stable session ordering, a bulk line-counting mechanism (utilizing wc -l on POSIX systems), and an 'ultra-fast' preview mode in loadConversationRecord that uses regex-based extraction from raw file buffers to bypass full JSON parsing. Feedback from the review focuses on preventing potential crashes due to file descriptor limits during concurrent processing, ensuring robust fallbacks for the bulk line-counting logic, and removing redundant file system calls to further improve efficiency.
/gemini review
Code Review
This pull request introduces significant performance optimizations for session management, including a regex-based "fast preview" mode for large session files and bulk line counting using native system tools. It also refines session sorting and adds metadata sanitization. Feedback focuses on improving the robustness of the fast preview logic, specifically by ensuring the most recent metadata is captured from buffers, handling regex flags safely to avoid runtime errors, and lowering concurrency limits in fallback paths to prevent file descriptor exhaustion. Additionally, there is a note regarding the potential for data visibility issues in extremely large files where messages might fall outside the optimized read buffers.
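The "capture the most recent metadata" point can be illustrated with a tail-buffer scan: in an append-only JSONL file the newest occurrence of a metadata record wins, so scan all matches and keep the last. The record shape and key names below are illustrative assumptions, not the actual session file format.

```typescript
// Scan a decoded tail buffer for summary records and keep the last match.
// Anchoring with ^ and the `m` flag keeps the pattern from firing on
// summary-shaped text embedded inside message content mid-line.
function latestSummary(tailText: string): string | undefined {
  const pattern = /^\{"type":"summary","text":"([^"]*)"/gm;
  let last: string | undefined;
  for (const m of tailText.matchAll(pattern)) {
    last = m[1];
  }
  return last;
}

const tail =
  '{"type":"summary","text":"old"}\n' +
  '{"type":"message","content":"{\\"type\\":\\"summary\\",\\"text\\":\\"fake\\"}"}\n' +
  '{"type":"summary","text":"new"}\n';
// latestSummary(tail) === 'new' — the escaped copy inside the message is ignored
```

Note that `matchAll` requires the `g` flag; calling it with a non-global regex throws at runtime, which is one concrete instance of the "handle regex flags safely" feedback.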
…x parsing, clarifying comments)
…regex, and clean formatting)
… timestamp fallbacks
…ta lastUpdated for perfect sorting
…d legacy json formats
…acyRecordFallback
Code Review
This pull request introduces performance optimizations for session history loading and browsing. Key improvements include a secondary sort by startTime for sessions with identical update times, a fast line-counting utility using wc -l or streams, and an 'ultra-fast' metadata extraction path that uses regex on raw file buffers to avoid full JSON parsing. Feedback was provided to improve the robustness of regex patterns used during metadata extraction to prevent false matches within message content.
…tches in message content Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary

The /chat command took 25+ seconds to load with large session histories. This PR reduces load time to 634ms on a real-world benchmark of 59 sessions / 2.3GB of JSONL.
Details

Three compounding bottlenecks eliminated:

chatRecordingService.ts: New ultra-fast preview path uses raw fd.read() calls (64KB head + 128KB tail). Core metadata is extracted via regex directly on buffers, while the first user message is extracted via line-based JSON parsing to robustly handle nested structures — without streaming the full file.

sessionUtils.ts: Bulk wc -l across all files in a single subprocess call (batched at 100 files) replaces per-file subprocess calls. Windows fallback uses batch-limited (20 concurrent) Node.js buffer scanning to avoid FD exhaustion.

utils.ts: Stable secondary sort by startTime when lastUpdated ties, fixing inconsistent session ordering. Filename timestamp used as reliable startTime fallback instead of mtime.
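The head+tail preview described above can be sketched as two positioned reads followed by a regex probe. This is a minimal illustration under stated assumptions: buffer sizes follow the PR description (64KB head, 128KB tail), while `fastPreview` and the `"sessionId"` field name are hypothetical and may not match the actual record format.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';

// Read only the head and tail of a large JSONL session file, then pull
// metadata out with a regex instead of JSON-parsing every line.
function fastPreview(filePath: string): { sessionId?: string } {
  const fd = fs.openSync(filePath, 'r');
  try {
    const { size } = fs.fstatSync(fd);

    const head = Buffer.alloc(Math.min(64 * 1024, size));
    fs.readSync(fd, head, 0, head.length, 0);

    const tailLen = Math.min(128 * 1024, size);
    const tail = Buffer.alloc(tailLen);
    fs.readSync(fd, tail, 0, tailLen, size - tailLen);

    // Metadata normally sits in the first record, so probe the head.
    const match = head.toString('utf8').match(/"sessionId"\s*:\s*"([^"]+)"/);
    return { sessionId: match?.[1] };
  } finally {
    fs.closeSync(fd);
  }
}

const tmp = `${os.tmpdir()}/preview-demo.jsonl`;
fs.writeFileSync(tmp, '{"sessionId":"abc-123","messages":[]}\n');
// fastPreview(tmp).sessionId === 'abc-123'
```

For a 2.3GB history this touches at most ~192KB per file, which is where the bulk of the reported speedup plausibly comes from; the trade-off, noted in one of the reviews above, is that records outside those windows are invisible to the preview.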
Note: The 25.62s benchmark was recorded on a fresh environment. In practice,
load times frequently exceed 60 seconds on warmed systems with large histories.
Related Issues
Fixes #27027
Related to #26478
How to Validate

Run /chat and compare load time against the main branch.

Note: Windows fallback path (batch-limited buffer scanning) is implemented but not benchmarked on hardware — community validation welcome.
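The batch-limited scanning mentioned above can be sketched as a generic batching helper that caps how many workers (and thus open file descriptors) run at once. `mapInBatches` is a hypothetical name; the PR's description says the Windows path uses a limit of 20, and the worker here stands in for a per-file buffer scan.

```typescript
// Process items in fixed-size batches so at most `limit` workers run
// concurrently, bounding open file descriptors on the fallback path.
async function mapInBatches<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += limit) {
    const batch = items.slice(i, i + limit);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}

// Usage sketch: scan 5 "files" two at a time.
// mapInBatches(files, 20, scanFile) would mirror the described fallback.
```

A batch waits for its slowest member before the next batch starts, so a work-stealing pool would squeeze out more throughput, but fixed batches are simpler and already bound FD usage.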
Pre-Merge Checklist