
fix(rg): avoid eager match vector allocation #1742

Merged: chaliy merged 1 commit into main from 2026-05-25-fix-memory-dos-in-rg-find_iter on May 25, 2026

@chaliy
Contributor

chaliy commented May 25, 2026

Motivation

  • A recent rg matcher change collected all matches into a Vec<RgMatch>, allowing attacker-controlled dense matches to force large transient allocations and cause host memory DoS, even for count-only operations.
  • Restore streaming behavior in the counting and output paths so that memory use no longer grows with the number of matches, while preserving functionality.

Description

  • Replace the eager find_iter design, which returned a Vec<RgMatch>, with a streaming callback API, RgMatcher::for_each_match, and add RgMatcher::count_matches for lazy counting.
  • Update count-sensitive code paths to call regex.count_matches(...) instead of .find_iter(...).len() for --count-matches and JSON match aggregation.
  • Update output and formatting paths (color/highlight rendering, vimgrep, only-matching, multiline match collection, JSON submatch construction, etc.) to stream matches via for_each_match instead of pre-materializing per-line vectors.
  • Keep replacement and single-match helpers (find, replace_all, replace_first) unchanged.
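For reviewers, a minimal sketch of the streaming shape described above. It is not the bashkit code: `RgMatcher` and `RgMatch` here are simplified, hypothetical stand-ins that match a non-empty literal with std's `str::match_indices` rather than wrapping a regex engine, and the callback signature is an assumption. The point is that `count_matches` reuses `for_each_match` with a counter, so extra memory stays constant regardless of how many matches a line contains, whereas calling `.len()` on a collected Vec allocates in proportion to the match count.

```rust
/// Hypothetical match span (stand-in for bashkit's RgMatch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RgMatch {
    pub start: usize, // byte offset where the match begins
    pub end: usize,   // byte offset just past the match
}

/// Hypothetical stand-in for RgMatcher: matches a literal, non-empty
/// pattern via `str::match_indices` instead of a regex engine.
pub struct RgMatcher {
    pattern: String,
}

impl RgMatcher {
    pub fn new(pattern: &str) -> Self {
        assert!(!pattern.is_empty(), "sketch only supports non-empty literals");
        RgMatcher { pattern: pattern.to_string() }
    }

    /// Streams each non-overlapping match to `f`, left to right.
    /// Nothing is collected, so no per-call Vec is allocated.
    pub fn for_each_match<F: FnMut(RgMatch)>(&self, haystack: &str, mut f: F) {
        for (start, m) in haystack.match_indices(self.pattern.as_str()) {
            f(RgMatch { start, end: start + m.len() });
        }
    }

    /// Counts matches lazily with O(1) extra memory, replacing the
    /// allocating `find_iter(...).len()` pattern.
    pub fn count_matches(&self, haystack: &str) -> usize {
        let mut n = 0;
        self.for_each_match(haystack, |_| n += 1);
        n
    }
}
```

A callback keeps the API allocation-free without exposing an iterator type in the signature; returning an `impl Iterator` would be an equally valid allocation-free alternative.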

Testing

  • Ran cargo test -p bashkit rg --lib: 173 passed, 1 failed. The failing case is the differential test builtins::rg::tests::diff_rg_matches_real_rg_cases :: colors custom highlight fg.
  • The run exercised the rg unit suite: with eager Vec allocations removed from the count and output paths, behavior stayed equivalent for all but that one case.

Codex Task

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 25, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! (View logs) | bashkit | 06df616 | Commit Preview URL | May 25 2026, 04:43 PM |

A recent rg matcher change collected all matches into a Vec, allowing
attacker-controlled dense matches to force large transient allocations
and cause host memory DoS even for count-only operations.

Replace the eager find_iter()→Vec design with streaming
RgMatcher::for_each_match callback iteration and add a lazy
RgMatcher::count_matches helper. Update count-sensitive paths
(--count-matches, JSON match aggregation, replace_all pre-flight) to
use count_matches, and update output/formatting paths (color/highlight,
vimgrep, only-matching, multiline match collection, JSON submatch
construction) to stream matches via for_each_match instead of
pre-materializing per-line vectors.
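The output paths follow the same pattern. The sketch below, again using a hypothetical literal matcher (`LiteralMatcher`) in place of `RgMatcher`, shows highlight rendering and only-matching output writing each segment as its match arrives. It tracks only the end of the previous match instead of a per-line vector of matches. The function names and the red ANSI codes are illustrative assumptions, not bashkit's actual output code.

```rust
/// Hypothetical literal matcher, redefined so this block compiles alone.
pub struct LiteralMatcher {
    pattern: String,
}

impl LiteralMatcher {
    pub fn new(pattern: &str) -> Self {
        assert!(!pattern.is_empty(), "sketch only supports non-empty literals");
        LiteralMatcher { pattern: pattern.to_string() }
    }

    /// Streams (start, end) byte offsets of each non-overlapping match.
    pub fn for_each_match<F: FnMut(usize, usize)>(&self, haystack: &str, mut f: F) {
        for (start, m) in haystack.match_indices(self.pattern.as_str()) {
            f(start, start + m.len());
        }
    }
}

// Illustrative highlight codes (red foreground, then reset).
const HL_START: &str = "\x1b[31m";
const HL_END: &str = "\x1b[0m";

/// Color rendering: emits unmatched text and highlighted matches in order,
/// keeping only the end offset of the previous match.
pub fn highlight_line(m: &LiteralMatcher, line: &str, out: &mut String) {
    let mut last = 0;
    m.for_each_match(line, |start, end| {
        out.push_str(&line[last..start]); // text between matches
        out.push_str(HL_START);
        out.push_str(&line[start..end]); // the match itself
        out.push_str(HL_END);
        last = end;
    });
    out.push_str(&line[last..]); // trailing unmatched text
}

/// Only-matching output: one match per output line, written as it is found.
pub fn only_matching(m: &LiteralMatcher, line: &str, out: &mut String) {
    m.for_each_match(line, |start, end| {
        out.push_str(&line[start..end]);
        out.push('\n');
    });
}
```

For example, highlighting `fo` in `"a fo b fo"` yields `"a \x1b[31mfo\x1b[0m b \x1b[31mfo\x1b[0m"`.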

Replacement and single-match helpers (find, replace_all, replace_first)
remain unchanged.

Rebased on current main; original PR #1742 by chaliy.
chaliy force-pushed the 2026-05-25-fix-memory-dos-in-rg-find_iter branch from 66b88a1 to 06df616 on May 25, 2026 15:47
chaliy merged commit 95da70e into main on May 25, 2026
33 checks passed
chaliy deleted the 2026-05-25-fix-memory-dos-in-rg-find_iter branch on May 25, 2026 16:09

Projects

None yet


1 participant