Use possessive quantifiers in excerpt helper regexps to prevent backtracking #2738

Open
zachasme wants to merge 1 commit into main from except-helper-possessive-quantifiers

@zachasme (Contributor)

Hopefully fixes Regexp timeouts for very large cards. See https://3.basecamp.com/2914079/buckets/27/card_tables/cards/9700847055

Also cleans up the format_html helper.

Copilot AI review requested due to automatic review settings March 20, 2026 13:10
Copilot AI left a comment

Pull request overview

This PR aims to reduce the risk of Regexp timeouts when generating excerpts from very large card contents by adjusting the excerpt-normalization regexes, and it also refactors the HTML formatting helper to encapsulate scrubber memoization.

Changes:

  • Update format_excerpt regex quantifiers to use possessive *+ in an attempt to reduce backtracking on large inputs.
  • Refactor format_html to call a private auto_link_scrubber helper for memoization instead of using an inline instance variable.
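
To illustrate the first bullet: a possessive quantifier such as *+ keeps whatever it has matched and never hands characters back to the rest of the pattern. A minimal sketch in plain Ruby, not taken from the PR (the sample strings are made up for illustration):

```ruby
# Greedy: a* first takes all three "a"s, then backtracks one step
# so the trailing literal "a" can still match.
greedy = /\Aa*a\z/.match?("aaa")

# Possessive: a*+ takes all three "a"s and refuses to give any back,
# so the trailing literal "a" has nothing left to match.
possessive = /\Aa*+a\z/.match?("aaa")

# When the next token cannot overlap with what was consumed,
# both forms accept the same input; only the failure path differs.
same_result = /\A>\s*+x/.match?(">   x") == /\A>\s*x/.match?(">   x")

puts greedy, possessive, same_result
```

Swapping \s* for \s*+ is safe only where the following token can never be whitespace, which appears to be the case for the `>` and `[-+]` prefixes discussed here.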

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
app/helpers/html_helper.rb | Refactors scrubber memoization into a private helper method used by format_html.
app/helpers/excerpt_helper.rb | Adjusts excerpt-normalization regexes to reduce backtracking risk on very large strings.
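
The memoization refactor described for html_helper.rb might look roughly like the sketch below. Only format_html and auto_link_scrubber are names from this PR; CountingScrubber, HtmlHelperSketch, and ViewStub are hypothetical stand-ins, and the real scrubber's behavior is not shown on this page.

```ruby
# Hypothetical stand-in for the real scrubber; it only counts constructions.
class CountingScrubber
  class << self
    attr_accessor :instances_built
  end
  self.instances_built = 0

  def initialize
    self.class.instances_built += 1
  end

  # Placeholder behavior, assumed for illustration only.
  def scrub(html)
    html.strip
  end
end

module HtmlHelperSketch
  def format_html(html)
    auto_link_scrubber.scrub(html)
  end

  private

  # Built on first use, then reused for every later call on this object.
  def auto_link_scrubber
    @auto_link_scrubber ||= CountingScrubber.new
  end
end

# Hypothetical host object standing in for a view context.
class ViewStub
  include HtmlHelperSketch
end

view = ViewStub.new
first = view.format_html("  <p>one</p>  ")
second = view.format_html("<p>two</p>")
```

Keeping the `||=` inside a private method means format_html no longer touches the instance variable directly, so the caching detail lives in one place.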

Comment on lines +6 to +8
text = text.gsub(/^>\s*+(.*)$/m, '> \1')
text = text.gsub(/^\s*+[-+]\s*+(.*)$/m, '• \1')
text = text.gsub(/^\d+\.\s*+(.*)$/m) { |m| m }

Copilot AI Mar 20, 2026

In Ruby, the /m flag makes . match newlines, so (.*) here is not confined to a single line: it can run to the end of the string, and on very large inputs that may still cause the heavy backtracking this PR is trying to avoid. Consider removing /m and matching per line explicitly (e.g., with a negated character class such as [^\n]*+ or [^\r\n]*+, or by making the trailing quantifier possessive) so the whole pattern stays linear-time on multiline content.

Suggested change
- text = text.gsub(/^>\s*+(.*)$/m, '> \1')
- text = text.gsub(/^\s*+[-+]\s*+(.*)$/m, '• \1')
- text = text.gsub(/^\d+\.\s*+(.*)$/m) { |m| m }
+ text = text.gsub(/^>\s*+([^\r\n]*+)$/, '> \1')
+ text = text.gsub(/^\s*+[-+]\s*+([^\r\n]*+)$/, '• \1')
+ text = text.gsub(/^\d+\.\s*+([^\r\n]*+)$/) { |m| m }
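
A small sketch of the difference between the two forms of the quote pattern, using made-up input rather than anything from the PR. With /m the capture runs past the first newline, so a single match swallows the rest of the string; the per-line pattern handles each line separately.

```ruby
text = ">   first\n>   second"

# With /m, (.*) crosses the newline: one match covers both lines,
# so only the first quote marker is normalized.
multiline = text.gsub(/^>\s*+(.*)$/m, '> \1')

# Without /m and with a possessive negated class, each line is matched on its own.
per_line = text.gsub(/^>\s*+([^\r\n]*+)$/, '> \1')

p multiline  # "> first\n>   second"
p per_line   # "> first\n> second"
```

Note that \s itself matches newlines, so `\s*+` directly after the marker can still cross a line break when a line holds only the marker; the sample input above does not exercise that case.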

- text = text.gsub(/^\d+\.\s*(.*)$/m) { |m| m }
+ text = text.gsub(/^>\s*+(.*)$/m, '> \1')
+ text = text.gsub(/^\s*+[-+]\s*+(.*)$/m, '• \1')
+ text = text.gsub(/^\d+\.\s*+(.*)$/m) { |m| m }

Copilot AI Mar 20, 2026

This gsub is a functional no-op (the block returns the full match unchanged), but it still forces an extra regex scan/allocation on large strings. Either remove it, or change it to actually normalize ordered-list spacing (similar to the quote/bullet replacements) without re-copying the string unnecessarily.

Suggested change
- text = text.gsub(/^\d+\.\s*+(.*)$/m) { |m| m }
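
To make the no-op concrete, here is a minimal sketch on made-up input. The second pattern is a hypothetical way to normalize ordered-list spacing, not something proposed in the PR; using [ \t]*+ instead of \s*+ is an assumption made here so the match cannot cross a line break.

```ruby
text = "1.   one\n2.two"

# The block returns the match unchanged, so the content is identical,
# yet gsub still scans the input and builds a new String.
unchanged = text.gsub(/^\d+\.\s*+(.*)$/m) { |m| m }

# Hypothetical alternative: normalize to a single space after the number.
# [ \t]*+ (rather than \s*+) keeps the match on one line.
normalized = text.gsub(/^(\d+)\.[ \t]*+([^\r\n]*+)$/, '\1. \2')

p unchanged == text       # same content
p unchanged.equal?(text)  # but a different object
p normalized
```

Whether ordered lists should be normalized at all is a product decision; simply deleting the line, as suggested, avoids the extra scan with no behavior change.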
