Use possessive quantifiers in excerpt helper regexps to prevent backtracking#2738
Use possessive quantifiers in excerpt helper regexps to prevent backtracking#2738
Conversation
…racking Also cleans up the format_html helper.
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce the risk of Regexp timeouts when generating excerpts from very large card contents by adjusting the excerpt-normalization regexes, and it also refactors the HTML formatting helper to encapsulate scrubber memoization.
Tip
If you aren't ready for review, convert to a draft PR.
Click "Convert to draft" or run gh pr ready --undo.
Click "Ready for review" or run gh pr ready to reengage.
Changes:
- Update
format_excerptregex quantifiers to use possessive*+in an attempt to reduce backtracking on large inputs. - Refactor
format_htmlto call a privateauto_link_scrubberhelper for memoization instead of using an inline instance variable.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| app/helpers/html_helper.rb | Refactors scrubber memoization into a private helper method used by format_html. |
| app/helpers/excerpt_helper.rb | Adjusts excerpt-normalization regexes to reduce backtracking risk on very large strings. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| text = text.gsub(/^>\s*+(.*)$/m, '> \1') | ||
| text = text.gsub(/^\s*+[-+]\s*+(.*)$/m, '• \1') | ||
| text = text.gsub(/^\d+\.\s*+(.*)$/m) { |m| m } |
There was a problem hiding this comment.
In Ruby, the /m flag makes . match newlines, so (.*) here can still span large portions of the string and may still trigger heavy backtracking on very large inputs (which this PR is trying to avoid). Consider removing /m and matching per-line explicitly (e.g., using a negated character class like [^\n]*+ / [^\r\n]*+, or making the trailing quantifier possessive) so the whole pattern stays linear-time on multiline content.
| text = text.gsub(/^>\s*+(.*)$/m, '> \1') | |
| text = text.gsub(/^\s*+[-+]\s*+(.*)$/m, '• \1') | |
| text = text.gsub(/^\d+\.\s*+(.*)$/m) { |m| m } | |
| text = text.gsub(/^>\s*+([^\r\n]*+)$/, '> \1') | |
| text = text.gsub(/^\s*+[-+]\s*+([^\r\n]*+)$/, '• \1') | |
| text = text.gsub(/^\d+\.\s*+([^\r\n]*+)$/) { |m| m } |
| text = text.gsub(/^\d+\.\s*(.*)$/m) { |m| m } | ||
| text = text.gsub(/^>\s*+(.*)$/m, '> \1') | ||
| text = text.gsub(/^\s*+[-+]\s*+(.*)$/m, '• \1') | ||
| text = text.gsub(/^\d+\.\s*+(.*)$/m) { |m| m } |
There was a problem hiding this comment.
This gsub is a functional no-op (the block returns the full match unchanged), but it still forces an extra regex scan/allocation on large strings. Either remove it, or change it to actually normalize ordered-list spacing (similar to the quote/bullet replacements) without re-copying the string unnecessarily.
| text = text.gsub(/^\d+\.\s*+(.*)$/m) { |m| m } |
Hopefully fixes Regexp timeouts for very large cards. See https://3.basecamp.com/2914079/buckets/27/card_tables/cards/9700847055
Also cleans up the format_html helper.