Fix infinite entity expansion by detecting circular references#312
Open
naitoh wants to merge 3 commits intoruby:masterfrom
Open
Fix infinite entity expansion by detecting circular references#312naitoh wants to merge 3 commits intoruby:masterfrom
naitoh wants to merge 3 commits intoruby:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds circular entity-reference detection to prevent unbounded recursive expansion (previously resulting in SystemStackError) during streaming-style parsing.
Changes:
- Track the current entity-expansion chain in
REXML::Parsers::BaseParserand raise a clear error when a loop is detected. - Extend
BaseParser#unnormalize/#entityto thread the expansion state through nested expansions. - Add regression tests covering direct/indirect circular references (including a long entity value) via
StreamParser.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
lib/rexml/parsers/baseparser.rb |
Adds loop detection during entity expansion by propagating an “currently expanding” set through recursive unnormalization. |
test/test_entity.rb |
Adds regression tests ensuring circular entity references raise an appropriate error instead of stack overflow. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Member
|
Is this check heavy? |
…ML::Parsers::BaseParser
## Why?
Fix an issue where a `stack level too deep (SystemStackError)` error caused by a circular reference was not detected as an appropriate error.
```
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'
source = <<~XML
<!DOCTYPE root [
<!ENTITY x "&x;">
]>
<root>&x;</root>
XML
listener = Class.new { include REXML::StreamListener }.new
REXML::Parsers::StreamParser.new(source, listener).parse
```
- before
```
lib/rexml/parsers/baseparser.rb:544:in 'REXML::Parsers::BaseParser#entity': stack level too deep (SystemStackError)
```
- after
```
lib/rexml/parsers/baseparser.rb:545:in 'REXML::Parsers::BaseParser#entity': Detected an entity reference loop: x (RuntimeError)
```
…ML::Document ## Why? Fix an issue where a `stack level too deep (SystemStackError)` error caused by a circular reference was not detected as an appropriate error. ``` require 'rexml' source = <<~XML <!DOCTYPE root [ <!ENTITY x "&x;"> ]> <root>&x;</root> XML REXML::Document.new(source).root.text ``` - before ``` lib/rexml/text.rb:390:in 'String#gsub': stack level too deep (SystemStackError) ``` - after ``` lib/rexml/entity.rb:80:in 'REXML::Entity#unnormalized': Detected an entity reference loop: x (RuntimeError) ```
## Benchmark
```
$ benchmark-driver benchmark/parse_doctype.yaml
Calculating -------------------------------------
before after before(YJIT) after(YJIT)
single_entity(dom) 22.381k 24.492k 5.098k 5.211k i/s - 100.000 times in 0.004468s 0.004083s 0.019615s 0.019189s
single_entity(stream) 46.882k 48.054k 8.796k 8.691k i/s - 100.000 times in 0.002133s 0.002081s 0.011369s 0.011506s
chained_entities(dom) 2.665k 2.531k 2.349k 2.340k i/s - 100.000 times in 0.037520s 0.039509s 0.042574s 0.042726s
chained_entities(stream) 2.455k 2.387k 2.351k 2.290k i/s - 100.000 times in 0.040732s 0.041888s 0.042538s 0.043660s
many_entities(dom) 675.927 679.094 684.308 674.686 i/s - 100.000 times in 0.147945s 0.147255s 0.146133s 0.148217s
many_entities(stream) 2.602k 2.501k 2.403k 2.276k i/s - 100.000 times in 0.038427s 0.039986s 0.041607s 0.043944s
repeated_entity(dom) 419.090 414.295 445.286 444.575 i/s - 100.000 times in 0.238612s 0.241374s 0.224575s 0.224934s
repeated_entity(stream) 1.449k 1.401k 1.590k 1.582k i/s - 100.000 times in 0.069014s 0.071358s 0.062898s 0.063201s
Comparison:
single_entity(dom)
after: 24491.8 i/s
before: 22381.4 i/s - 1.09x slower
after(YJIT): 5211.3 i/s - 4.70x slower
before(YJIT): 5098.1 i/s - 4.80x slower
single_entity(stream)
after: 48053.8 i/s
before: 46882.3 i/s - 1.02x slower
before(YJIT): 8795.8 i/s - 5.46x slower
after(YJIT): 8691.1 i/s - 5.53x slower
chained_entities(dom)
before: 2665.2 i/s
after: 2531.1 i/s - 1.05x slower
before(YJIT): 2348.9 i/s - 1.13x slower
after(YJIT): 2340.5 i/s - 1.14x slower
chained_entities(stream)
before: 2455.1 i/s
after: 2387.3 i/s - 1.03x slower
before(YJIT): 2350.8 i/s - 1.04x slower
after(YJIT): 2290.4 i/s - 1.07x slower
many_entities(dom)
before(YJIT): 684.3 i/s
after: 679.1 i/s - 1.01x slower
before: 675.9 i/s - 1.01x slower
after(YJIT): 674.7 i/s - 1.01x slower
many_entities(stream)
before: 2602.3 i/s
after: 2500.9 i/s - 1.04x slower
before(YJIT): 2403.4 i/s - 1.08x slower
after(YJIT): 2275.6 i/s - 1.14x slower
repeated_entity(dom)
before(YJIT): 445.3 i/s
after(YJIT): 444.6 i/s - 1.00x slower
before: 419.1 i/s - 1.06x slower
after: 414.3 i/s - 1.07x slower
repeated_entity(stream)
before(YJIT): 1589.9 i/s
after(YJIT): 1582.3 i/s - 1.00x slower
before: 1449.0 i/s - 1.10x slower
after: 1401.4 i/s - 1.13x slower
```
- YJIT=ON : 0.95 - 1.09x faster
- YJIT=OFF : 0.94 - 1.02x faster
19dd4a9 to
26545fc
Compare
Contributor
Author
I fixed an issue where unnecessary memory copies were occurring. I believe this falls within an acceptable range in terms of performance. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why?
Fix an issue where a
stack level too deep (SystemStackError)error caused by a circular reference was not detected as an appropriate error.REXML::Parsers::StreamParser
REXML::Document
Benchmark