It is possible to create circumstances where the tree-sitter parser detects multiple errors at the same location. Claude believes this occurs because the GLR engine forks after an earlier error (but I have not verified this).
The problem is that errors sharing the same (row, column) are deduplicated without taking state or sym into account. As a result, when the surviving versions disagree on the lookahead symbol, the dedup keeps one of them essentially arbitrarily, which can produce a generic "Parse error" rather than a more specific Q-code from the corpus.
I am not sure whether it is possible to construct a situation where multiple errors with distinct Q-codes are generated, or how that should be handled.
Example
# Heading {key=val .class}

Second *bad paragraph.
printf "%s\n" "# Heading {key=val .class}" "" "Second *bad paragraph." \
| pampa --no-prune-errors -t native
Error: [Q-2-3] Key-value Pair Before Class Specifier in Attribute
╭─[ <stdin>:1:20 ]
│
1 │ # Heading {key=val .class}
│ ───┬─── ───┬──
│ ╰──────────── This key-value pair cannot appear before the class specifier.
│ │
│ ╰──── This class specifier appears after the key-value pair.
───╯
Error: Parse error
╭─[ <stdin>:3:23 ]
│
3 │ Second *bad paragraph.
│ ┬
│ ╰── unexpected character or token here
───╯
Control
First good paragraph.

Second *bad paragraph.
printf "%s\n" "First good paragraph." "" "Second *bad paragraph." \
| pampa --no-prune-errors -t native
Error: [Q-2-12] Unclosed Star Emphasis
╭─[ <stdin>:3:23 ]
│
3 │ Second *bad paragraph.
│ ───┬── ┬
│ ╰───────────────────── This is the opening '*' mark.
│ │
│ ╰── I reached the end of the block before finding a closing '*' for the emphasis.
───╯
Claude's assessment on what is happening under the hood
Running pampa with -v on the reproduction input shows that after the attribute-list error on line 1, GLR forks:
detect_error lookahead:attribute_class
recover_to_previous state:2590, depth:2
process version:0, version_count:2, state:0, row:0, col:25
process version:1, version_count:2, state:2590, row:0, col:19
Both versions continue parsing forward through the blank line and into the paragraph on line 3, and both hit the error for the unclosed * at the same source position. Each emits its own detect_error event:
process version:0, version_count:2, state:1922, row:2, col:22
detect_error lookahead:_commonmark_single_quote_string_token2
process version:1, version_count:2, state:1922, row:2, col:22
detect_error lookahead:_close_block
Both events have (state=1922, row=2, col=22) but different lookahead symbols:
- version 0: (state=1922, sym=_commonmark_single_quote_string_token2) has no entry in crates/pampa/resources/error-corpus/_autogen-table.json, so Merr falls through to the generic "Parse error" fallback at error_generation.rs:243-249.
- version 1: (state=1922, sym=_close_block) matches the Q-2-12 corpus entry at _autogen-table.json line ~2236 and would produce the specific diagnostic seen in the control case.
The collector in produce_diagnostic_messages (error_generation.rs:39-62) is:
let mut seen_errors: HashSet<(usize, usize)> = HashSet::new();
for parse in &tree_sitter_log.parses {
    for process_log in parse.processes.values() {
        for state in process_log.error_states.iter() {
            if seen_errors.contains(&(state.row, state.column)) {
                continue;
            }
            seen_errors.insert((state.row, state.column));
            ...
The dedup key is just (row, column). Whichever GLR version's detect_error event is visited first in the HashMap iteration claims that source position; the other version's event is silently dropped before it ever reaches lookup_error_entry.
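One possible direction, as a minimal sketch rather than a proposed patch: group the events by position first, run the corpus lookup for every version's event, and only fall back to the generic message when no version matched. Everything named below (`ErrorEvent`, `lookup_code`, `collect_codes`) is a hypothetical stand-in and not the real pampa API; the hard-coded table contains only the one (state, sym) pair seen in the log above.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one `detect_error` event from the parser log.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ErrorEvent {
    row: usize,
    column: usize,
    state: usize,
    sym: String,
}

/// Hypothetical stand-in for the corpus lookup: returns a Q-code when the
/// (state, sym) pair has an entry, `None` otherwise.
fn lookup_code(state: usize, sym: &str) -> Option<&'static str> {
    match (state, sym) {
        (1922, "_close_block") => Some("Q-2-12"),
        _ => None,
    }
}

/// Group events by source position and keep, per position, every distinct
/// Q-code that some GLR version matched. A position with no corpus match
/// maps to an empty list, which the caller would render as the generic
/// "Parse error".
fn collect_codes(events: &[ErrorEvent]) -> BTreeMap<(usize, usize), Vec<&'static str>> {
    let mut by_pos: BTreeMap<(usize, usize), Vec<&'static str>> = BTreeMap::new();
    for ev in events {
        let codes = by_pos.entry((ev.row, ev.column)).or_default();
        if let Some(code) = lookup_code(ev.state, &ev.sym) {
            if !codes.contains(&code) {
                codes.push(code);
            }
        }
    }
    by_pos
}
```

With the two events from the log, this yields a single entry for (2, 22) containing Q-2-12 regardless of which version is visited first. If two versions ever matched different Q-codes at the same position, both would be kept, which leaves the question of how to report them open.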
parse.processes is a HashMap<usize, TreeSitterProcessLog> keyed on GLR version number and iterated via .values(). Rust's standard HashMap makes no guarantee about iteration order, and its default hasher is randomly seeded, so the order can in principle differ between runs. With only two integer keys (0 and 1) on this build, though, the order appears to be pinned in practice: running the reproduction ten times in a row produces the generic "Parse error" output every time, indicating that version 0 (the version with no corpus match) consistently wins the dedup.
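Independently of how the dedup key is chosen, the visiting order could be made explicit instead of depending on the hasher. A minimal sketch, assuming only that the map is keyed by version number as described above; `values_by_version` is a hypothetical helper, and `V` stands in for the per-version log type:

```rust
use std::collections::HashMap;

/// Return the map's values ordered by ascending key, so that iteration no
/// longer depends on the hasher. `V` stands in for the per-version log type.
fn values_by_version<V>(processes: &HashMap<usize, V>) -> Vec<&V> {
    let mut entries: Vec<(&usize, &V)> = processes.iter().collect();
    entries.sort_by_key(|(version, _)| **version);
    entries.into_iter().map(|(_, log)| log).collect()
}
```

Ordering alone would not fix this reproduction, since version 0 would still be visited first and claim the position. It only makes the choice reproducible across builds and runs.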