`detect_error` dedup loses specific Q-codes when GLR has forked earlier in the parse #230

It is possible to construct input for which the tree-sitter parser reports multiple errors at the same location. Claude believes this happens because the GLR engine forks in response to an earlier error, but I have not verified this.

The problem is that errors sharing the same `(row, column)` are deduplicated without taking either `state` or `sym` into account. As a result, when the surviving versions disagree on lookahead, the dedup picks one essentially arbitrarily, which can produce a generic "Parse error" rather than a more specific Q-code from the corpus.

I am not sure whether it is possible to construct a situation where multiple errors with distinct Q-codes are generated at the same location, or how that case should be handled.


## Example

```
# Heading {key=val .class}

Second *bad paragraph.
```

```bash
printf  "%s\n" "# Heading {key=val .class}" "" "Second *bad paragraph." \
  | pampa --no-prune-errors -t native
```

```
Error: [Q-2-3] Key-value Pair Before Class Specifier in Attribute
   ╭─[ <stdin>:1:20 ]
   │
 1 │ # Heading {key=val .class}
   │            ───┬─── ───┬──
   │               ╰──────────── This key-value pair cannot appear before the class specifier.
   │                       │
   │                       ╰──── This class specifier appears after the key-value pair.
───╯

Error: Parse error
   ╭─[ <stdin>:3:23 ]
   │
 3 │ Second *bad paragraph.
   │                       ┬
   │                       ╰── unexpected character or token here
───╯
```

## Control

```
First good paragraph.

Second *bad paragraph.
```

```bash
printf  "%s\n" "First good paragraph." "" "Second *bad paragraph." \
  | pampa --no-prune-errors -t native
```

```
Error: [Q-2-12] Unclosed Star Emphasis
   ╭─[ <stdin>:3:23 ]
   │
 3 │ Second *bad paragraph.
   │ ───┬──                ┬
   │    ╰───────────────────── This is the opening '*' mark.
   │                       │
   │                       ╰── I reached the end of the block before finding a closing '*' for the emphasis.
───╯
```

## Claude's assessment on what is happening under the hood

Running pampa with `-v` on the reproduction input shows that GLR forks after the attribute-list error in the heading on line 1:

```
detect_error lookahead:attribute_class
recover_to_previous state:2590, depth:2
process version:0, version_count:2, state:0, row:0, col:25
process version:1, version_count:2, state:2590, row:0, col:19
```

Both versions continue parsing forward through the blank line and into the paragraph on line 3, and both reach the unclosed `*` at the same source position. Each emits its own `detect_error` event:

```
process version:0, version_count:2, state:1922, row:2, col:22
detect_error lookahead:_commonmark_single_quote_string_token2

process version:1, version_count:2, state:1922, row:2, col:22
detect_error lookahead:_close_block
```

Both events have `(state=1922, row=2, col=22)` but different lookahead symbols:

- version 0: `(state=1922, sym=_commonmark_single_quote_string_token2)` — no entry in `crates/pampa/resources/error-corpus/_autogen-table.json`, so Merr falls through to the generic "Parse error" fallback at `error_generation.rs:243-249`.
- version 1: `(state=1922, sym=_close_block)` — matches the `Q-2-12` corpus entry at `_autogen-table.json` line ~2236 and would produce the specific diagnostic seen in the control case.

The collector in `produce_diagnostic_messages` (`error_generation.rs:39-62`) is:

```rust
let mut seen_errors: HashSet<(usize, usize)> = HashSet::new();
for parse in &tree_sitter_log.parses {
    for process_log in parse.processes.values() {
        for state in process_log.error_states.iter() {
            if seen_errors.contains(&(state.row, state.column)) {
                continue;
            }
            seen_errors.insert((state.row, state.column));
            ...
```

The dedup key is just `(row, column)`. Whichever GLR version's `detect_error` event is visited first in the HashMap iteration claims that source position; the other version's event is silently dropped before it ever reaches `lookup_error_entry`.
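To make the effect concrete, here is a minimal, self-contained sketch rather than pampa's actual code. `ErrorEvent`, `Corpus`, `first_wins`, and `prefer_specific` are hypothetical names invented for illustration. `first_wins` mirrors the `(row, column)` dedup shown above. `prefer_specific` is one possible alternative, assuming it is acceptable to consult the corpus table before deduplicating: for each position it keeps an event that has a corpus match if one exists, and otherwise the first event seen.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical stand-in for one `detect_error` event (not pampa's real type).
#[derive(Debug, Clone, PartialEq)]
struct ErrorEvent {
    row: usize,
    column: usize,
    state: usize,
    sym: &'static str,
}

/// Hypothetical stand-in for the corpus table: (state, sym) -> Q-code.
type Corpus = HashMap<(usize, &'static str), &'static str>;

/// The current behaviour in miniature: the first event at a position wins.
fn first_wins(events: &[ErrorEvent]) -> Vec<ErrorEvent> {
    let mut seen = HashSet::new();
    events
        .iter()
        // `insert` returns false when the position was already claimed.
        .filter(|e| seen.insert((e.row, e.column)))
        .cloned()
        .collect()
}

/// One possible alternative: per position, prefer an event that has a
/// corpus match; fall back to the first event seen when none matches.
fn prefer_specific(events: &[ErrorEvent], corpus: &Corpus) -> Vec<ErrorEvent> {
    let mut by_pos: BTreeMap<(usize, usize), ErrorEvent> = BTreeMap::new();
    for e in events {
        let key = (e.row, e.column);
        let e_specific = corpus.contains_key(&(e.state, e.sym));
        let replace = match by_pos.get(&key) {
            None => true,
            // Only displace a kept event if it is generic and `e` is specific.
            Some(kept) => e_specific && !corpus.contains_key(&(kept.state, kept.sym)),
        };
        if replace {
            by_pos.insert(key, e.clone());
        }
    }
    by_pos.into_values().collect()
}
```

Fed the two events from the log above, with a corpus containing only `(1922, "_close_block")`, `first_wins` keeps the `_commonmark_single_quote_string_token2` event when it is visited first, while `prefer_specific` keeps the `_close_block` event regardless of visiting order. This sketch does not address the open question of two events at one position that match different Q-codes; it simply keeps the first specific one.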

`parse.processes` is a `HashMap<usize, TreeSitterProcessLog>` keyed on GLR version number and iterated via `.values()`. Rust's standard `HashMap` does not guarantee a stable iteration order, but with only two integer keys (`0` and `1`) the order appears to be pinned in practice on this build: running the reproduction ten times in a row produces the generic "Parse error" output every time, which indicates that version 0 (the version with no corpus match) consistently wins the dedup.
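If the iteration order is a concern in its own right, the versions could be visited in ascending key order instead of relying on `HashMap` iteration. The helper below is a hypothetical sketch (`values_by_version` is not an existing pampa function, and `V` stands in for `TreeSitterProcessLog`):

```rust
use std::collections::HashMap;

/// Return the map's values in ascending key (GLR version) order,
/// independent of the hasher's seed. Hypothetical helper for illustration.
fn values_by_version<V>(processes: &HashMap<usize, V>) -> Vec<&V> {
    let mut entries: Vec<(&usize, &V)> = processes.iter().collect();
    // Sort by version number so the visiting order is reproducible.
    entries.sort_by_key(|(version, _)| **version);
    entries.into_iter().map(|(_, log)| log).collect()
}
```

Note that this would only make the outcome reproducible. With the `(row, column)` key unchanged, version 0 would still claim the position first, so the generic "Parse error" would remain the result for this reproduction.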

