bugfix: multi-line block quote inside list item (bd-vet6) by cscheid · Pull Request #176 · quarto-dev/q2

cscheid · 2026-05-11T16:31:04Z

Summary

The tree-sitter qmd parser failed on a Pandoc-valid CommonMark construct: a list item containing a block quote whose paragraph spans multiple lines using the > continuation marker (e.g. - > a\n > b\n). Pandoc parses this as BulletList[[BlockQuote[Para[Str a, SoftBreak, Str b]]]]; we returned a parse error at line 2 col 6. The bug fires for every list marker (-, *, +, 1., 1), and so on), for the 3+-line case, and for any variant with content following the block quote. The bare 2-line variant escapes only when the input has no trailing newline, because the scanner's EOF path then runs before the buggy code; pampa auto-appends a \n, so users always hit the bug.

Root cause is in crates/tree-sitter-qmd/tree-sitter-markdown/src/scanner.c. After a SOFT_LINE_ENDING, the scanner sets STATE_MATCHING | STATE_WAS_SOFT_LINE_BREAK. When the parser then asks for the next token at the trailing \n of the second block-quote-marked line, the STATE_MATCHING block at line 2040 calls match_line, which routes LIST_ITEM through its case 2 "blank-line continuation" branch — that branch advances past the \n. By the time control reaches the line-ending gate at line 2233 (which checks lookahead == '\n'), the newline is gone, so the gate skips. The scan returns false, tree-sitter retries with a different lex-state and gets _close_block which has no shift at the current parse state, and the parse errors out. The BLOCK_QUOTE match has no analogous \n-consuming branch, which is why nested block quotes (> > a\n> > b\n) work fine.
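The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in scanner.c: the flag values, the `ToyScanner` struct, and every `toy_*` function are hypothetical stand-ins, and the two helpers reduce the LIST_ITEM case 2 branch and the line-ending gate to "consume a newline if one is at the cursor".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real scanner.c defines its own. */
enum {
    STATE_MATCHING = 1 << 0,
    STATE_WAS_SOFT_LINE_BREAK = 1 << 1
};

/* Toy stand-in for the scanner: a cursor over a NUL-terminated buffer. */
typedef struct {
    const char *input;
    size_t pos;
    unsigned state;
} ToyScanner;

static char toy_lookahead(const ToyScanner *s) {
    return s->input[s->pos];
}

/* Models the LIST_ITEM "blank-line continuation" branch: it consumes a newline. */
static bool toy_match_list_item(ToyScanner *s) {
    if (toy_lookahead(s) == '\n') {
        s->pos++;
        return true;
    }
    return false;
}

/* Models the line-ending gate: it only fires while lookahead is still '\n'. */
static bool toy_line_ending_gate(ToyScanner *s) {
    if (toy_lookahead(s) == '\n') {
        s->pos++;
        return true;
    }
    return false;
}

/* Pre-fix ordering: matching runs first, then the gate.
 * Returns whether a line-ending token was produced. */
static bool toy_scan_before_fix(ToyScanner *s) {
    assert(s != NULL && s->input != NULL);
    if (s->state & STATE_MATCHING) {
        toy_match_list_item(s); /* may eat the newline the gate needs */
    }
    return toy_line_ending_gate(s);
}
```

In this model, scanning the buffer "\n" with `STATE_MATCHING | STATE_WAS_SOFT_LINE_BREAK` set returns false, because the match step has already consumed the newline. The same buffer with no flags set returns true.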

The fix bypasses the STATE_MATCHING block when STATE_WAS_SOFT_LINE_BREAK is set AND lookahead is \n/\r — the soft-line-break already accounted for the continuation prefix, so re-running match_line against the trailing newline of the same logical line is wrong; the line-ending gate handles it cleanly. Full investigation, neighborhood characterization, fix proposal, and end-to-end verification are in claude-notes/plans/2026-05-11-bq-multiline-in-list-item.md. The branch is structured as five commits, one per phase (failing tests → characterization → fix → e2e verification → close-out), so the reviewer can step through the work in order.
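A minimal sketch of the guard condition, assuming hypothetical flag values and a toy scanner type (the actual change lives in scanner.c and may be shaped differently). `toy_skip_matching` encodes the condition stated above; `toy_scan_after_fix` shows how skipping the match step leaves the newline in place for a simplified line-ending gate, which here only recognizes '\n'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real scanner.c defines its own. */
enum {
    STATE_MATCHING = 1 << 0,
    STATE_WAS_SOFT_LINE_BREAK = 1 << 1
};

/* Toy stand-in for the scanner: a cursor over a NUL-terminated buffer. */
typedef struct {
    const char *input;
    size_t pos;
    unsigned state;
} ToyScanner;

/* The guard: bypass matching right after a soft line break
 * when the cursor sits on a line ending. */
static bool toy_skip_matching(unsigned state, char lookahead) {
    return (state & STATE_WAS_SOFT_LINE_BREAK) != 0 &&
           (lookahead == '\n' || lookahead == '\r');
}

/* Post-fix ordering in the toy model.
 * Returns whether a line-ending token was produced. */
static bool toy_scan_after_fix(ToyScanner *s) {
    assert(s != NULL && s->input != NULL);
    char c = s->input[s->pos];
    if ((s->state & STATE_MATCHING) && !toy_skip_matching(s->state, c)) {
        /* Stand-in for the newline-consuming match branch. */
        if (c == '\n') {
            s->pos++;
        }
    }
    /* Simplified line-ending gate. */
    if (s->input[s->pos] == '\n') {
        s->pos++;
        return true;
    }
    return false;
}
```

With both flags set and the cursor on "\n", `toy_scan_after_fix` returns true. With only `STATE_MATCHING` set, the match step still consumes the newline, so the blank-line continuation path is left as it was.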

Test plan

tree-sitter test: 476/476 (6 new corpus tests for marker variants + 3-line case)
cargo nextest run -p pampa: 3685/3685 (7 new pandoc-match fixtures, all match Pandoc's native AST)
cargo nextest run --workspace: 8804/8804
cargo xtask verify (full, 9/9 steps including hub-client WASM build + tests): passed
Manual end-to-end on the original reporter file and all marker / multi-line / follow-up content variants; pampa output matches pandoc -t native byte-for-structure
Regression sanity checks: bq-in-bq, blank-line-separated paragraphs in list items, nested lists, and lazy continuation all still produce the same trees as before
No .snap files touched

Closes bd-vet6.

Plan: claude-notes/plans/2026-05-11-bq-multiline-in-list-item.md

Adds 6 failing tree-sitter corpus tests (24-29) and 7 pampa pandoc-match fixtures that exercise the multi-line block quote inside a list item bug. Tests fail because of the LIST_ITEM match() newline branch consuming the \n that the line-ending gate needs (see plan doc for full root cause). Refined bug scope captured in plan: the bug requires a trailing \n after the second blockquote-marked line. Pampa auto-appends one (Q-7-1), which is why users hit it.

Maps every STATE_MATCHING / STATE_WAS_SOFT_LINE_BREAK / match_line site, identifies the invariant the bug violates, surveys existing corpus tests that depend on LIST_ITEM case 2, and writes down the exact proposed guard before implementing.

Skip the STATE_MATCHING block in scanner.c when STATE_WAS_SOFT_LINE_BREAK is set AND lookahead is \n or \r. The LIST_ITEM match() case 2 was advancing past the trailing newline, leaving the line-ending gate at line 2233 with nothing to match against. The line-ending gate (or the EOF handler above) handles this case cleanly when not bypassed by match_line first.

Results:
- tree-sitter test: 476/476 pass (6 new corpus tests for marker variants and 3-line case)
- cargo nextest -p pampa: 3685/3685 pass
- cargo nextest --workspace: 8804/8804 pass

No regressions in any existing tests.

Pampa output matches pandoc native AST for the original reporter file and every marker variant + multi-line + follow-up content case. Regression sanity checks confirm bq-in-bq, blank-line list paragraphs, nested lists, and lazy continuation all still produce the same trees as before.

Full cargo xtask verify passed (9/9 steps, including hub-client and trace-viewer). No snapshot files changed. Branch ready for review.

cscheid added 6 commits May 11, 2026 10:53

sync beads: bd-vet6 filed (multi-line bq inside list item)

d4c9b4a

bd-vet6 phase 5: close-out (verify + plan updates)

90a5507

cscheid mentioned this pull request May 11, 2026

ci: pin rust-toolchain.toml to nightly-2026-04-28 (bd-at72) #177

Merged

4 tasks

cscheid wants to merge 6 commits into main from bugfix/multiline-list-item