Skip to content

Fix Windows lexer error on CRLF code blocks#2804

Open
hhugo wants to merge 1 commit into
mainfrom
fix-windows-crlf-codeblock
Open

Fix Windows lexer error on CRLF code blocks#2804
hhugo wants to merge 1 commit into
mainfrom
fix-windows-crlf-codeblock

Conversation

@hhugo
Copy link
Copy Markdown
Collaborator

@hhugo hhugo commented May 26, 2026

Summary

Fixes the Windows CI failure (e.g. run 25160035941) where {@ocaml[...]} blocks in .mld / .mli doc comments produce invalid code block: Ocamlformat_parser_extended.Lexer.Error(_, _) on CRLF-checked-out files.

In vendor/odoc-parser/odoc_parser.ml, handle_last_newline scans backward over trailing whitespace and stops on '\n', returning the substring up to that index. The symmetric handle_first_newline already eats a leading '\r' before matching '\n', but handle_last_newline did not — so the '\r' immediately preceding the final '\n' survived. deindent then splits on '\n' and rejoins with '\n', which leaves intermediate \rs followed by \n (tolerated by the lexer's \r*\n rule) but a bare '\r' on the last line, which falls through to Illegal_character.

The fix mirrors handle_first_newline: when handle_last_newline hits '\n', also drop a preceding '\r' so the full CRLF is consumed.

Test plan

  • Local repro on Linux: convert test/passing/tests/doc_repl.mld to CRLF and run ocamlformat --parse-toplevel-phrases — reproduces the failure before the patch, succeeds after.
  • dune build @runtest clean on Linux (no regressions on LF input).
  • Windows CI green.

@hhugo hhugo force-pushed the fix-windows-crlf-codeblock branch from 8af14ea to 3c7630c Compare May 26, 2026 09:50
handle_last_newline scans backward and matches '\n' immediately, leaving
the preceding '\r' (if any) on the stripped string. Because deindent then
splits on '\n' and rejoins with '\n', every '\r' on an intermediate line
ends up followed by a '\n' — but the '\r' on the last line is bare,
since String.concat puts no separator after the final element.

The OCaml lexer's newline regex is '\r*\n', so a bare trailing '\r'
falls through to the Illegal_character rule. This caused the Windows CI
to fail on any {@ocaml[...]} block in a CRLF-checked-out file.

Mirror handle_first_newline (which already eats a leading '\r' before
matching '\n'): when handle_last_newline finds '\n', also drop a
preceding '\r' so the full CRLF is consumed.
@hhugo hhugo force-pushed the fix-windows-crlf-codeblock branch from 3c7630c to 77b7b28 Compare May 26, 2026 10:05
@hhugo hhugo marked this pull request as draft May 26, 2026 10:08
@hhugo hhugo marked this pull request as ready for review May 26, 2026 15:10
Copy link
Copy Markdown
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The build is failing in CI on linux with dune-pkg. Do you have an idea why ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants