fix: prevent GFM table tokens from greedily capturing trailing newlines #3926

Open
Puxhkar wants to merge 2 commits into markedjs:master from Puxhkar:fix/table-trailing-newlines

Puxhkar commented Mar 23, 2026

Summary

Following the fix for headings (#3513), I discovered that GFM table tokens also greedily capture multiple trailing newlines in their raw property. This PR brings the table tokenizer into consistency with other block-level elements.

Changes

  • Updated src/Tokenizer.ts: Applied rtrim(cap[0], '\n') to the raw field in the table() method.
  • Updated test/unit/marked.test.js: Adjusted the walkTokens assertion for tables to account for the newly separated space tokens.
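The effect of trimming the raw field can be sketched in isolation. The snippet below is a minimal illustration, not the code from this PR: rtrimChar is a hypothetical stand-in for the rtrim helper named above, and the captured string is an assumed example of what a table match followed by blank lines might look like.

```typescript
// Hypothetical stand-in for a trailing-character trim helper (not marked's source).
function rtrimChar(str: string, ch: string): string {
  let end = str.length;
  // Walk backwards past every trailing occurrence of `ch`.
  while (end > 0 && str[end - 1] === ch) {
    end--;
  }
  return str.slice(0, end);
}

// Assumed example: a table followed by blank lines, as a greedy block match might capture it.
const captured = '| a | b |\n|---|---|\n| 1 | 2 |\n\n\n';

// Trimmed raw: the table text only, without the trailing newlines.
const raw = rtrimChar(captured, '\n');

// Whatever the token no longer claims stays in the source for a later rule to consume.
const remainder = captured.slice(raw.length);
```

Because the token only consumes raw.length characters, the trailing newlines remain in the input and can be emitted as their own whitespace token, which is why the walkTokens assertion had to change.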

Verification

  • Spec Compliance: All 1735 spec tests (CommonMark & GFM) passed.
  • Unit Tests: All existing unit tests passed after the structural update.
  • Consistency: Verified that table tokens no longer include multiple trailing blank lines, matching the behavior of hr and paragraph.

Fixes #3513 (Consistency follow-up)

vercel bot commented Mar 23, 2026

@servis is attempting to deploy a commit to the MarkedJS Team on Vercel.

A member of the Team first needs to authorize it.

gemini-code-assist bot (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the tokenization process by ensuring that block-level elements, particularly GFM tables, do not include extraneous trailing newlines in their raw property. This change promotes consistency across the tokenizer, making the raw representation more precise and facilitating clearer separation of content from whitespace, which is now handled by dedicated space tokens. The update improves the accuracy of token representation and maintains compliance with CommonMark and GFM specifications.

Highlights

  • GFM Table Tokenization: Prevented GFM table tokens from greedily capturing multiple trailing newlines in their raw property, aligning their behavior with other block-level elements.
  • Tokenizer Consistency: Applied newline trimming to the raw property of heading, def, and table tokens in the tokenizer to ensure consistent handling of trailing newlines across different token types.
  • Test Updates: Adjusted unit tests for the lexer and walkTokens to reflect the new tokenization behavior, specifically the separation of trailing newlines into distinct space tokens.

gemini-code-assist bot (Contributor) left a comment

Code Review

The code changes refine the tokenization process by removing trailing newline characters from the raw property of block-level tokens, specifically headings, definitions, and tables, in src/Tokenizer.ts using rtrim. Corresponding unit tests in test/unit/Lexer.test.js and test/unit/marked.test.js have been updated to reflect this change, now explicitly including space tokens to represent the newlines that were previously implicitly part of the block tokens' raw content. This improves the precision of token representation by separating structural content from whitespace.
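To make the "separate space tokens" point concrete, here is a toy lexer loop. It is only a sketch under stated assumptions: toyLex, ToyToken, and both regular expressions are hypothetical and much simpler than marked's real lexer and rules. It shows that once a block token's raw text is trimmed, the newline run that follows is picked up by a dedicated whitespace rule.

```typescript
// Hypothetical token shape for this illustration only.
type ToyToken = { type: 'block' | 'space'; raw: string };

function toyLex(input: string): ToyToken[] {
  const tokens: ToyToken[] = [];
  let src = input;
  while (src.length > 0) {
    // Space rule: a run of newlines becomes its own token.
    const space = /^\n+/.exec(src);
    if (space) {
      tokens.push({ type: 'space', raw: space[0] });
      src = src.slice(space[0].length);
      continue;
    }
    // Block rule: consecutive non-empty lines, greedily followed by any newlines.
    const block = /^(?:[^\n]+(?:\n|$))+\n*/.exec(src);
    if (!block) break; // Defensive guard; not reached for this grammar.
    // Trim the trailing newlines so the block only claims its own text.
    const raw = block[0].replace(/\n+$/, '');
    tokens.push({ type: 'block', raw });
    src = src.slice(raw.length);
  }
  return tokens;
}

// Two lines of block content, a run of newlines, then another block.
const toyTokens = toyLex('a\nb\n\n\nc');
```

In this toy version the three newlines between the two blocks end up in a single space token instead of being folded into the first block's raw text. How marked itself handles a lone trailing newline is not covered by this sketch.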

vercel bot commented Mar 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
marked-website | Ready | Preview, Comment | Mar 26, 2026 4:37am

Development

Successfully merging this pull request may close these issues.

[Bug] Heading token with mutiple end-of-line characters does not tokenized into Space token