Limiting parsed numeric entities #4

@Ygg01

After looking at the JavaScript implementation of XML5 and this section of the HTML5 spec, I limited the numeric character references accepted during tokenization to just:

  • Characters that are less than 0x10FFFF
  • Excluding characters in the range 0xD800 to 0xDFFF

But should we expand the list of restricted characters to the full one used by HTML5?

 Otherwise, return a character token for the Unicode character whose code point is that number.   
 Additionally, if the number is in the range 0x0001 to 0x0008, 0x000D to 0x001F, 0x007F to 
 0x009F, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 
 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE,   
 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF,  
 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE,  
 0xFFFFF, 0x10FFFE, or 0x10FFFF, then this is a parse error.
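
To make the difference between the two rule sets concrete, here is a minimal sketch in Python. The function names are hypothetical and do not come from any XML5 or HTML5 implementation. The first function follows the current limit as worded above, reading "less than 0x10FFFF" literally, which is an assumption. The second follows the quoted HTML5 list, using the observation that every listed value ending in FFFE or FFFF is one of the last two code points of a Unicode plane.

```python
# Hypothetical helper names; illustrative only, not taken from any parser.

def is_allowed_currently(cp: int) -> bool:
    """Current limit described in this issue: below 0x10FFFF
    (read literally, an assumption) and not a surrogate."""
    return 0 <= cp < 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


# Inclusive ranges copied from the quoted HTML5 text.
HTML5_ERROR_RANGES = (
    (0x0001, 0x0008),
    (0x000D, 0x001F),
    (0x007F, 0x009F),
    (0xFDD0, 0xFDEF),
)


def is_html5_parse_error(cp: int) -> bool:
    """True if the quoted HTML5 text would flag this code point."""
    if any(lo <= cp <= hi for lo, hi in HTML5_ERROR_RANGES):
        return True
    if cp == 0x000B:
        return True
    # 0xFFFE, 0xFFFF, 0x1FFFE, ..., 0x10FFFE, 0x10FFFF:
    # the last two code points of each of the 17 planes.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)


if __name__ == "__main__":
    for cp in (0x41, 0x0B, 0x80, 0xFDD0, 0x1FFFE, 0xD800):
        print(hex(cp), is_allowed_currently(cp), is_html5_parse_error(cp))
```

Under this sketch, a reference such as &#x0B; or &#x1FFFE; passes the current check but would be flagged by the HTML5 list, which is the gap this issue asks about.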
