Markup declarations in DOCTYPEs are parsed backwards‑incompatibly #9

@ExE-Boss

Right now, an XML 1.0 parser parses:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ELEMENT other (#PCDATA)>
]>
<greeting>Hello, world!</greeting>

as

├ xml: version="1.0" encoding="UTF-8"
├ DOCTYPE: greeting
│ │ // Note that these are ignored by non‑validating parsers, e.g. browsers:
│ ├ ELEMENT: greeting (#PCDATA)
│ └ ELEMENT: other (#PCDATA)
└ greeting
  └ #text: Hello, world!

whereas XML5 parses it as:

├ xml: version="1.0" encoding="UTF-8"
├ DOCTYPE: greeting
├ #comment: ELEMENT other (#PCDATA)
├ #text: ]>
└ greeting
  └ #text: Hello, world!

Since the XML5 parser appears to be intended to parse current XML while ignoring DTDs, the document should instead parse as:

├ xml: version="1.0" encoding="UTF-8"
├ DOCTYPE: greeting
└ greeting
  └ #text: Hello, world!

(The <!ELEMENT greeting (#PCDATA)> and <!ELEMENT other (#PCDATA)> declarations are ignored.)
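
For comparison, here is a minimal sketch of how the expected tree could be checked against an existing non-validating XML 1.0 parser. Python's standard `xml.dom.minidom` is used purely as a stand-in (an assumption; this issue does not name a reference parser), and `summarize` is a hypothetical helper, not part of any XML5 implementation. It reports the DOCTYPE name, the root element, its text, and any comment or text nodes that leaked out of the internal subset to the document level.

```python
from xml.dom import Node, minidom

# The document from this issue, passed as bytes so the encoding
# declaration in the XML prolog is honoured by the parser.
DOC = b"""<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ELEMENT other (#PCDATA)>
]>
<greeting>Hello, world!</greeting>"""


def summarize(data):
    """Hypothetical helper: describe the top-level shape of the parsed tree."""
    doc = minidom.parseString(data)
    root = doc.documentElement
    # Comment or text nodes directly under the document would indicate that
    # part of the internal subset was misread as content.
    stray = [
        node.nodeType
        for node in doc.childNodes
        if node.nodeType in (Node.COMMENT_NODE, Node.TEXT_NODE)
    ]
    return {
        "doctype": doc.doctype.name,
        "root": root.tagName,
        "text": root.firstChild.data,
        "stray": stray,
    }


summary = summarize(DOC)
print(summary)
```

An empty `stray` list corresponds to the tree proposed above: no `#comment` or `]>` text node appears between the DOCTYPE and the `greeting` element.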
