feat(xml/unstable): add XML parsing and serialization module #6942

tomas-zijdemans · 2026-01-07T22:09:13Z

New XML parsing and serialization module

What @std/xml has:

Streaming parser, DOM-style parser, serialization
Browser compatible, position tracking, spec-compliant

What @std/xml doesn't have:

Namespace resolution, DTD/Schema validation, HTML entities
Custom entities, XPath/selectors, object-to-XML builder

Benchmark Results

Performance work never really ends, and you often find yourself comparing apples and oranges. Anyway. Here goes.

The challengers

Library	XML Spec compliant?	Streaming XML parsing?	Error position tracking?
SAX	No	Yes	Yes
saxes	Yes	Yes	Yes
fast-xml-parser	No	No	Yes
txml	No	Yes	No
xml2js	No	No	Partial
htmlparser2	No	Yes	Partial
deno std	Yes	Yes	Yes (configurable)

Error position tracking is nice for debugging, but it hurts performance. So I made it an option that defaults to true for non-streaming and false for streaming (streaming is usually used for trusted data sources, such as multi-GB feeds or logs, where throughput is critical). The results below include runs both with and without error position tracking.
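To make the cost concrete, here is a minimal sketch of why per-character position tracking adds overhead. It is not the module's tokenizer: `scan`, `ScanOptions`, and `trackPosition` are hypothetical names used only for illustration.

```typescript
// Hypothetical illustration only; not the @std/xml implementation.
interface ScanOptions {
  /** Assumed option name: maintain line/column while scanning. */
  trackPosition: boolean;
}

interface ScanResult {
  tagOpens: number;
  line: number;
  column: number;
}

function scan(input: string, options: ScanOptions): ScanResult {
  let tagOpens = 0;
  let line = 1;
  let column = 1;
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    if (code === 60 /* '<' */) tagOpens++;
    if (options.trackPosition) {
      // Extra branch and counter writes on every character:
      // this is the work that an opt-out flag lets the hot loop skip.
      if (code === 10 /* '\n' */) {
        line++;
        column = 1;
      } else {
        column++;
      }
    }
  }
  return { tagOpens, line, column };
}

const sample = "<a>\n  <b/>\n</a>";
const tracked = scan(sample, { trackPosition: true });
const untracked = scan(sample, { trackPosition: false });
```

With tracking off, `line` and `column` simply stay at their initial values, so an error raised in that mode could not report where it occurred.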

Test data

I used the test files located in testdata for the non-streaming benchmarks. For the streaming benchmark I used one 597MB file (Google product data), but didn't check that into testdata. Other payloads may give different results.

Small Files (<10KB) — Median Results

Parser	Time (ms)	vs Deno std
Deno std (no pos)	0.009	1.2x faster
txml	0.010	1.1x faster
Deno std (+pos)	0.011	baseline
saxes	0.016	1.5x slower
htmlparser2	0.022	2.0x slower
SAX	0.027	2.5x slower
fast-xml-parser	0.038	3.5x slower
xml2js	0.048	4.4x slower

1 Large File (301KB) — Median Results

Parser	Time (ms)	vs Deno std
Deno std (no pos)	1.90	1.0x same
Deno std (+pos)	1.90	baseline
txml	1.98	1.04x slower
saxes	2.36	1.2x slower
htmlparser2	4.17	2.2x slower
SAX	7.60	4.0x slower
fast-xml-parser	11.16	5.9x slower
xml2js	14.10	7.4x slower

Streaming (a 597MB file) — Median Results

Parser	Time (s)	vs Deno std
Deno std (no pos)	4.25	baseline
Deno std (+pos)	4.68	1.1x slower
saxes	5.46	1.3x slower
htmlparser2	6.66	1.6x slower
SAX	16.74	3.9x slower
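For anyone wanting to reproduce this kind of table, below is a rough sketch of how a median timing and the "vs Deno std" column could be computed. It is not the harness used for the numbers above; `median`, `medianTimeMs`, and `relative` are hypothetical helpers, and the run and warmup counts are arbitrary assumptions.

```typescript
// Hypothetical helpers; not the benchmark harness used in this PR.
function median(values: number[]): number {
  if (values.length === 0) throw new RangeError("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Times fn repeatedly and returns the median duration in milliseconds.
function medianTimeMs(fn: () => void, runs = 25, warmup = 5): number {
  for (let i = 0; i < warmup; i++) fn(); // let the JIT settle first
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  return median(samples);
}

// Formats a timing relative to the baseline, in the style of the tables.
function relative(baselineMs: number, otherMs: number): string {
  if (otherMs === baselineMs) return "1.0x same";
  return otherMs > baselineMs
    ? `${(otherMs / baselineMs).toFixed(1)}x slower`
    : `${(baselineMs / otherMs).toFixed(1)}x faster`;
}

const saxesVsStd = relative(0.011, 0.016);
const txmlVsStd = relative(0.011, 0.010);
```

Using the median rather than the mean keeps a single GC pause or scheduler hiccup from skewing the result. The published figures are themselves rounded, so recomputing ratios from them may not always match the last digit.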

crowlKats · 2026-01-08T01:48:18Z

could we get some benchmarks comparing to other parsers?

tomas-zijdemans · 2026-01-08T05:54:15Z

> could we get some benchmarks comparing to other parsers?

Yes, that's a good idea. I'll look into it.

codecov · 2026-01-08T06:11:01Z

Codecov Report

❌ Patch coverage is 96.43403% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.21%. Comparing base (062aa94) to head (cfe999d).

Files with missing lines	Patch %	Lines
xml/_tokenizer.ts	95.29%	54 Missing and 3 partials ⚠️
xml/_parse_sync.ts	96.02%	10 Missing and 2 partials ⚠️
xml/parse_stream.ts	96.77%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6942      +/-   ##
==========================================
+ Coverage   94.08%   94.21%   +0.12%     
==========================================
  Files         600      610      +10     
  Lines       43553    45516    +1963     
  Branches     6997     7466     +469     
==========================================
+ Hits        40977    42881    +1904     
- Misses       2521     2574      +53     
- Partials       55       61       +6


tomas-zijdemans · 2026-01-08T20:31:42Z

> could we get some benchmarks comparing to other parsers?

Updated the description now. Let me know if you would like to benchmark against a specific package.

tomas-zijdemans · 2026-01-08T21:15:17Z

import_map.json

Sorry, I have no idea why this formatting is happening 😅

timreichen · 2026-01-08T23:23:32Z

Ref: denoland/deno#24995
There was no reply on whether DOMParser or something similar would be implemented in Deno, so I like this PR in general.
However, it might be worth checking with the Deno core team what their current stance is on this before merging anything.

tomas-zijdemans · 2026-01-09T09:05:16Z

> Ref: denoland/deno#24995 There was no reply on whether DOMParser or something similar would be implemented in Deno, so I like this PR in general. However, it might be worth checking with the Deno core team what their current stance is on this before merging anything.

Thanks, I was not aware of this discussion. Perhaps we could have it as an unstable module for now? Then we can always kick it out, should the core team decide to implement DOMParser.

tomas-zijdemans · 2026-01-09T12:34:20Z

Updated again to increase streaming performance and get test coverage to 100%

tomas-zijdemans · 2026-01-14T22:23:48Z

More perf work. Will look into using callbacks instead of arrays of objects


tomas-zijdemans · 2026-01-24T21:17:03Z

More perf gains on non-streaming parsing. Updated benchmark results

crowlKats · 2026-01-24T21:20:51Z

@tomas-zijdemans the perf looks really good! what machine are you running this on for reference?

tomas-zijdemans · 2026-01-24T22:35:28Z

@crowlKats: A MacBook with M1 Max

crowlKats · 2026-01-24T23:22:57Z

could we add https://www.w3.org/XML/Test/ ?

tomas-zijdemans · 2026-01-25T07:22:45Z

> could we add https://www.w3.org/XML/Test/ ?

Hmm, yes that would be good. It says it contains over 2000 files though, not small! 😅

tomas-zijdemans · 2026-01-25T10:42:24Z

> could we add https://www.w3.org/XML/Test/ ?

I've looked into it. The test suite has about 2250 tests (several MB), so it's not practical to include in std. We could add a script that downloads it and creates a conformance report.

The parser is intentionally lenient. For now it handles the valid tests (for XML 1.0), but does not handle the not-well-formed ones very well (these are mostly already documented in the PR).

It seems like the usual way of dealing with the more "theoretical" edge cases is to add a strict mode that enforces XML conformance. WDYT, is that worth it? For now, this initial PR is still open, so I don't know if there's appetite to include it in std yet.
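As a rough sketch of what the reporting half of such a script might look like (the download step is left out), the snippet below tallies how a parser behaves on "valid" and "not-wf" cases. Everything here is an assumption for illustration: `ConformanceCase`, `runConformance`, and `toyStrictParse` are hypothetical, and the toy parser only checks tag balance, which is far from real well-formedness checking.

```typescript
// Hypothetical sketch; names and case layout are assumptions.
type Expectation = "valid" | "not-wf";

interface ConformanceCase {
  id: string;
  xml: string;
  expected: Expectation;
}

interface ConformanceReport {
  passed: number;
  failed: string[];
}

// A case passes when the parser accepts valid input and rejects not-wf input.
function runConformance(
  cases: ConformanceCase[],
  parse: (xml: string) => unknown,
): ConformanceReport {
  const report: ConformanceReport = { passed: 0, failed: [] };
  for (const c of cases) {
    let threw = false;
    try {
      parse(c.xml);
    } catch {
      threw = true;
    }
    const ok = c.expected === "valid" ? !threw : threw;
    if (ok) report.passed++;
    else report.failed.push(c.id);
  }
  return report;
}

// Toy stand-in for a strict parser: only verifies that tags are balanced.
function toyStrictParse(xml: string): void {
  const stack: string[] = [];
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^<>]*?(\/?)>/g;
  let m: RegExpExecArray | null;
  while ((m = tagPattern.exec(xml)) !== null) {
    const name = m[2];
    if (m[3] === "/") continue; // self-closing element
    if (m[1] === "/") {
      if (stack.pop() !== name) throw new SyntaxError(`Mismatched </${name}>`);
    } else {
      stack.push(name);
    }
  }
  if (stack.length > 0) throw new SyntaxError("Unclosed element");
}

const cases: ConformanceCase[] = [
  { id: "valid-1", xml: "<a><b/></a>", expected: "valid" },
  { id: "not-wf-1", xml: "<a><b></a>", expected: "not-wf" },
];
const strictReport = runConformance(cases, toyStrictParse);
// A parser that accepts everything models fully lenient behavior.
const lenientReport = runConformance(cases, () => undefined);
```

In this toy setup the lenient stand-in fails the not-wf case, which is the kind of gap a strict mode would be meant to close.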

tomas-zijdemans requested a review from kt3k as a code owner January 7, 2026 22:09

tomas-zijdemans force-pushed the xml branch from f99e4c1 to bda3483 Compare January 8, 2026 05:59

tomas-zijdemans force-pushed the xml branch 7 times, most recently from e20f201 to 9b1fe20 Compare January 8, 2026 21:07

tomas-zijdemans added 12 commits January 24, 2026 19:08

feat(xml): add XML module with streaming parser, DOM-style parser, and serialization

c83f480

perf(xml): native TransformStream for 20% faster streaming

dcab2ec

refactor(xml): remove deprecated async generator APIs, sync all tests

9bc5a7f

perf(xml): use switch statement for named entity decoding

083f4f5

perf(xml): replace object lookups with switch in entity encoding

e2fb06e

perf(xml): use charCodeAt for tokenizer hot path

1737b8e

perf(xml): switch DOM parser to character code comparisons

0076530

perf(xml): add fast path for attribute value normalization

83db5bb

refactor(xml): remove helper functions

e824fb3

perf(xml): optimize switch

c9a0d1a

perf(xml): cache hot variables

db4df1a

feat(xml/unstable): add error position tracking as an aption

aa7bb5c

tomas-zijdemans added 12 commits January 24, 2026 19:08

perf(xml): introduce basic dedicated capture methods

0af9927

perf(xml): optimize CDATA capture with indexOf batch scanning

8eb12e6

refactor(xml): handle comment and PI capture

6a218e9

perf(xml): XmlName Caching when streaming

f1a480f

perf(xml): pending Start Element Reuse

2338711

perf(xml): optimize name parsing, add XmlName.raw property

478614b

fix tests

461d88e

feat(xml): callback based streaming core

49519c2

feat(xml): direct streaming

f47ba61

feat(xml): use callbacks for parse

c5faf45

test coverage

02cdbf4

fix(xml): avoid double parseName

4d53b03

tomas-zijdemans force-pushed the xml branch from fdd09f0 to 4d53b03 Compare January 24, 2026 18:09

tomas-zijdemans added 2 commits January 24, 2026 19:43

perf increase on sync

b0299ab

reduce position tracking overhead

cd6a761

reduce verbosity

4d4461e

lookup table

cfe999d
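Two of the commit titles above mention a switch statement for named entity decoding and indexOf batch scanning for CDATA. The sketch below shows what those techniques generally look like; it is not code from this PR, and `decodeNamedEntity` and `captureCData` are hypothetical names. The five entities are the ones predefined by XML 1.0.

```typescript
// Hypothetical sketch of the techniques named in the commit titles.

// Switch-based lookup of the five entities predefined by XML 1.0.
function decodeNamedEntity(name: string): string | undefined {
  switch (name) {
    case "lt":
      return "<";
    case "gt":
      return ">";
    case "amp":
      return "&";
    case "apos":
      return "'";
    case "quot":
      return '"';
    default:
      return undefined; // HTML entities such as "nbsp" are not predefined
  }
}

interface CDataCapture {
  text: string;
  /** Index just past the closing "]]>". */
  end: number;
}

// Finds the end of a CDATA section with one indexOf call instead of
// inspecting characters one by one. `start` is the index right after "<![CDATA[".
function captureCData(input: string, start: number): CDataCapture | undefined {
  const close = input.indexOf("]]>", start);
  if (close === -1) return undefined; // section not terminated in this input
  return { text: input.slice(start, close), end: close + 3 };
}

const source = "<![CDATA[a < b]]>tail";
const captured = captureCData(source, "<![CDATA[".length);
```

Delegating the search to `indexOf` lets the engine's native string search do the scanning, which is typically faster than a character-by-character loop in script code.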
