@tomas-zijdemans tomas-zijdemans commented Jan 7, 2026

New XML parsing and serialization module

What @std/xml has:

  • Streaming parser, DOM-style parser, serialization
  • Browser compatible, position tracking, spec-compliant

What @std/xml doesn't have:

  • Namespace resolution, DTD/Schema validation, HTML entities
  • Custom entities, XPath/selectors, object-to-XML builder

Benchmark Results

Performance work never really ends, and you often find yourself comparing apples and oranges. Anyway. Here goes.

The challengers

| Library | XML spec compliant? | Streaming XML parsing? | Error position tracking? |
| --- | --- | --- | --- |
| SAX | No | Yes | Yes |
| saxes | Yes | Yes | Yes |
| fast-xml-parser | No | No | Yes |
| txml | No | Yes | No |
| xml2js | No | No | Partial |
| htmlparser2 | No | Yes | Partial |
| deno std | Yes | Yes | Yes (configurable) |

Error position tracking is nice for debugging, but it really hurts performance. So I made it an option that defaults to true for non-streaming parsing and false for streaming (streaming is usually for trusted data sources, such as multi-GB feeds or logs where throughput is critical). The results below include runs both with and without error position tracking.
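To make the trade-off concrete, here is a minimal sketch of why position tracking costs time. It is not the module's tokenizer or API: `scanStartTags`, `ScanOptions`, and `trackPosition` are hypothetical names, and the scanner ignores comments, CDATA, and attributes. Without tracking, it can jump between `<` characters with `indexOf`; with tracking, it also has to visit every character in between to count newlines.

```typescript
// Hypothetical names throughout; this is an illustration, not the @std/xml API.
interface Position {
  line: number; // 1-based
  column: number; // 1-based
  offset: number; // 0-based index into the input
}

interface TagEvent {
  name: string;
  position?: Position;
}

interface ScanOptions {
  trackPosition?: boolean;
}

function scanStartTags(xml: string, options: ScanOptions = {}): TagEvent[] {
  const track = options.trackPosition ?? false;
  const events: TagEvent[] = [];
  let line = 1;
  let lineStart = 0; // offset of the first character on the current line
  let counted = 0; // newlines have been counted up to this offset
  let i = 0;

  while ((i = xml.indexOf("<", i)) !== -1) {
    // Skip end tags, comments, and declarations: only name-start characters count.
    if (!/[A-Za-z_:]/.test(xml.charAt(i + 1))) {
      i++;
      continue;
    }
    let end = i + 1;
    while (end < xml.length && !/[\s\/>]/.test(xml.charAt(end))) end++;

    const event: TagEvent = { name: xml.slice(i + 1, end) };
    if (track) {
      // The extra cost: every character before this tag is inspected once.
      for (; counted < i; counted++) {
        if (xml.charCodeAt(counted) === 10) {
          line++;
          lineStart = counted + 1;
        }
      }
      event.position = { line, column: i - lineStart + 1, offset: i };
    }
    events.push(event);
    i = end;
  }
  return events;
}

const doc = "<a>\n  <b/>\n</a>";
console.log(scanStartTags(doc)); // names only
console.log(scanStartTags(doc, { trackPosition: true })); // names with line/column
```

Making the flag an option lets the caller choose between better error messages and the cheaper scan, which matches the different defaults described above.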

Test data

I used the test files located in testdata for the non-streaming benchmarks. For the streaming benchmark I used a single 597MB file (Google product data), which I did not check into testdata. Other payloads may give different results.

Small Files (<10KB) — Median Results

| Parser | Time (ms) | vs Deno std |
| --- | --- | --- |
| Deno std (no pos) | 0.009 | 1.2x faster |
| txml | 0.010 | 1.1x faster |
| Deno std (+pos) | 0.011 | baseline |
| saxes | 0.016 | 1.5x slower |
| htmlparser2 | 0.022 | 2.0x slower |
| SAX | 0.027 | 2.5x slower |
| fast-xml-parser | 0.038 | 3.5x slower |
| xml2js | 0.048 | 4.4x slower |

1 Large File (301KB) — Median Results

| Parser | Time (ms) | vs Deno std |
| --- | --- | --- |
| Deno std (no pos) | 1.90 | 1.0x same |
| Deno std (+pos) | 1.90 | baseline |
| txml | 1.98 | 1.04x slower |
| saxes | 2.36 | 1.2x slower |
| htmlparser2 | 4.17 | 2.2x slower |
| SAX | 7.60 | 4.0x slower |
| fast-xml-parser | 11.16 | 5.9x slower |
| xml2js | 14.10 | 7.4x slower |

Streaming (a 597MB file) — Median Results

| Parser | Time (s) | vs Deno std |
| --- | --- | --- |
| Deno std (no pos) | 4.25 | baseline |
| Deno std (+pos) | 4.68 | 1.1x slower |
| saxes | 5.46 | 1.3x slower |
| htmlparser2 | 6.66 | 1.6x slower |
| SAX | 16.74 | 3.9x slower |
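For anyone reproducing these tables with their own payloads, the sketch below shows one way a median and a "vs Deno std" label could be computed. It assumes the column is simply the ratio between a parser's median and the baseline median, rounded to one decimal; `median` and `relativeToBaseline` are illustrative helpers, not part of the benchmark script used for this PR.

```typescript
// Illustrative helpers; the actual benchmark script may compute these differently.
function median(samples: number[]): number {
  if (samples.length === 0) throw new RangeError("median of empty sample set");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function relativeToBaseline(time: number, baseline: number): string {
  if (time <= 0 || baseline <= 0) throw new RangeError("times must be positive");
  // Report the larger value over the smaller one so the factor is always >= 1.
  return time < baseline
    ? `${(baseline / time).toFixed(1)}x faster`
    : `${(time / baseline).toFixed(1)}x slower`;
}

// Made-up samples (ms) for a single parser, only to show the call shape.
const runs = [0.017, 0.016, 0.015, 0.016, 0.021];
console.log(median(runs), relativeToBaseline(median(runs), 0.011));
```

The median is used rather than the mean so that a single slow run (GC pause, cold cache) does not skew the reported time.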

@tomas-zijdemans tomas-zijdemans requested a review from kt3k as a code owner January 7, 2026 22:09
@crowlKats
Member

could we get some benchmarks comparing to other parsers?

@tomas-zijdemans
Contributor Author

> could we get some benchmarks comparing to other parsers?

Yes, that's a good idea. I'll look into it.

@codecov

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 96.43403% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.21%. Comparing base (062aa94) to head (cfe999d).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| xml/_tokenizer.ts | 95.29% | 54 Missing and 3 partials ⚠️ |
| xml/_parse_sync.ts | 96.02% | 10 Missing and 2 partials ⚠️ |
| xml/parse_stream.ts | 96.77% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6942      +/-   ##
==========================================
+ Coverage   94.08%   94.21%   +0.12%     
==========================================
  Files         600      610      +10     
  Lines       43553    45516    +1963     
  Branches     6997     7466     +469     
==========================================
+ Hits        40977    42881    +1904     
- Misses       2521     2574      +53     
- Partials       55       61       +6     

@tomas-zijdemans
Contributor Author

> could we get some benchmarks comparing to other parsers?

Updated the description now. Let me know if you would like to benchmark against a specific package

@tomas-zijdemans tomas-zijdemans force-pushed the xml branch 7 times, most recently from e20f201 to 9b1fe20 Compare January 8, 2026 21:07
Contributor Author

Sorry, I have no idea why this formatting is happening 😅

@timreichen
Contributor

Ref: denoland/deno#24995
There was no reply on whether DOMParser or something similar would be implemented in deno, so I like this PR in general.
However, it might be worth checking with the deno core team what their current stance is on this before merging anything.

@tomas-zijdemans
Contributor Author

> Ref: denoland/deno#24995 There was no reply on whether DOMParser or something similar would be implemented in deno, so I like this PR in general. However, it might be worth checking with the deno core team what their current stance is on this before merging anything.

Thanks, I was not aware of this discussion. Perhaps we could have it as an unstable module for now? Then we can always remove it, should the core team decide to implement DOMParser.

@tomas-zijdemans
Contributor Author

Updated again to increase streaming performance and get test coverage to 100%

@tomas-zijdemans
Contributor Author

More perf work. Will look into using callbacks instead of arrays of objects
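As a rough illustration of the "callbacks instead of arrays of objects" idea, the sketch below emits tokens through handler callbacks with primitive arguments, and builds the array-of-objects form on top only when a caller asks for it. All names (`Handlers`, `tokenize`, `tokenizeToArray`) are hypothetical and the tokenizer is deliberately naive (no attributes containing `>`, no comments or CDATA); it is not the code in this PR.

```typescript
// Hypothetical sketch; not the tokenizer from this PR.
interface Handlers {
  onOpen?(name: string): void;
  onClose?(name: string): void;
  onText?(text: string): void;
}

// Callback style: no per-token object is allocated by the tokenizer itself.
function tokenize(xml: string, handlers: Handlers): void {
  let i = 0;
  while (i < xml.length) {
    const lt = xml.indexOf("<", i);
    if (lt === -1) {
      handlers.onText?.(xml.slice(i));
      return;
    }
    if (lt > i) handlers.onText?.(xml.slice(i, lt));

    const gt = xml.indexOf(">", lt);
    if (gt === -1) throw new SyntaxError(`Unterminated tag at offset ${lt}`);

    const body = xml.slice(lt + 1, gt);
    if (body.startsWith("/")) {
      handlers.onClose?.(body.slice(1).trim());
    } else {
      const selfClosing = body.endsWith("/");
      const inner = selfClosing ? body.slice(0, -1) : body;
      const name = inner.trim().split(/\s+/)[0];
      handlers.onOpen?.(name);
      if (selfClosing) handlers.onClose?.(name);
    }
    i = gt + 1;
  }
}

// Array-of-objects style, layered on top: one allocation per token.
type Token = { type: "open" | "close" | "text"; value: string };

function tokenizeToArray(xml: string): Token[] {
  const tokens: Token[] = [];
  tokenize(xml, {
    onOpen: (name) => { tokens.push({ type: "open", value: name }); },
    onClose: (name) => { tokens.push({ type: "close", value: name }); },
    onText: (text) => { tokens.push({ type: "text", value: text }); },
  });
  return tokens;
}

// Counting elements needs no intermediate array with the callback form.
let elementCount = 0;
tokenize("<a><b>hi</b><c/></a>", { onOpen: () => { elementCount++; } });
console.log(elementCount);
```

The callback form lets consumers that only need a subset of events skip the allocations entirely, which is where the potential gain would come from.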

@tomas-zijdemans
Contributor Author

More perf gains on non-streaming parsing. Updated benchmark results

@crowlKats
Member

@tomas-zijdemans the perf looks really good! what machine are you running this on for reference?

@tomas-zijdemans
Contributor Author

@crowlKats: A MacBook with M1 Max

@crowlKats
Member

could we add https://www.w3.org/XML/Test/ ?

@tomas-zijdemans
Contributor Author

> could we add https://www.w3.org/XML/Test/ ?

Hmm, yes that would be good. It says it contains over 2000 files though, not small! 😅

@tomas-zijdemans
Contributor Author

tomas-zijdemans commented Jan 25, 2026

> could we add https://www.w3.org/XML/Test/ ?

I've looked into it. The test suite has about 2250 tests (several MB), so it's not practical to include in std. We could add a script that downloads it and creates a conformance report.

The parser is intentionally lenient. For now it handles the valid tests (for XML 1.0), but it does not handle the not-well-formed ones very well (these are mostly already documented in the PR).

It seems the usual way of dealing with the more "theoretical" edge cases is to add a strict mode that enforces XML conformance. WDYT, is that worth it? For now, this initial PR is still open, so I don't know if there's appetite to include it in std yet.
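A minimal sketch of what the report half of such a script could look like, leaving out the download and catalog parsing. It assumes each test case carries one of the suite's test types (`valid`, `invalid`, `not-wf`, `error`, as I understand the catalog) and that a non-validating parser should accept the first two, reject `not-wf`, and may do either for `error`. `ConformanceCase`, `runConformance`, and the stub parser are hypothetical names for illustration.

```typescript
// Hypothetical report runner; case loading from the W3C suite is not shown.
type TestType = "valid" | "invalid" | "not-wf" | "error";

interface ConformanceCase {
  id: string;
  type: TestType;
  xml: string;
}

interface ConformanceReport {
  passed: number;
  failed: string[]; // ids of cases where the parser behaved unexpectedly
  skipped: number; // "error" cases, where either outcome is acceptable
}

function runConformance(
  cases: ConformanceCase[],
  parse: (xml: string) => unknown,
): ConformanceReport {
  const report: ConformanceReport = { passed: 0, failed: [], skipped: 0 };
  for (const testCase of cases) {
    if (testCase.type === "error") {
      report.skipped++;
      continue;
    }
    let threw = false;
    try {
      parse(testCase.xml);
    } catch {
      threw = true;
    }
    // Only not-well-formed documents are expected to be rejected.
    const shouldThrow = testCase.type === "not-wf";
    if (threw === shouldThrow) report.passed++;
    else report.failed.push(testCase.id);
  }
  return report;
}

// Stub standing in for a lenient parser: it only rejects input without markup.
const lenientParse = (xml: string): void => {
  if (!xml.includes("<")) throw new SyntaxError("no markup found");
};

console.log(runConformance([
  { id: "valid-1", type: "valid", xml: "<a/>" },
  { id: "nwf-1", type: "not-wf", xml: "<a>" },
  { id: "nwf-2", type: "not-wf", xml: "plain text" },
  { id: "err-1", type: "error", xml: "<a/>" },
], lenientParse));
```

Running the same cases against a lenient and a strict parse function would show directly how many `not-wf` cases a strict mode would need to start rejecting.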
