@CybotTM commented Jan 21, 2026

Summary

Optimizes RST parsing with instance reuse and O(1) hash set lookups for hyperlink validation.

Changes

  • ExternalReferenceResolver: Add SUPPORTED_SCHEMAS_LIST and isSupportedScheme() for O(1) hash set lookup
  • InlineParser: Reuse a single InlineLexer instance instead of creating a new one per parse
  • InlineLexer: Use ExternalReferenceResolver::isSupportedScheme() for URI scheme validation (~6x faster)
  • LineChecker: Cache compiled regex patterns
  • Buffer: Cache unindent calculations
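
The LineChecker item above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: the class name `MemoizedLineChecker`, the simplified directive pattern, and the cache layout are all assumptions made for illustration. It memoizes the boolean result per input line in a static array so repeated checks of the same line skip the regex.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: class name, pattern, and cache layout are assumptions,
// not the actual LineChecker implementation.
final class MemoizedLineChecker
{
    /** @var array<string, bool> results keyed by the exact line content */
    private static array $directiveCache = [];

    public static function isDirective(string $line): bool
    {
        // ??= only evaluates the regex when the line has no cached result.
        // A cached `false` is not null, so negative results are reused too.
        return self::$directiveCache[$line]
            ??= preg_match('/^\.\.\s+[a-zA-Z0-9_:-]+::(\s|$)/', $line) === 1;
    }
}

var_dump(MemoizedLineChecker::isDirective('.. note:: Hello'));
var_dump(MemoizedLineChecker::isDirective('Plain paragraph text'));
```

A cache keyed by full line content grows with the number of distinct lines, so a real implementation may want to bound it or clear it between documents.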

Performance Impact

See Performance Analysis Report for detailed benchmarks.

The hash set lookup for URI schemes is approximately 6x faster than the previous approach, which matched each candidate against a single regex covering all 371 IANA-registered schemes.
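
To illustrate the difference between the two approaches, here is a minimal sketch. The class name `SchemeRegistry`, the three-entry list, the key layout of the constant, and the case folding are assumptions for illustration; only the names `SUPPORTED_SCHEMAS_LIST` and `isSupportedScheme()` come from this PR, and their real shape may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for ExternalReferenceResolver. Only three of the
// 371 schemes are listed; the real constant layout may differ.
final class SchemeRegistry
{
    /** Scheme names stored as array keys, so isset() is a hash lookup. */
    private const SUPPORTED_SCHEMAS_LIST = [
        'http' => true,
        'https' => true,
        'mailto' => true,
    ];

    public static function isSupportedScheme(string $scheme): bool
    {
        // Lower-casing is an assumption of this sketch.
        return isset(self::SUPPORTED_SCHEMAS_LIST[strtolower($scheme)]);
    }

    /** Regex alternation over the same names, shown only for contrast. */
    public static function isSupportedSchemeRegex(string $scheme): bool
    {
        $names = array_map('preg_quote', array_keys(self::SUPPORTED_SCHEMAS_LIST));
        $pattern = '/^(?:' . implode('|', $names) . ')$/i';

        return preg_match($pattern, $scheme) === 1;
    }
}

var_dump(SchemeRegistry::isSupportedScheme('https'));
var_dump(SchemeRegistry::isSupportedScheme('gopher'));
```

The lookup cost of the keyed array does not depend on how many schemes are registered, whereas the alternation pattern grows with every scheme added.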

Merge Note

Both this PR and #1287 add the same isSupportedScheme() method to ExternalReferenceResolver. Whichever PR merges second will hit a conflict on that method, which is trivially resolved by keeping the code already merged.


Related PRs

| PR | Description | Status |
|----|-------------|--------|
| #1287 | Rendering caching layer | Independent (trivial merge conflict on ExternalReferenceResolver) |
| #1288 | This PR - RST parsing optimizations | |
| #1289 | CLI container caching | Independent |
| #1291 | Symfony 8 compatibility | ✅ Merged |
| #1293 | ProjectNode O(1) document lookup | Independent |

All PRs can be merged independently in any order.

Add caching optimizations for hot paths in RST parsing:

- InlineParser: reuse single InlineLexer instance instead of creating
  new one per parse call (lexer state fully reset via setInput())
- InlineLexer: cache expensive hyperlink pattern built from
  SUPPORTED_SCHEMAS (5600+ chars) as static variable
- LineChecker: add static caches for isDirective(), isLink(), and
  isAnnotation() regex results with proper cache key handling
- Buffer: ensure unindented flag is reset in all mutators (set, pop,
  clear) for consistent cache invalidation
- CachableInlineRule: simplify type annotations
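
The Buffer item above describes a cached computation guarded by a flag. A minimal sketch of that pattern, assuming a hypothetical `LineBuffer` class whose method names and unindent logic are invented for illustration and are not the actual Buffer code:

```php
<?php

declare(strict_types=1);

// Hypothetical simplified buffer, not the actual Buffer class from this PR.
final class LineBuffer
{
    /** @var string[] */
    private array $lines = [];

    /** True while $lines already holds the unindented result. */
    private bool $unindented = false;

    public function push(string $line): void
    {
        $this->lines[] = $line;
        $this->unindented = false;
    }

    public function set(int $key, string $line): void
    {
        $this->lines[$key] = $line;
        $this->unindented = false; // every mutator must invalidate the cache
    }

    public function pop(): ?string
    {
        $this->unindented = false;

        return array_pop($this->lines);
    }

    public function clear(): void
    {
        $this->lines = [];
        $this->unindented = false;
    }

    /** @return string[] */
    public function getLines(): array
    {
        $this->unindent();

        return $this->lines;
    }

    private function unindent(): void
    {
        if ($this->unindented) {
            return; // nothing changed since the last calculation
        }

        // Find the smallest indentation among non-blank lines.
        $indent = null;
        foreach ($this->lines as $line) {
            if (trim($line) === '') {
                continue;
            }
            $current = strlen($line) - strlen(ltrim($line));
            $indent = $indent === null ? $current : min($indent, $current);
        }

        if ($indent !== null && $indent > 0) {
            foreach ($this->lines as $i => $line) {
                $this->lines[$i] = trim($line) === '' ? '' : substr($line, $indent);
            }
        }

        $this->unindented = true;
    }
}
```

If `set()` did not reset the flag, a line written after the first `getLines()` call would keep its original indentation, which is the kind of inconsistency the commit guards against.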

Note: Lexer reuse assumes single-threaded parsing. Concurrent parsing
would require separate instances.
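
The reuse pattern can be sketched as below, under the assumption that `setInput()` resets every piece of per-parse state. `TinyLexer` and `TinyParser` are hypothetical stand-ins for illustration; they are not InlineLexer or InlineParser.

```php
<?php

declare(strict_types=1);

// Hypothetical minimal lexer: splits on whitespace and tracks a cursor.
final class TinyLexer
{
    /** @var string[] */
    private array $tokens = [];
    private int $position = 0;

    public function setInput(string $input): void
    {
        // Replace all per-parse state so nothing leaks from the previous input.
        $this->tokens = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY) ?: [];
        $this->position = 0;
    }

    public function next(): ?string
    {
        return $this->tokens[$this->position++] ?? null;
    }
}

// Hypothetical parser that owns one lexer for its whole lifetime.
final class TinyParser
{
    private TinyLexer $lexer;

    public function __construct()
    {
        $this->lexer = new TinyLexer(); // created once, reused by every parse()
    }

    /** @return string[] */
    public function parse(string $content): array
    {
        $this->lexer->setInput($content);

        $result = [];
        while (($token = $this->lexer->next()) !== null) {
            $result[] = $token;
        }

        return $result;
    }
}

$parser = new TinyParser();
print_r($parser->parse('first call'));
print_r($parser->parse('second'));
```

Because the lexer is shared mutable state, two parses must not interleave on the same parser instance, which is why the note above restricts reuse to single-threaded parsing.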

See https://cybottm.github.io/render-guides/ for benchmark data.

Add SUPPORTED_SCHEMAS_LIST and isSupportedScheme() to ExternalReferenceResolver
for O(1) hash set lookup instead of regex matching against 371 IANA schemes.
This is ~6x faster than the 5600+ character regex pattern.

InlineLexer now uses ExternalReferenceResolver::isSupportedScheme() to
validate URI schemes during tokenization.

Note: This change is also in PR phpDocumentor#1287 - when both PRs merge, the conflict
is trivially resolved by keeping one version.