@CybotTM
Contributor

@CybotTM CybotTM commented Jan 23, 2026

Summary

Adds core infrastructure for incremental documentation builds, enabling consumers to:

  • Track document dependencies - Know which documents import others via toctree, doc refs, and includes
  • Detect changes efficiently - Fast mtime-first detection with content hash verification
  • Propagate invalidation - When a document changes, find all dependents that need re-rendering
  • Persist state - Serialize/deserialize build state between runs
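The mtime-first strategy can be illustrated with a minimal sketch. This is not the PR's PHP code: `FileState`, `snapshot`, and `has_changed` are hypothetical names, and sha256 stands in for the hasher. The assumed idea is that an unchanged timestamp skips hashing entirely, while a changed timestamp is confirmed by comparing content hashes.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileState:
    """Hypothetical record of a file as seen by the previous build."""
    mtime_ns: int
    content_hash: str


def hash_file(path: str) -> str:
    """Hash file contents in chunks (sha256 here; the PR prefers xxh128)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(path: str) -> FileState:
    """Capture the state that would be persisted for the next build."""
    return FileState(os.stat(path).st_mtime_ns, hash_file(path))


def has_changed(path: str, previous: Optional[FileState]) -> bool:
    """Return True if the file must be treated as changed."""
    if previous is None:
        return True  # never seen before
    if os.stat(path).st_mtime_ns == previous.mtime_ns:
        return False  # fast path: timestamp untouched, skip hashing
    # Timestamp differs: confirm with the content hash, so that a mere
    # "touch" without edits does not trigger a re-render.
    return hash_file(path) != previous.content_hash


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        source = os.path.join(directory, "index.rst")
        with open(source, "w", encoding="utf-8") as handle:
            handle.write("Title\n=====\n")
        state = snapshot(source)
        os.utime(source, ns=(1, 1))  # simulate a touch without edits
        print(has_changed(source, state))
```

The trade-off in this sketch is that a file edited without its mtime changing would be missed; whether the PR guards against that case is not stated here.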

Core Components

| Component | Purpose |
|---|---|
| DependencyGraph | Bidirectional graph tracking import/dependent relationships with cycle detection |
| DocumentExports | Immutable container for document anchors, titles, citations |
| ChangeDetector | Fast change detection using mtime + content hash strategy |
| ContentHasher | File hashing with xxh128 (fast) or sha256 fallback |
| IncrementalBuildState | Container service holding graph + exports during compilation |
| DependencyGraphPass | Compiler pass extracting imports from document AST |
| ExportsCollectorPass | Compiler pass collecting exports after compilation |
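A bidirectional graph with cycle detection could look roughly like the sketch below. It is an illustration only, not the PR's `DependencyGraph` class: the method names are hypothetical, and the cycle check uses an iterative three-color depth-first search, which may differ from the approach the PR takes.

```python
from collections import defaultdict
from typing import Dict, Set


class DependencyGraph:
    """Illustrative graph storing both edge directions for O(1) lookups."""

    def __init__(self) -> None:
        self._imports: Dict[str, Set[str]] = defaultdict(set)
        self._dependents: Dict[str, Set[str]] = defaultdict(set)

    def add_import(self, source: str, target: str) -> None:
        """Record that `source` imports `target` (toctree, doc ref, include)."""
        self._imports[source].add(target)
        self._dependents[target].add(source)

    def imports_of(self, doc: str) -> Set[str]:
        return set(self._imports.get(doc, ()))

    def dependents_of(self, doc: str) -> Set[str]:
        return set(self._dependents.get(doc, ()))

    def has_cycle(self) -> bool:
        """Iterative DFS: meeting a node that is still in progress is a cycle."""
        in_progress, done = 1, 2
        state: Dict[str, int] = {}
        for start in list(self._imports):
            if start in state:
                continue
            state[start] = in_progress
            stack = [(start, iter(self._imports.get(start, ())))]
            while stack:
                node, children = stack[-1]
                for child in children:
                    if state.get(child) == in_progress:
                        return True
                    if child not in state:
                        state[child] = in_progress
                        stack.append((child, iter(self._imports.get(child, ()))))
                        break
                else:
                    # All children explored: mark finished and backtrack.
                    state[node] = done
                    stack.pop()
        return False
```

Storing the reverse edges costs extra memory but makes "who depends on this document?" a direct lookup, which is the query invalidation needs most.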

Orchestration Layer

| Component | Purpose |
|---|---|
| IncrementalBuildCache | Main cache orchestrator with sharded storage (256 buckets using MD5 hash) |
| DirtyPropagator | Propagates dirty state through dependency graph with export comparison |
| GlobalInvalidationDetector | Detects when full rebuild is needed (config/theme/toctree changes) |
| PropagationResult | DTO for dirty propagation results |
| CacheVersioning | Cache version validation (PHP version, format version) |
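One plausible way to obtain 256 buckets from an MD5 hash is to take the first two hexadecimal characters of the digest as the shard name. The sketch below assumes exactly that; the function names and the `.json` file layout are hypothetical, and the PR's actual scheme may differ.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_for(document_path: str) -> str:
    """Return a two-character hex shard name ("00".."ff", 256 buckets)."""
    digest = hashlib.md5(document_path.encode("utf-8"), usedforsecurity=False)
    return digest.hexdigest()[:2]


def shard_file(cache_dir: str, document_path: str) -> str:
    """Hypothetical on-disk location of the shard holding this document."""
    return os.path.join(cache_dir, shard_for(document_path) + ".json")


def group_by_shard(document_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group documents so each shard file is read or written only once."""
    shards: Dict[str, List[str]] = defaultdict(list)
    for path in document_paths:
        shards[shard_for(path)].append(path)
    return dict(shards)
```

MD5 serves here only to spread keys evenly across buckets, not as a security measure, which is why the sketch passes `usedforsecurity=False` (available since Python 3.9).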

Security Hardening

  • Resource exhaustion protection: Limits on documents (100k), edges (2M), exports (100k), imports per doc (1k)
  • Path traversal prevention: realpath() validation with prefix checking
  • Input validation: Comprehensive validation in all fromArray() deserialization
  • Depth limiting: Node traversal limited to 100 levels to prevent stack overflow
  • Propagation limits: MAX_PROPAGATION_VISITS (100k) prevents runaway propagation
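The path traversal check can be sketched as follows. This is a Python illustration of the general realpath-plus-prefix technique, not the PR's PHP code, and `is_within` is a hypothetical helper. It compares whole path components rather than raw string prefixes, so a sibling directory such as `/srv/docs-evil` is not mistaken for a child of `/srv/docs`.

```python
import os


def is_within(base_dir: str, candidate: str) -> bool:
    """Return True if `candidate` resolves to a location inside `base_dir`."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and resolves symlinks before comparing.
    target = os.path.realpath(os.path.join(base, candidate))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

For example, `is_within("/srv/docs", "../secret")` is rejected because the resolved target leaves the base directory.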

Documentation

  • incremental-builds.rst - Consumer guide with usage examples and integration patterns
  • incremental-builds-architecture.rst - Maintainer guide with architecture, security model, and extension instructions

Usage

Consumer applications integrate by:

  1. Loading previous IncrementalBuildCache from cache (if exists)
  2. Running compilation (passes populate the state automatically)
  3. Using ChangeDetector to determine which documents need re-rendering
  4. Using DirtyPropagator::propagate() to find affected dependents (with export comparison)
  5. Using GlobalInvalidationDetector to check if full rebuild is needed
  6. Persisting state via IncrementalBuildCache::save() for next build
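Step 4 can be pictured with the following sketch of breadth-first dirty propagation. It is a simplified assumption about the behavior, not the `DirtyPropagator` implementation: a changed document is always dirty, but its dependents are only queued when its exports differ from the previous build, and everything reachable from there is conservatively marked dirty. Raising an exception at the visit limit is also an assumption.

```python
from collections import deque
from typing import Dict, List, Set

MAX_PROPAGATION_VISITS = 100_000  # limit named in the PR description


def propagate_dirty(
    changed: Set[str],
    dependents: Dict[str, Set[str]],
    old_exports: Dict[str, List[str]],
    new_exports: Dict[str, List[str]],
    max_visits: int = MAX_PROPAGATION_VISITS,
) -> Set[str]:
    """Return every document that needs re-rendering."""
    dirty = set(changed)
    # Only documents whose exports changed can affect their dependents.
    queue = deque(
        doc for doc in sorted(changed)
        if old_exports.get(doc) != new_exports.get(doc)
    )
    visits = 0
    while queue:
        visits += 1
        if visits > max_visits:
            raise RuntimeError("propagation visit limit exceeded")
        doc = queue.popleft()
        for dependent in dependents.get(doc, ()):
            if dependent not in dirty:
                dirty.add(dependent)
                queue.append(dependent)
    return dirty
```

With `dependents = {"api": {"index"}}`, editing only the body text of `api` leaves `index` clean, while adding a new anchor to `api` marks `index` dirty as well.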

See docs/developers/incremental-builds.rst for detailed documentation.

Test Plan

  • 212 unit tests covering all components
  • Security tests for path traversal, symlink attacks, control characters
  • Boundary tests for resource limits (100k docs, 2M edges, 100k exports)
  • Stress tests for large graph operations (linear chains, fan-out patterns)
  • Pattern validation tests for GlobalInvalidationDetector
  • TOCTOU race condition handling documented and tested
  • PHPStan clean
  • Coding standards clean

@CybotTM CybotTM marked this pull request as draft January 23, 2026 15:46
@CybotTM CybotTM changed the title from "[WIP] perf: Add incremental build infrastructure" to "perf: Add incremental build infrastructure" Jan 23, 2026
@CybotTM CybotTM marked this pull request as ready for review January 23, 2026 16:30
@jaapio
Member

jaapio commented Jan 23, 2026

Can you help me to understand why we need this? What are you trying to achieve with this PR? What is the problem we are solving? How is this going to work? It seems to be part of a more complicated system that is not finished yet?

@CybotTM
Contributor Author

CybotTM commented Jan 23, 2026

@jaapio, so the added documentation does not answer your questions?

Can you help me to understand why we need this?

Incremental builds. You do not need to render the whole documentation if only one or a few files change.

What are you trying to achieve with this PR?

Incremental builds.

What is the problem we are solving?

Full rebuilds are slow. With incremental builds, updating the documentation takes only 0.x seconds.

How is this going to work?

By checking the timestamps and hashes of source files, and rendering only what needs to be rendered.
It also tracks relations, so it re-renders documents that reference the changed one if necessary.

It seems to be part of a more complicated system that is not finished yet?

It is finished. It just needs to be used by the consuming app.
It could also be integrated directly, but I need to check this further.

@CybotTM
Contributor Author

CybotTM commented Jan 23, 2026

Initially I thought I would provide the integration part in another PR, but that does not make much sense, right?

Note: This is a WIP PR. The utilities are complete but integration with a parallel renderer is planned for a follow-up PR.

So I will add it here as additional commits.
And the bigger picture is still https://cybottm.github.io/render-guides/: render the complete TYPO3 Core Changelog in under 1 minute.

@CybotTM CybotTM marked this pull request as draft January 24, 2026 08:18
@CybotTM CybotTM force-pushed the perf/incremental-build-infrastructure branch from ed4e101 to 9636daa January 25, 2026 00:31
Add core classes for tracking document changes and dependencies:

- DependencyGraph: Bidirectional graph for import/dependent relationships
  with cycle detection, BFS propagation, and resource exhaustion protection
- DocumentExports: Immutable container for document anchors, titles, citations
  with comprehensive input validation
- ChangeDetector: Fast change detection using mtime + content hash strategy
- ContentHasher: File/content hashing with xxh128 (fast) or sha256 fallback
- ChangeDetectionResult: Simple DTO for categorizing file changes

Security hardening included:
- Resource limits (100k docs, 2M edges, 1k imports/doc)
- Path validation with control character rejection
- Hash format validation
- Depth-limited traversal protection
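
The xxh128-or-sha256 fallback can be sketched in Python as below. This is an illustration under assumptions, not the PR's `ContentHasher`: it relies on the optional third-party `xxhash` package and falls back to the standard library when that package is missing. Returning the algorithm name alongside the digest reflects the idea that a cache written with one algorithm should not be compared against hashes from another.

```python
import hashlib
from typing import Tuple

try:
    import xxhash  # optional third-party package; may not be installed
except ImportError:
    xxhash = None


def select_algorithm() -> str:
    """Prefer the fast non-cryptographic hash when it is available."""
    return "xxh128" if xxhash is not None else "sha256"


def hash_content(data: bytes) -> Tuple[str, str]:
    """Return (algorithm name, hex digest) for the given bytes."""
    if xxhash is not None:
        return "xxh128", xxhash.xxh128(data).hexdigest()
    return "sha256", hashlib.sha256(data).hexdigest()


def is_compatible(cached_algorithm: str) -> bool:
    """Cached hashes are only comparable if the algorithm matches."""
    return cached_algorithm == select_algorithm()
```

An xxh128 digest is 32 hex characters and a sha256 digest is 64, so the two cannot be confused once the format is validated.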
Container service that holds DependencyGraph and DocumentExports during
compilation. Supports:

- Current and previous exports for change detection
- Hash algorithm tracking for cache compatibility
- Serialization for persistence between builds
- Input directory configuration

Enforces MAX_EXPORTS limit (100k) consistent with DependencyGraph.

Add compiler passes that integrate incremental build with document compilation:

- NodeTraversalTrait: Depth-limited (100 levels) recursive node traversal
  to prevent stack overflow from malicious documents
- DependencyGraphPass (priority 9): Extracts imports from TocTree, Doc refs,
  and Include directives to build dependency graph
- ExportsCollectorPass (priority 10): Collects anchors, titles, citations
  from compiled documents with path traversal protection

Both passes run after standard compilation to capture final document state.
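
Depth-limited traversal can be sketched as follows. The `Node` class and `walk` function are hypothetical stand-ins for the document AST and the `NodeTraversalTrait`; silently stopping at the limit, rather than raising an error, is an assumption.

```python
from typing import Callable, List, Optional

MAX_DEPTH = 100  # levels, as stated for NodeTraversalTrait


class Node:
    """Hypothetical minimal AST node."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.children = children or []


def walk(
    node: Node,
    visit: Callable[[str], None],
    depth: int = 0,
    max_depth: int = MAX_DEPTH,
) -> None:
    """Visit nodes depth-first, ignoring anything nested beyond max_depth."""
    if depth >= max_depth:
        return  # stop descending so a pathological document cannot exhaust the stack
    visit(node.name)
    for child in node.children:
        walk(child, visit, depth + 1, max_depth)
```

A chain of 150 nested nodes therefore yields only the first 100 visits, and the recursion depth stays bounded regardless of input.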
Register ContentHasher and IncrementalBuildState as services.
Compiler passes are auto-discovered via the existing glob pattern.
@CybotTM CybotTM force-pushed the perf/incremental-build-infrastructure branch from 9636daa to 4d1c5f2 January 25, 2026 01:13
@CybotTM CybotTM changed the title from "perf: Add incremental build infrastructure" to "feat: Add incremental build infrastructure for dependency tracking" Jan 25, 2026
@CybotTM CybotTM marked this pull request as ready for review January 25, 2026 01:44
@CybotTM CybotTM marked this pull request as draft January 25, 2026 02:15
@CybotTM CybotTM force-pushed the perf/incremental-build-infrastructure branch 2 times, most recently from 43f76d3 to 78a2681 January 25, 2026 07:57
Document the incremental build infrastructure including:

- Architecture overview and component responsibilities
- How to use IncrementalBuildState in consumer applications
- Persistence format and cache invalidation strategies
- Security considerations and resource limits
@CybotTM CybotTM force-pushed the perf/incremental-build-infrastructure branch from 78a2681 to 6bb3265 January 25, 2026 09:19
Extract higher-level incremental build classes from render-guides:

- PropagationResult: DTO for dirty propagation results
- CacheVersioning: Cache version validation (PHP version, format version)
- DirtyPropagator: Propagates dirty state through dependency graph
- GlobalInvalidationDetector: Detects when full rebuild is needed
- IncrementalBuildCache: Main cache orchestrator with sharded storage

Security hardening includes:
- MAX_EXPORTS and MAX_OUTPUT_PATHS limits (100k each)
- Input validation for all JSON data
- Path validation for sharded storage
- SplQueue for O(1) queue operations

All classes have comprehensive unit tests (75 new tests).
@CybotTM CybotTM force-pushed the perf/incremental-build-infrastructure branch from 6bb3265 to fce5c30 January 25, 2026 10:09
@CybotTM CybotTM marked this pull request as ready for review January 25, 2026 15:26