-
Notifications
You must be signed in to change notification settings - Fork 0
Description
WP Code Check: v1.x vs v2.x Architecture Comparison
10 Major Differences Matrix
| # | Aspect | v1.x (Development) | v2.x (Current) | Impact |
|---|---|---|---|---|
| 1 | Architecture | Monolithic (~6087 lines) | Modular + External Patterns (~6082 core) | v2 separates concerns; easier to maintain & extend |
| 2 | Pattern Loading | Loads patterns one-at-a-time per scan | Loads patterns one-at-a-time per scan | ✅ BOTH IDENTICAL - No memory difference |
| 3 | Pattern Registry | None (files discovered at runtime) | PATTERN-LIBRARY.json (Single Source of Truth) + per-scan cache file |
v2 enables faster discovery, versioning, metadata; cache reduces repeated parsing |
| 4 | Pattern Count | 34 patterns | 53 patterns | v2 covers 56% more issues (19 CRITICAL, 16 HIGH) |
| 5 | Cache Strategy | None (re-parses on each access) | Per-scan temp file (/tmp/wpcc-pattern-registry.XXXXXX) |
v2 reduces Python subprocesses; whitespace-safe encoding for Bash 3 compatibility |
| 6 | Helper Libraries | 3 shared libs (colors, common-helpers, false-positive-filters) | 4 shared libs + json-helpers.sh |
v2 adds JSON parsing utilities for consistency |
| 7 | HTML Generation | Bash-based (json-to-html.sh) |
Python-based (json-to-html.py) |
v2 is faster, more reliable, no bash subprocess issues |
| 8 | AI Integration | None | ai-triage.py (Phase 2 AI triage) |
v2 enables AI-assisted false positive filtering |
| 9 | MCP Support | None | mcp-server.js (Tier 1 - Resources) |
v2 integrates with Claude Desktop, Cline, other AI tools |
| 10 | External Tools | 2 tools (json-to-html.sh, pattern-library-manager.sh) | 6+ tools (Python converters, MCP, triage, GitHub integration) | v2 has richer ecosystem for automation |
Pattern Loading: v1.x vs v2.x (IDENTICAL)
Both versions load patterns ONE-AT-A-TIME, NOT all into memory at startup:
Cache Strategy: v1.x vs v2.x (DIFFERENT)
v1.x: No caching - re-parses pattern JSON files on each access
v2.x: Per-scan cache file - reads PATTERN-LIBRARY.json once, exports to temp file
v1.x Pattern Loading Loop
# v1.x: Discovers patterns at runtime, loads one per iteration
SIMPLE_PATTERNS=$(find "$REPO_ROOT/patterns" -name "*.json" -type f | while read -r pattern_file; do
detection_type=$(grep '"detection_type"' "$pattern_file" | head -1 | sed 's/.*"detection_type"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/')
if [ "$detection_type" = "simple" ]; then
echo "$pattern_file"
fi
done)
# Process each pattern ONE AT A TIME
while IFS= read -r pattern_file; do
[ -z "$pattern_file" ] && continue
# Load pattern metadata (reads JSON file)
if load_pattern "$pattern_file"; then
# Run grep scan
matches=$(cached_grep $include_args -E "$pattern_search" || true)
# Process matches
fi
done <<< "$SIMPLE_PATTERNS"v2.x Pattern Loading Loop (with Cache)
# v2.x: Tries registry first, falls back to file discovery
if SIMPLE_PATTERNS=$(get_patterns_from_registry "simple:direct" "php" "false"); then
:
else
# Fallback: Discover patterns at runtime (same as v1.x)
SIMPLE_PATTERNS=$(find "$REPO_ROOT/patterns" -maxdepth 1 -name "*.json" -type f 2>/dev/null | ...)
fi
# Process each pattern ONE AT A TIME (identical to v1.x)
while IFS= read -r pattern_file; do
[ -z "$pattern_file" ] && continue
# Load pattern metadata
if load_pattern "$pattern_file"; then
# On first call, load_pattern() creates per-scan cache:
# 1. Check registry state (staleness detection)
# 2. Create temp file: /tmp/wpcc-pattern-registry.XXXXXX
# 3. Read PATTERN-LIBRARY.json via Python (ONE TIME)
# 4. Export all patterns to cache file (whitespace-safe encoding)
# 5. Lookup this pattern in cache file via grep
# Subsequent patterns reuse the same cache file (no re-reading registry)
# Run grep scan
matches=$(cached_grep $include_args -E "$pattern_search" || true)
# Process matches
fi
done <<< "$SIMPLE_PATTERNS"Cache File Format:
pattern-id search_pattern=<len>:<value> file_patterns=<len>:pat1,pat2 validator_script=<len>:<value> mitigation_enabled=<len>:true
Why whitespace-safe encoding?
- Values can contain spaces (regex patterns, file paths)
- Bash 3 compatible (no associative arrays)
- Each value prefixed with byte length:
key=<len>:<value> - Parser can safely extract values without word-splitting
Key Finding: Pattern Loading is IDENTICAL, Caching is DIFFERENT
Memory Model (IDENTICAL):
- ✅ v1.x: Loads patterns one-at-a-time
- ✅ v2.x: Loads patterns one-at-a-time
- ✅ NO difference in memory usage
Discovery Strategy (DIFFERENT):
- v1.x: Discovers patterns by scanning
dist/patterns/directory at runtime - v2.x: Tries to use
PATTERN-LIBRARY.jsonregistry first (faster), falls back to file discovery
Caching Strategy (DIFFERENT):
- v1.x: No caching - re-parses pattern JSON files on each access
- v2.x: Per-scan cache file - reads PATTERN-LIBRARY.json once via Python, exports to
/tmp/wpcc-pattern-registry.XXXXXX
Why v2.x is better:
- ✅ Registry enables pattern discovery without filesystem scan
- ✅ Per-scan cache reduces repeated JSON parsing
- ✅ Reduces Python subprocesses (1 cache build vs 53 individual calls)
- ✅ Enables filtering by severity/category
- ✅ Enables version tracking
- ✅ Enables documentation generation
- ❌ Does NOT load all patterns into memory at startup
- ❌ Still loads patterns one-at-a-time (not bulk)
Detailed Breakdown
1. Architecture: Monolithic → Modular
v1.x: Single check-performance.sh file contains all logic
- All pattern checks inline
- All helper functions embedded
- Difficult to extend without modifying core
v2.x: Modular design with external patterns
- Core scanner delegates to pattern registry
- Patterns loaded from
dist/patterns/directory - Helpers in
dist/bin/lib/(reusable across tools) - Specialized tools in
dist/bin/(ai-triage.py, json-to-html.py, mcp-server.js)
2-3. Pattern Management & Registry
v1.x:
- 34 individual JSON files in
dist/patterns/ - Each file loaded independently at runtime
- No centralized metadata
v2.x:
- 53 patterns (56% increase)
PATTERN-LIBRARY.json= Single Source of Truth- Includes statistics: severity breakdown, category counts, pattern types
- Enables pattern discovery and filtering
4. Helper Libraries
v1.x (3 libs):
colors.sh- Terminal colorscommon-helpers.sh- Timestamps, utilitiesfalse-positive-filters.sh- Comment detection, guard detection
v2.x (4 libs):
- All of v1.x +
json-helpers.sh- JSON parsing utilities (new)
5. HTML Generation
v1.x: Bash-based (json-to-html.sh)
- Uses jq + bash templating
- Subprocess overhead
- Potential for hanging
v2.x: Python-based (json-to-html.py)
- Standalone, no dependencies
- Faster, more reliable
- Auto-opens reports (macOS/Linux)
6-7. AI Integration & MCP
v1.x: None
v2.x:
ai-triage.py- AI-assisted triage (Phase 2)mcp-server.js- Model Context Protocol (Tier 1)- Exposes scan results as resources
- Works with Claude Desktop, Cline, etc.
8. Pattern Count Growth
| Severity | v1.x | v2.x | Change |
|---|---|---|---|
| CRITICAL | 14 | 19 | +5 |
| HIGH | 10 | 16 | +6 |
| MEDIUM | 7 | 13 | +6 |
| LOW | 3 | 4 | +1 |
| Total | 34 | 53 | +19 (+56%) |
9. External Tools Ecosystem
v1.x (2 tools):
json-to-html.shpattern-library-manager.sh
v2.x (6+ tools):
json-to-html.py(faster HTML generation)ai-triage.py(AI-assisted triage)mcp-server.js(AI integration)create-github-issue.sh(GitHub automation)pattern-library-manager.sh(enhanced)wp-audit(unified CLI)
10. Metadata Centralization
v1.x: Metadata scattered across 34 JSON files
- No single source of truth
- Difficult to query patterns
- No version tracking
v2.x: PATTERN-LIBRARY.json (centralized)
- Single source of truth
- Includes version, generation timestamp
- Statistics: by severity, category, pattern type
- Enables pattern discovery and filtering
Performance Optimization: In-Memory Loading Decision
Original Goal
v2.x was designed to load all patterns into memory at startup to achieve 6-12x speedup (600-1200ms → ~55ms per scan).
Why This Was NOT Pursued Further
After implementing Phase 1 (registry discovery) and Phase 2 (extended schema), the full in-memory loader (Phase 3) was abandoned due to:
- Bash 3 Compatibility - No associative arrays, complex workarounds needed
- Memory Constraints - Risk of hitting limits on resource-constrained systems
- Implementation Complexity - 3x more complex for only 20% additional speedup
- Diminishing Returns - Phase 1-2 already achieved 80% of benefits
What Was Implemented Instead
Per-scan cache file approach (pragmatic middle ground):
- ✅ Bash 3 compatible, simpler design, easier debugging
- ✅ Still achieves most performance gains (1 Python call vs 53)
- ✅ Lower risk for production use
- ❌ Trade-off: ~3-5x speedup instead of 6-12x, but with much lower complexity
Result: Significant performance improvement with acceptable trade-offs
Summary: Why v2.x is Better
Memory & Performance
⚠️ Pattern Loading: IDENTICAL in both versions (one-at-a-time, not bulk-loaded)- ✅ Pattern Discovery: v2.x registry is faster (no filesystem scan needed)
- ✅ Pattern Caching: v2.x per-scan cache reduces repeated parsing (1 Python call vs 53)
- ✅ HTML Generation: v2.x Python-based is faster and more reliable
Architecture & Maintainability
✅ Maintainability: Modular design, external patterns, centralized registry
✅ Extensibility: Easy to add new patterns without modifying core
✅ Coverage: 56% more patterns (53 vs 34)
✅ Caching: Per-scan cache file reduces I/O and Python subprocesses
AI & Automation
✅ AI-Ready: Built-in AI triage and MCP support
✅ Automation: Rich tooling ecosystem for CI/CD integration
✅ Discovery: Centralized pattern registry enables filtering and documentation
Generated: 2026-01-20
Updated: 2026-01-20 (Pattern loading & caching verification)
Comparison: v1.x (development branch) vs v2.x (current workspace)
Verification: Code review of both versions + actual codebase inspection