Merged
41 changes: 41 additions & 0 deletions .claude/hooks/check-dead-exports.sh
@@ -62,19 +62,60 @@ if [ -z "$FILES_TO_CHECK" ]; then
fi

# Single Node.js invocation: check all files in one process
# Excludes exports that are re-exported from index.js (public API) or consumed
# via dynamic import() — codegraph's static graph doesn't track those edges.
DEAD_EXPORTS=$(node -e "
const fs = require('fs');
const path = require('path');
const root = process.argv[1];
const files = process.argv[2].split('\n').filter(Boolean);

const { exportsData } = require(path.join(root, 'src/queries.js'));

// Build set of names exported from index.js (public API surface)
const indexSrc = fs.readFileSync(path.join(root, 'src/index.js'), 'utf8');
const publicAPI = new Set();
// Match: export { foo, bar as baz } from '...'
for (const m of indexSrc.matchAll(/export\s*\{([^}]+)\}/g)) {
for (const part of m[1].split(',')) {
const name = part.trim().split(/\s+as\s+/).pop().trim();
if (name) publicAPI.add(name);
}
}
// Match: export default ...
if (/export\s+default\b/.test(indexSrc)) publicAPI.add('default');

// Scan all src/ files for dynamic import() consumers
const srcDir = path.join(root, 'src');
function scanDynamic(dir) {
for (const ent of fs.readdirSync(dir, { withFileTypes: true })) {
if (ent.isDirectory()) { scanDynamic(path.join(dir, ent.name)); continue; }
if (!ent.name.endsWith('.js')) continue;
try {
const src = fs.readFileSync(path.join(dir, ent.name), 'utf8');
// Multi-line-safe: match const { ... } = [await] import('...')
for (const m of src.matchAll(/const\s*\{([^}]+)\}\s*=\s*(?:await\s+)?import\s*\(['"]/gs)) {
for (const part of m[1].split(',')) {
const name = part.trim().split(/\s+as\s+/).pop().trim().split('\n').pop().trim();
if (name && /^\w+$/.test(name)) publicAPI.add(name);
}
}
// Also match single-binding: const X = [await] import('...') (default import)
for (const m of src.matchAll(/const\s+(\w+)\s*=\s*(?:await\s+)?import\s*\(['"]/g)) {
publicAPI.add(m[1]);
}
} catch {}
}
}
scanDynamic(srcDir);

const dead = [];
for (const file of files) {
try {
const data = exportsData(file, undefined, { noTests: true, unused: true });
if (data && data.results) {
for (const r of data.results) {
if (publicAPI.has(r.name)) continue; // public API or dynamic import consumer
dead.push(r.name + ' (' + data.file + ':' + r.line + ')');
}
}
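The two allow-list scans in the hook's inline script can be exercised in isolation. Below is a minimal sketch that mirrors (rather than imports) the name-extraction logic for `export { foo, bar as baz }` statements and destructured dynamic imports; the function names are illustrative, not part of the hook:

```javascript
// Extract exported names from `export { ... }` statements, mirroring the
// hook's publicAPI scan of src/index.js (sketch, not the hook itself).
function extractExportNames(src) {
  const names = new Set();
  for (const m of src.matchAll(/export\s*\{([^}]+)\}/g)) {
    for (const part of m[1].split(',')) {
      // `bar as baz` exports the name after `as`
      const name = part.trim().split(/\s+as\s+/).pop().trim();
      if (name) names.add(name);
    }
  }
  if (/export\s+default\b/.test(src)) names.add('default');
  return names;
}

// Destructured dynamic imports: const { a, b } = [await] import('...')
function extractDynamicImportNames(src) {
  const names = new Set();
  for (const m of src.matchAll(/const\s*\{([^}]+)\}\s*=\s*(?:await\s+)?import\s*\(['"]/gs)) {
    for (const part of m[1].split(',')) {
      const name = part.trim().split(/\s+as\s+/).pop().trim();
      if (name && /^\w+$/.test(name)) names.add(name);
    }
  }
  return names;
}

const api = extractExportNames("export { foo, bar as baz } from './x.js';\nexport default fn;");
console.log([...api]); // insertion order: foo, baz, default
```

Note that both scans are regex-based approximations: they will miss exports assembled dynamically, which is an accepted trade-off for a pre-commit hook that must stay fast.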
20 changes: 19 additions & 1 deletion .claude/hooks/check-readme.sh
@@ -1,6 +1,11 @@
#!/bin/bash
# Hook: block git commit if README.md, CLAUDE.md, or ROADMAP.md might need updating but aren't staged.
# Runs as a PreToolUse hook on Bash tool calls.
#
# Policy:
# - If NO docs are staged but source files changed → deny (docs weren't considered)
# - If SOME docs are staged → allow (developer reviewed and chose which to update)
# - If commit message contains "docs check acknowledged" → allow (explicit bypass)

INPUT=$(cat)
COMMAND=$(echo "$INPUT" | node -e "
@@ -17,11 +22,16 @@ if ! echo "$COMMAND" | grep -qE '^\s*git\s+commit'; then
exit 0
fi

# Allow explicit bypass via commit message
if echo "$COMMAND" | grep -q 'docs check acknowledged'; then
exit 0
fi

# Check which docs are staged
STAGED_FILES=$(git diff --cached --name-only 2>/dev/null)
README_STAGED=$(echo "$STAGED_FILES" | grep -c '^README.md$' || true)
CLAUDE_STAGED=$(echo "$STAGED_FILES" | grep -c '^CLAUDE.md$' || true)
ROADMAP_STAGED=$(echo "$STAGED_FILES" | grep -c '^ROADMAP.md$' || true)
ROADMAP_STAGED=$(echo "$STAGED_FILES" | grep -c 'ROADMAP.md$' || true)

# If all three are staged, all good
if [ "$README_STAGED" -gt 0 ] && [ "$CLAUDE_STAGED" -gt 0 ] && [ "$ROADMAP_STAGED" -gt 0 ]; then
@@ -32,6 +42,14 @@ fi
NEEDS_CHECK=$(echo "$STAGED_FILES" | grep -cE '(src/|cli\.js|constants\.js|parser\.js|package\.json|grammars/)' || true)

if [ "$NEEDS_CHECK" -gt 0 ]; then
DOCS_STAGED=$((README_STAGED + CLAUDE_STAGED + ROADMAP_STAGED))

# If at least one doc is staged, developer considered docs — allow with info
if [ "$DOCS_STAGED" -gt 0 ]; then
exit 0
fi

# No docs staged at all — block
MISSING=""
[ "$README_STAGED" -eq 0 ] && MISSING="README.md"
[ "$CLAUDE_STAGED" -eq 0 ] && MISSING="${MISSING:+$MISSING, }CLAUDE.md"
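The staged-docs policy above reduces to a few `grep -c` counts over `git diff --cached --name-only` output. A standalone sketch against a hard-coded staged-file list (the paths are illustrative, and the pattern list is trimmed to two entries):

```shell
# Simulate the hook's "some docs staged -> allow" branch.
STAGED_FILES="src/queries.js
README.md"

# grep -c prints the match count; `|| true` keeps a zero count from
# aborting the script when grep exits nonzero on no matches.
README_STAGED=$(echo "$STAGED_FILES" | grep -c '^README.md$' || true)
CLAUDE_STAGED=$(echo "$STAGED_FILES" | grep -c '^CLAUDE.md$' || true)
DOCS_STAGED=$((README_STAGED + CLAUDE_STAGED))

# Source files changed?
NEEDS_CHECK=$(echo "$STAGED_FILES" | grep -cE '(src/|cli\.js)' || true)

if [ "$NEEDS_CHECK" -gt 0 ] && [ "$DOCS_STAGED" -gt 0 ]; then
  echo "allow: source changed and at least one doc staged"
fi
```

The anchored `^...$` patterns matter: without them, `docs/README.md` or a file merely containing the substring would inflate the count.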
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -56,6 +56,7 @@ JS source is plain JavaScript (ES modules) in `src/`. No transpilation step. The
| `native.js` | Native napi-rs addon loader with WASM fallback |
| `registry.js` | Global repo registry (`~/.codegraph/registry.json`) for multi-repo MCP |
| `resolve.js` | Import resolution (supports native batch mode) |
| `ast-analysis/` | Unified AST analysis framework: shared DFS walker (`visitor.js`), engine orchestrator (`engine.js`), extracted metrics (`metrics.js`), and pluggable visitors for complexity, dataflow, and AST-store |
| `complexity.js` | Cognitive, cyclomatic, Halstead, MI computation from AST; `complexity` CLI command |
| `communities.js` | Louvain community detection, drift analysis |
| `manifesto.js` | Configurable rule engine with warn/fail thresholds; CI gate |
50 changes: 29 additions & 21 deletions docs/roadmap/ROADMAP.md
@@ -562,36 +562,44 @@ Plus updated enums on existing tools (edge_kinds, symbol kinds).

**Context:** Phases 2.5 and 2.7 added 38 modules and grew the codebase from 5K to 26,277 lines without introducing shared abstractions. The dual-function anti-pattern was replicated across 19 modules. Three independent AST analysis engines (complexity, CFG, dataflow) totaling 4,801 lines share the same fundamental pattern but no infrastructure. Raw SQL is scattered across 25+ modules touching 13 tables. The priority ordering has been revised based on actual growth patterns -- the new #1 priority is the unified AST analysis framework.

### 3.1 -- Unified AST Analysis Framework ★ Critical (New)
### 3.1 -- Unified AST Analysis Framework ★ Critical 🔄

Unify the three independent AST analysis engines (complexity, CFG, dataflow) plus AST node storage into a shared visitor framework. These four modules total 5,193 lines and independently implement the same pattern: per-language rules map → AST walk → collect data → write to DB → query → format.
Unify the independent AST analysis engines (complexity, CFG, dataflow) plus AST node storage into a shared visitor framework. These four modules independently implement the same pattern: per-language rules map → AST walk → collect data → write to DB → query → format.

| Module | Lines | Languages | Pattern |
|--------|-------|-----------|---------|
| `complexity.js` | 2,163 | 8 | Per-language rules → AST walk → collect metrics |
| `cfg.js` | 1,451 | 9 | Per-language rules → AST walk → build basic blocks |
| `dataflow.js` | 1,187 | 1 (JS/TS) | Scope stack → AST walk → collect flows |
| `ast.js` | 392 | 1 (JS/TS) | AST walk → extract stored nodes |

The extractors refactoring (Phase 2.7.6) proved the pattern: split per-language rules into files, share the engine. Apply it to all four AST analysis passes.
**Completed:** Phases 1-7 implemented a pluggable visitor framework with a shared DFS walker (`walkWithVisitors`), an analysis engine orchestrator (`runAnalyses`), and three visitors (complexity, dataflow, AST-store) that share a single tree traversal per file. `builder.js` collapsed from 4 sequential `buildXxx` blocks into one `runAnalyses` call.

```
src/
ast-analysis/
visitor.js # Shared AST visitor with hook points
engine.js # Single-pass or multi-pass orchestrator
metrics.js # Halstead, MI, LOC/SLOC (language-agnostic)
cfg-builder.js # Basic-block + edge construction
rules/
complexity/{lang}.js # Cognitive/cyclomatic rules per language
cfg/{lang}.js # Basic-block rules per language
dataflow/{lang}.js # Define-use chain rules per language
ast-store/{lang}.js # Node extraction rules per language
visitor.js # Shared DFS walker with pluggable visitor hooks
engine.js # Orchestrates all analyses in one coordinated pass
metrics.js # Halstead, MI, LOC/SLOC (extracted from complexity.js)
visitor-utils.js # Shared helpers (functionName, extractParams, etc.)
visitors/
complexity-visitor.js # Cognitive/cyclomatic/nesting + Halstead
ast-store-visitor.js # new/throw/await/string/regex extraction
dataflow-visitor.js # Scope stack + define-use chains
shared.js # findFunctionNode, rule factories, ext mapping
rules/ # Per-language rule files (unchanged)
```

A single AST walk with pluggable visitors eliminates 3 redundant tree traversals per function, shares language-specific node type mappings, and allows new analyses to plug in without creating another 1K+ line module.
- ✅ Shared DFS walker with `enterNode`/`exitNode`/`enterFunction`/`exitFunction` hooks, `skipChildren` per-visitor, nesting/scope tracking
- ✅ Complexity visitor (cognitive, cyclomatic, max nesting, Halstead) — file-level and function-level modes
- ✅ AST-store visitor (new/throw/await/string/regex extraction)
- ✅ Dataflow visitor (define-use chains, arg flows, mutations, scope stack)
- ✅ Engine orchestrator: unified pre-walk stores results as pre-computed data on `symbols`, then delegates to existing `buildXxx` for DB writes
- ✅ `builder.js` → single `runAnalyses` call replaces 4 sequential blocks + WASM pre-parse
- ✅ Extracted pure computations to `metrics.js` (Halstead derived math, LOC, MI)
- ✅ Extracted shared helpers to `visitor-utils.js` (from dataflow.js)
- 🔲 **CFG visitor rewrite** (see below)

**Remaining: CFG visitor rewrite.** `buildFunctionCFG` (813 lines) uses a statement-level traversal (`getStatements` + `processStatement` with `loopStack`, `labelMap`, `blockIndex`) that is fundamentally incompatible with the node-level DFS used by `walkWithVisitors`. This is why the engine runs CFG as a separate Mode B pass — the only analysis that can't participate in the shared single-DFS walk.

Rewrite the CFG algorithm as a node-level visitor that builds basic blocks and edges incrementally via `enterNode`/`exitNode` hooks, tracking block boundaries at branch/loop/return nodes the same way the complexity visitor tracks nesting. This eliminates the last redundant tree traversal during build and lets CFG share the exact same DFS pass as complexity, dataflow, and AST extraction. The statement-level `getStatements` helper and per-language `CFG_RULES.statementTypes` can be replaced by detecting block-terminating node types in `enterNode`. Also simplifies `engine.js` by removing the Mode A/B split and WASM pre-parse special-casing for CFG.
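The proposed rewrite can be sketched with a toy visitor. Only the `enterNode`/`exitNode` hook names come from the framework described above; the node shape, `BRANCH_TYPES` set, and stand-in walker are hypothetical simplifications:

```javascript
// Toy node-level CFG visitor: a branch node terminates the current basic
// block and contributes successor edges, tracked incrementally during DFS.
const BRANCH_TYPES = new Set(['if_statement', 'while_statement', 'switch_statement']);

function makeCfgVisitor() {
  const state = { blocks: 1, edges: 0 }; // start with the entry block
  return {
    enterNode(node) {
      if (BRANCH_TYPES.has(node.type)) {
        state.blocks += 1;  // open a successor block
        state.edges += 2;   // taken / not-taken (simplified)
      }
    },
    result: () => state,
  };
}

// Minimal DFS standing in for walkWithVisitors.
function walk(node, visitor) {
  visitor.enterNode(node);
  for (const child of node.children || []) walk(child, visitor);
}

const toyAst = { type: 'function', children: [{ type: 'if_statement', children: [] }] };
const v = makeCfgVisitor();
walk(toyAst, v);
console.log(v.result()); // one branch: 2 blocks, 2 edges
```

A real implementation would also need `exitNode` to close blocks at join points and handle `loopStack`/label semantics, but the shape above is what lets CFG ride the same DFS as the other visitors.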

**Remaining: Derive cyclomatic complexity from CFG.** Once CFG participates in the unified walk, cyclomatic complexity can be derived directly from CFG edge/block counts (`edges - nodes + 2`) rather than independently computed by the complexity visitor. This creates a single source of truth for control flow metrics and eliminates redundant computation. Can also be done as a simpler SQL-only approach against stored `cfg_blocks`/`cfg_edges` tables (see backlog ID 45).
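The derivation itself is plain graph arithmetic. A sketch, assuming per-function edge/block counts are available (e.g. from the stored `cfg_blocks`/`cfg_edges` tables; the function name is illustrative):

```javascript
// Cyclomatic complexity from a function's CFG: M = E - N + 2,
// where E = edge count, N = basic-block count (single connected component).
function cyclomaticFromCFG(edgeCount, blockCount) {
  return edgeCount - blockCount + 2;
}

// Straight-line function: one block, no edges.
console.log(cyclomaticFromCFG(0, 1)); // 1
// Single if/else: entry, then, else, join = 4 blocks, 4 edges.
console.log(cyclomaticFromCFG(4, 4)); // 2
```

This matches the textbook definition, so a divergence between the CFG-derived value and the complexity visitor's independent count would itself flag a bug in one of the two passes.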

**Affected files:** `src/complexity.js`, `src/cfg.js`, `src/dataflow.js`, `src/ast.js` -> split into `src/ast-analysis/`
**Affected files:** `src/complexity.js`, `src/cfg.js`, `src/dataflow.js`, `src/ast.js` split into `src/ast-analysis/`

### 3.2 -- Command/Query Separation ★ Critical 🔄
