optave · carlos-alm · Mar 9, 2026 · Mar 9, 2026
diff --git a/docs/roadmap/BACKLOG.md b/docs/roadmap/BACKLOG.md
@@ -135,6 +135,30 @@ Six commands already produce Mermaid/DOT output: `export`, `diff-impact -f merma
 | 69 | Node annotations in `diff-impact` / `branch-compare` / `communities` | Use 61's annotation formatter to show top exports on file nodes in `diff-impact -f mermaid`, `branch-compare --format mermaid`, and `communities` output. | Visualization | All visual tools show file API surfaces inline, not just in `export` | ✓ | ✓ | 3 | No | 61 |
 | 70 | Drift/risk subgraph labels in `communities` and `triage` | Use 62's semantic label system to annotate `communities` subgraphs with drift status (`**(drifted)**` / `**(cohesive)**`) and add a new `--format mermaid` to `triage` with risk-severity group labels. | Intelligence | Community and triage diagrams communicate structural health directly in the layout | ✓ | ✓ | 3 | No | 62 |
 
+### Tier 1h — Core accuracy improvements (resolve known analysis gaps)
+
+These address fundamental limitations in the parsing and resolution pipeline that reduce graph accuracy for real-world codebases. All are zero-dep (tree-sitter AST + existing resolution infrastructure), non-breaking (purely additive — more edges, better resolution), and high problem-fit (directly prevent hallucinated or missing dependencies).
+
+| ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on |
+|----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------|
+| 71 | Basic type inference for typed languages | Extract type annotations from TypeScript and Java AST nodes (variable declarations, function parameters, return types, generics) to resolve method calls through typed references. Currently `const x: Router = express.Router(); x.get(...)` produces no edge because `x.get` can't be resolved without knowing `x` is a `Router`. Tree-sitter already parses type annotations — we just don't use them for resolution. Start with declared types (no flow inference), which covers the majority of TS/Java code. | Resolution | Dramatically improves call graph completeness for TypeScript and Java — the two languages where developers annotate types explicitly and expect tooling to use them. Directly prevents hallucinated "no callers" results for methods called through typed variables | ✓ | ✓ | 5 | No | — |
+| 72 | Interprocedural dataflow analysis | Extend the existing intraprocedural dataflow (ID 14) to propagate `flows_to`/`returns`/`mutates` edges across function boundaries. When function A calls B with argument X, and B's dataflow shows X flows to its return value, connect A's call site to the downstream consumers of B's return. Requires stitching per-function dataflow summaries at call edges — no new parsing, just graph traversal over existing `dataflow` + `edges` tables. Start with single-level propagation (caller↔callee), not transitive closure. | Analysis | Current dataflow stops at function boundaries, missing the most important flows — data passing through helper functions, middleware chains, and factory patterns. Single-function scope means `dataflow` can't answer "where does this user input end up?" across call boundaries. Cross-function propagation is the difference between toy dataflow and useful taint-like analysis | ✓ | ✓ | 5 | No | 14 |
+| 73 | Improved dynamic call resolution | Upgrade the current "best-effort" dynamic dispatch resolution for Python, Ruby, and JavaScript. Three concrete improvements: **(a)** receiver-type tracking — when `x = SomeClass()` is followed by `x.method()`, resolve `method` to `SomeClass.method` using the assignment chain (leverages existing `ast_nodes` + `dataflow` tables); **(b)** common pattern recognition — resolve `EventEmitter.on('event', handler)` callback registration, `Promise.then/catch` chains, `Array.map/filter/reduce` with named function arguments, and decorator/annotation patterns; **(c)** confidence-tiered edges — mark dynamically-resolved edges with a confidence score (high for direct assignment, medium for pattern match, low for heuristic) so consumers can filter by reliability. | Resolution | In Python/Ruby/JS, 30-60% of real calls go through dynamic dispatch — method calls on variables, callbacks, event handlers, higher-order functions. The current best-effort resolution misses most of these, leaving massive gaps in the call graph for the languages where codegraph is most commonly used. Even partial improvement here has outsized impact on graph completeness | ✓ | ✓ | 5 | No | — |
+
+### Tier 1i — Search, navigation, and monitoring improvements
+
+These close gaps in search expressiveness, cross-repo navigation, implementation tracking, and proactive monitoring. All are zero-dep and foundation-aligned.
+
+| ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on |
+|----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------|
+| 74 | Interface and trait implementation tracking | Extract `implements`/`extends`/trait-impl relationships from tree-sitter AST and store as `implements` edges in the graph. New `codegraph implementations <interface>` command returns all concrete types that implement a given interface, abstract class, or trait. Inverse: `codegraph interfaces <class>` returns what a type implements. Cross-reference with existing `contains` edges for full type hierarchy. Covers TypeScript interfaces, Java interfaces/abstract classes, Go interfaces (structural matching via method set comparison), Rust traits, C# interfaces, PHP interfaces. | Navigation | Agents can answer "who implements this interface?" and "what contract does this type satisfy?" in one call — currently impossible without reading every file. Directly prevents missed blast radius when an interface signature changes, since all implementors are affected but invisible to the current call-graph-only impact analysis | ✓ | ✓ | 5 | No | — |
+| 75 | Diff and commit content search | Search within git diffs and commit messages using the existing graph's file/symbol awareness. `codegraph search-history "pattern" --since 30d` searches `git log -p` output, returning matches with commit SHA, author, date, file, and enclosing function (resolved via line-number intersection with `nodes` table). Supports `--author`, `--file`, `--kind` filters to scope by symbol type. Unlike `co-change` (which tracks statistical co-occurrence), this searches actual diff content — "find every commit that modified the `authenticate` function" or "find when `TODO: hack` was introduced." | Search | Agents can trace when and why a function changed without leaving the graph — answers "who introduced this bug?" and "what changed in this module last month?" in one query. Eliminates manual `git log -p` + grep workflows that burn tokens on raw diff output | ✓ | ✓ | 4 | No | — |
+| 76 | Regression watchers (query-based commit monitors) | Define persistent watch rules that evaluate graph queries against each new commit during `build --watch` or incremental rebuild. Rules are declared in `.codegraphrc.json` under `monitors[]` — each rule has a name, a query type (`check` predicate, `search` pattern, `ast` pattern, or custom SQL), and an action (`warn`, `fail`, or `webhook`). Examples: "alert when a new call to `deprecatedAPI()` appears in a diff", "fail when a new `eval()` AST node is added", "warn when fan-in of any function exceeds 20 after this commit." Results surfaced in CLI output during watch mode and as a `monitors` section in `diff-impact`. | CI | Proactive detection of regressions as they happen — agents and CI pipelines get immediate feedback when a commit introduces banned patterns, exceeds thresholds, or violates architectural rules. Shifts detection left from periodic audits to per-commit triggers | ✓ | ✓ | 4 | No | — |
+| 77 | Metric trend tracking (code insights) | `codegraph trends` computes key graph metrics (total symbols, avg complexity, dead code count, cycle count, community drift score, boundary violations) at historical git revisions and outputs a time-series table or JSON. Uses a `git stash && git checkout <rev> && build && collect && restore` loop over sampled commits (configurable `--samples N` defaulting to 10 evenly-spaced commits). Stores results in a `metric_snapshots` table for incremental updates. `--since` and `--until` for date range. `--metric` to select specific metrics. Enables tracking migration progress ("how many files still use old API?"), tech debt trends, and codebase growth over time without external dashboards. | Intelligence | Agents and teams can answer "is our codebase getting healthier or worse?" with data instead of intuition — tracks complexity trends, dead code accumulation, architectural drift, and migration progress over time. Historical backfill from git history means instant visibility into months of trends | ✓ | ✓ | 3 | No | — |
+| 78 | Cross-repo symbol resolution | In multi-repo mode, resolve import edges that cross repository boundaries. When repo A imports `@org/shared-lib`, and repo B is `@org/shared-lib` in the registry, create cross-repo edges linking A's import to B's actual exported symbol. Requires matching npm/pip/go package names to registered repos. Store cross-repo edges with a `repo` qualifier in the `edges` table. Enables cross-repo `fn-impact` (changing a shared library function shows impact across all consuming repos), cross-repo `path` queries, and cross-repo `diff-impact`. | Navigation | Multi-repo mode currently treats each repo as isolated — agents can search across repos but can't trace dependencies between them. Cross-repo edges enable "if I change this shared utility, which downstream repos break?" — the highest-value question in monorepo and multi-repo architectures | ✓ | ✓ | 5 | No | — |
+| 79 | Advanced query language with boolean operators and output shaping | Extend `codegraph search` and `codegraph where` with a structured query syntax supporting: **(a)** boolean operators — `kind:function AND file:src/`, `name:parse OR name:extract`, `NOT kind:class`; **(b)** compound filters — `kind:method AND complexity.cognitive>15 AND role:core`; **(c)** output shaping — `--select symbols` (just names), `--select files` (distinct files), `--select owners` (CODEOWNERS for matches), `--select stats` (aggregate counts by kind/file/role); **(d)** result aggregation — `--group-by file`, `--group-by kind`, `--group-by community` with counts. Parse the query into a SQL WHERE clause against the `nodes`/`function_complexity`/`edges` tables. Expose as `query_language` MCP tool parameter. | Search | Current search is either keyword/semantic (fuzzy) or exact-name (`where`). Agents needing "all core functions with cognitive complexity > 15 in src/api/" must chain multiple commands and filter manually — wasting tokens on intermediate results. A structured query language answers compound questions in one call | ✓ | ✓ | 4 | No | — |
+| 80 | Find implementations in impact analysis | When a function signature or interface definition changes, automatically include all implementations/subtypes in `fn-impact` and `diff-impact` blast radius. Currently impact only follows `calls` edges — changing an interface method signature breaks every implementor, but this is invisible. Requires ID 74's `implements` edges. Add `--include-implementations` flag (on by default) to impact commands. | Analysis | Catches the most dangerous class of missed blast radius — interface/trait changes that silently break all implementors. A single method signature change on a widely-implemented interface can break dozens of files, none of which appear in the current call-graph-only impact analysis | ✓ | ✓ | 5 | No | 74 |
+
 ### Tier 2 — Foundation-aligned, needs dependencies
 
 Ordered by problem-fit: