Skip to content

fix(validate): resolve nodes by frontmatter id, not filename#47

Open
cleak wants to merge 4 commits into
masterfrom
claude/silly-davinci-55e5f2
Open

fix(validate): resolve nodes by frontmatter id, not filename#47
cleak wants to merge 4 commits into
masterfrom
claude/silly-davinci-55e5f2

Conversation

@cleak
Copy link
Copy Markdown
Owner

@cleak cleak commented May 26, 2026

Summary

  • validate --fix crashed mid-run when a node's filename didn't match its id: (e.g. a hub at <dir>/README.md with id: hub-<dir>-readme). The crash also left partial reverse-edge writes in unrelated files, which then needed manual de-duping.
  • Root cause: find_node_file (and the parallel resolve_node_id used by MCP/CLI) reconstructed lookups from the filename and bailed via ? after earlier writes had already landed. Fix: fall back to scanning frontmatter on filename miss, and have repair_reverse_edges use each target's file_path from the loaded graph and stage all edits in memory before any write.

What changed

  • find_node_file (crates/tempyr-core/src/ops.rs): added a slow-path fallback that parses frontmatter when the filename match misses; dropped max_depth(2) so it matches Graph::load_from_directory.
  • resolve_node_id: same two-phase lookup — falls back to frontmatter id when the filename match misses. Otherwise MCP/CLI node lookups (graph_update_node, graph_get_node, add-edge, status, …) silently failed against the same misaligned files.
  • repair_reverse_edges: now uses graph.get_node(target_id).file_path directly. Accumulates all reverse-edge additions into an in-memory HashMap<String, Node> first, then writes every file in a single commit pass. Errors during collection abort before any filesystem write.
  • Four regression tests in ops::tests.

Test plan

  • cargo test --workspace --lib — 490 passed, 0 failed
  • cargo clippy -p tempyr-core --lib --tests -- -D warnings — clean
  • cargo fmt --check — clean
  • End-to-end smoke with the bug-report schema (type: hub, directory = "hubs", file at notes/README.md with id: hub-notes-readme):
    • reverse edge lands in notes/README.md
    • no phantom hubs/hub-notes-readme.md is created

Bug-report items left for follow-ups

The original report also flagged three smaller quirks that aren't in scope here:

  • graph_dir = "." in config.toml is ignored
  • updated: rejects bare YAML date scalars with a misleading "premature end of input"
  • no --version flag

Happy to tackle these as separate PRs.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved node lookup: filename-first search with a fallback that reads frontmatter to match IDs; still supports suffix-based matches for deeply nested nodes.
    • More robust reverse-edge repairs: collect and apply all updates safely in-memory, preserve original file locations, avoid creating phantom paths or partial writes, and apply a consistent timestamp.
  • Tests

    • Added regressions for mismatched filename/ID resolution, deep-node suffix resolution, and safe edge-repair behavior.

Review Change Stack

`validate --fix` crashed mid-run when a node's on-disk filename didn't
match its `id:` (e.g. a hub at `<dir>/README.md` with `id: hub-<dir>-readme`),
because `find_node_file` reconstructed the path as `<type-dir>/<id>.md`
and bailed via `?` after earlier reverse-edge writes had already landed —
leaving partial writes in unrelated files.

- `find_node_file` now falls back to scanning frontmatter when the
  filename match misses, and walks at any depth >= 2 to match
  `Graph::load_from_directory`.
- `repair_reverse_edges` uses each target's `file_path` from the loaded
  graph (skipping `find_node_file` entirely) and accumulates all edits
  in memory before any write. Errors during collection abort before
  touching the filesystem.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 70d81abe-1735-4a96-a63c-f5de7d890725

📥 Commits

Reviewing files that changed from the base of the PR and between 7c21b1b and 5d84aa3.

📒 Files selected for processing (1)
  • crates/tempyr-core/src/ops.rs

📝 Walkthrough

Walkthrough

resolve_node_id/find_node_file now try exact {id}.md matches across deep subtrees and, if missing, scan markdown files and resolve by parsed frontmatter id. repair_reverse_edges accumulates all repairs in-memory keyed by target node, then sorts and atomically writes each modified node file with a single updated timestamp. Tests add regressions for these behaviors.

Changes

Node graph file resolution and reverse-edge repair robustness

Layer / File(s) Summary
resolve_node_id two-phase lookup
crates/tempyr-core/src/ops.rs
resolve_node_id first tries {query}.md filename matches (depth ≥ 2). If not found, it scans *.md files, parses frontmatter, and returns the node whose id equals the query; suffix-only fallback now verifies parsed ids.
find_node_file filename fast-path + frontmatter fallback
crates/tempyr-core/src/ops.rs
find_node_file fast-path matches <node_id>.md across any depth ≥ 2; on fast-path miss it scans markdown files, parses frontmatter, and returns the path whose parsed id equals node_id.
Two-phase reverse-edge repair implementation
crates/tempyr-core/src/ops.rs
repair_reverse_edges accumulates pending target-node mutations in-memory keyed by target_id (using loaded Node objects with their file_path), merges multiple additions, then sorts edges, sets a single updated timestamp, and performs atomic_write per affected node.
Regression tests for resolution and repair
crates/tempyr-core/src/ops.rs
Tests added for frontmatter-based resolution when filenames differ, for find_node_file resolving by parsed id, for repair_reverse_edges writing to the actual misaligned on-disk file (no phantom files), and for skipping dangling forward-edge targets missing from the loaded graph.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant repair_reverse_edges
  participant loaded_graph
  participant pending_map
  participant atomic_write
  Caller->>repair_reverse_edges: call with edges to repair
  repair_reverse_edges->>loaded_graph: resolve target nodes by id
  loaded_graph-->>repair_reverse_edges: Node instances (with file_path)
  repair_reverse_edges->>pending_map: accumulate/merge reverse-edge additions
  repair_reverse_edges->>repair_reverse_edges: sort edges, set single updated timestamp
  repair_reverse_edges->>atomic_write: atomic_write(Node.file_path, serialized Node)
  atomic_write-->>repair_reverse_edges: write complete
  repair_reverse_edges-->>Caller: return
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I hop through nodes and parse their hearts,
Filename first, then frontmatter parts,
I gather broken backlinks in a heap,
Commit them once — safe dreams and sleep,
No phantom files — the graph sings sweet. 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: fixing node resolution to use frontmatter id instead of filename, which is the core problem and solution described in the PR objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/silly-davinci-55e5f2

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/tempyr-core/src/ops.rs (1)

547-589: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate the frontmatter-ID fallback into resolve_node_id as well.

This lookup is only half-fixed end to end. find_node_file now handles filename != id, but callers such as crates/tempyr-mcp/src/handler.rs:1339-1347 still call resolve_node_id first, and Lines 102-146 here still resolve exact IDs by filename/suffix only. For a node like features/README.md with id: feat-overview, graph_get_node("feat-overview") will still fail before this slow path runs. Please teach resolve_node_id the same frontmatter-ID fallback, or bypass it for exact-ID lookups.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/tempyr-core/src/ops.rs` around lines 547 - 589, resolve_node_id
currently only checks filename==id and misses files whose frontmatter id
differs; update resolve_node_id to reuse the two-phase lookup implemented in
find_node_file (or call find_node_file directly) so it first searches for
<id>.md in WalkDir::min_depth(2) and, if not found, scans .md files reading
content and calling parse_node(...) to match node.id() against the requested id;
this keeps behavior consistent with find_node_file and fixes lookups like
features/README.md with id: feat-overview.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/tempyr-core/src/ops.rs`:
- Around line 547-589: resolve_node_id currently only checks filename==id and
misses files whose frontmatter id differs; update resolve_node_id to reuse the
two-phase lookup implemented in find_node_file (or call find_node_file directly)
so it first searches for <id>.md in WalkDir::min_depth(2) and, if not found,
scans .md files reading content and calling parse_node(...) to match node.id()
against the requested id; this keeps behavior consistent with find_node_file and
fixes lookups like features/README.md with id: feat-overview.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52488d16-575f-4253-9080-74abfbe1a9de

📥 Commits

Reviewing files that changed from the base of the PR and between e8c4d14 and a0baccd.

📒 Files selected for processing (1)
  • crates/tempyr-core/src/ops.rs

Same shape of bug as find_node_file had: resolve_node_id only matched
filename == `<query>.md` (and capped at depth 2), so MCP/CLI lookups
against a misaligned file (`<dir>/README.md` with `id: feat-overview`)
failed even though Graph could load the node.

Add a slow-path frontmatter scan after the filename miss and drop the
depth cap, mirroring find_node_file. The suffix-only path is left as-is
since it's a separate concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/tempyr-core/src/ops.rs (1)

144-146: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Inconsistent depth constraint in suffix-only match.

The fast/slow paths above now walk without max_depth, matching Graph::load_from_directory and find_node_file. However, the suffix-only match still has max_depth(2), meaning suffix resolution won't find files at depth > 2 while exact-id and frontmatter lookups will.

If nested directories are intentionally excluded for suffix matching, consider documenting this in the doc comment. Otherwise, remove max_depth(2) for consistency.

♻️ Suggested diff if suffix lookup should also be unbounded
     if id::is_valid_suffix(query) {
         let suffix_pattern = format!("-{query}.md");
         let mut matches = Vec::new();

         for entry in WalkDir::new(graph_dir)
             .min_depth(2)
-            .max_depth(2)
             .into_iter()
             .filter_map(|e| e.ok())
         {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/tempyr-core/src/ops.rs` around lines 144 - 146, The suffix-only lookup
loop uses WalkDir::new(graph_dir).min_depth(2).max_depth(2) which makes suffix
resolution inconsistent with the other paths (Graph::load_from_directory and
find_node_file) that walk unbounded; remove the .max_depth(2) call from the
WalkDir iterator in the suffix-only match so it walks the same depth range as
the other code paths (or if the intent was to restrict suffix matches, add a doc
comment on the suffix-only branch making that restriction explicit).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/tempyr-core/src/ops.rs`:
- Around line 144-146: The suffix-only lookup loop uses
WalkDir::new(graph_dir).min_depth(2).max_depth(2) which makes suffix resolution
inconsistent with the other paths (Graph::load_from_directory and
find_node_file) that walk unbounded; remove the .max_depth(2) call from the
WalkDir iterator in the suffix-only match so it walks the same depth range as
the other code paths (or if the intent was to restrict suffix matches, add a doc
comment on the suffix-only branch making that restriction explicit).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0d24e98b-6c65-4cc2-b214-1fdde0bcf8ea

📥 Commits

Reviewing files that changed from the base of the PR and between a0baccd and 8832ade.

📒 Files selected for processing (1)
  • crates/tempyr-core/src/ops.rs

The suffix-only branch of resolve_node_id still had .max_depth(2),
inconsistent with Graph::load_from_directory, find_node_file, and the
two phases above it — a node at e.g. `tasks/2026-q2/<id>.md` (depth 3)
would load into the graph but be unfindable by suffix.

Drop the depth cap and filter for `.md` extension explicitly. Adds a
regression test that resolves a suffix against a node nested below the
type directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/tempyr-core/src/ops.rs (1)

143-158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't synthesize node IDs from suffix-matched filenames.

Line 157 rebuilds the resolved id from the basename. Once filename/frontmatter mismatches are valid, abc123 can resolve to the filename stem instead of the real node id. For example, tasks/overview-abc123.md with id: ship-it-abc123 would return overview-abc123, and downstream add-edge/status flows would then operate on a non-existent id. Parse the matched file and return node.id() after confirming the parsed id still carries that suffix. The exact-filename fast path above has the same canonicalization problem.

Proposed fix
         for entry in WalkDir::new(graph_dir)
             .min_depth(2)
             .into_iter()
             .filter_map(|e| e.ok())
         {
             let path = entry.path();
             if path.extension().is_none_or(|ext| ext != "md") {
                 continue;
             }
             let name = entry.file_name().to_string_lossy();
             if name.ends_with(&suffix_pattern) {
-                let stem = name.strip_suffix(".md").unwrap();
-                matches.push(stem.to_string());
+                let Ok(content) = std::fs::read_to_string(path) else {
+                    continue;
+                };
+                let Ok(node) = parse_node(&content, path.to_path_buf()) else {
+                    continue;
+                };
+                if id::parse_node_id(node.id())
+                    .is_some_and(|parsed| parsed.suffix == query)
+                {
+                    matches.push(node.id().to_string());
+                }
             }
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/tempyr-core/src/ops.rs` around lines 143 - 158, The current loop
builds candidate IDs by taking the filename stem (via name.strip_suffix(".md"))
which can diverge from the real node id in frontmatter; instead, open and parse
the matched file and use the parsed node.id() (after verifying it ends_with the
expected suffix_pattern) when pushing into matches rather than the stem; update
the same canonicalization logic in the earlier exact-filename fast path so both
branches canonicalize by parsing the file and using node.id() not the filename
stem; reference the WalkDir loop, the name.ends_with(&suffix_pattern) check, and
replace pushing stem.to_string() with the parsed node.id() after validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/tempyr-core/src/ops.rs`:
- Around line 143-158: The current loop builds candidate IDs by taking the
filename stem (via name.strip_suffix(".md")) which can diverge from the real
node id in frontmatter; instead, open and parse the matched file and use the
parsed node.id() (after verifying it ends_with the expected suffix_pattern) when
pushing into matches rather than the stem; update the same canonicalization
logic in the earlier exact-filename fast path so both branches canonicalize by
parsing the file and using node.id() not the filename stem; reference the
WalkDir loop, the name.ends_with(&suffix_pattern) check, and replace pushing
stem.to_string() with the parsed node.id() after validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52401ea3-9850-484d-b11c-6715989c21d3

📥 Commits

Reviewing files that changed from the base of the PR and between 8832ade and 7c21b1b.

📒 Files selected for processing (1)
  • crates/tempyr-core/src/ops.rs

Both fast-path and suffix-only branches were trusting the on-disk
filename. If a file's name has drifted from its frontmatter `id:` (the
authoritative source), callers would get back a stem that no node
actually answers to — exactly the misaligned-file shape this PR set
out to handle.

- Fast path: after a `{query}.md` filename hit, parse the file and
  return `node.id()` rather than `query`.
- Suffix path: use the filename pattern as a cheap prefilter, then
  parse and push `node.id()` only when it also ends with `-{suffix}`.

Two regression tests cover the divergence cases (`wrong-name.md` with
`id: real-name`, and `foo-abc123.md` with `id: bar-abc123`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant