fix(validate): resolve nodes by frontmatter id, not filename by cleak · Pull Request #47 · cleak/tempyr

cleak · 2026-05-26T16:34:50Z

Summary

validate --fix crashed mid-run when a node's filename didn't match its id: (e.g. a hub at <dir>/README.md with id: hub-<dir>-readme). The crash also left partial reverse-edge writes in unrelated files, which then needed manual de-duping.
Root cause: find_node_file (and the parallel resolve_node_id used by MCP/CLI) reconstructed lookups from the filename and bailed via ? after earlier writes had already landed. Fix: fall back to scanning frontmatter on filename miss, and have repair_reverse_edges use each target's file_path from the loaded graph and stage all edits in memory before any write.

What changed

find_node_file (crates/tempyr-core/src/ops.rs): added a slow-path fallback that parses frontmatter when the filename match misses; dropped max_depth(2) so it matches Graph::load_from_directory.
resolve_node_id: same two-phase lookup — falls back to frontmatter id when the filename match misses. Otherwise MCP/CLI node lookups (graph_update_node, graph_get_node, add-edge, status, …) silently failed against the same misaligned files.
repair_reverse_edges: now uses graph.get_node(target_id).file_path directly. Accumulates all reverse-edge additions into an in-memory HashMap<String, Node> first, then writes every file in a single commit pass. Errors during collection abort before any filesystem write.
Four regression tests in ops::tests.

Test plan

cargo test --workspace --lib — 490 passed, 0 failed
cargo clippy -p tempyr-core --lib --tests -- -D warnings — clean
cargo fmt --check — clean
End-to-end smoke with the bug-report schema (type: hub, directory = "hubs", file at notes/README.md with id: hub-notes-readme):
- reverse edge lands in notes/README.md
- no phantom hubs/hub-notes-readme.md is created

Bug-report items left for follow-ups

The original report also flagged three smaller quirks that aren't in scope here:

graph_dir = "." in config.toml is ignored
updated: rejects bare YAML date scalars with a misleading "premature end of input"
no --version flag

Happy to tackle these as separate PRs.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved node lookup: filename-first search with a fallback that reads frontmatter to match IDs; still supports suffix-based matches for deeply nested nodes.
- More robust reverse-edge repairs: collect and apply all updates safely in-memory, preserve original file locations, avoid creating phantom paths or partial writes, and apply a consistent timestamp.
Tests
- Added regressions for mismatched filename/ID resolution, deep-node suffix resolution, and safe edge-repair behavior.

`validate --fix` crashed mid-run when a node's on-disk filename didn't match its `id:` (e.g. a hub at `<dir>/README.md` with `id: hub-<dir>-readme`), because `find_node_file` reconstructed the path as `<type-dir>/<id>.md` and bailed via `?` after earlier reverse-edge writes had already landed — leaving partial writes in unrelated files. - `find_node_file` now falls back to scanning frontmatter when the filename match misses, and walks at any depth >= 2 to match `Graph::load_from_directory`. - `repair_reverse_edges` uses each target's `file_path` from the loaded graph (skipping `find_node_file` entirely) and accumulates all edits in memory before any write. Errors during collection abort before touching the filesystem. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-26T16:35:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 70d81abe-1735-4a96-a63c-f5de7d890725

📥 Commits

Reviewing files that changed from the base of the PR and between 7c21b1b and 5d84aa3.

📒 Files selected for processing (1)

crates/tempyr-core/src/ops.rs

📝 Walkthrough

Walkthrough

resolve_node_id/find_node_file now try exact {id}.md matches across deep subtrees and, if missing, scan markdown files and resolve by parsed frontmatter id. repair_reverse_edges accumulates all repairs in-memory keyed by target node, then sorts and atomically writes each modified node file with a single updated timestamp. Tests add regressions for these behaviors.

Changes

Node graph file resolution and reverse-edge repair robustness

Layer / File(s)	Summary
resolve_node_id two-phase lookup `crates/tempyr-core/src/ops.rs`	`resolve_node_id` first tries `{query}.md` filename matches (depth ≥ 2). If not found, it scans `*.md` files, parses frontmatter, and returns the node whose `id` equals the query; suffix-only fallback now verifies parsed ids.
find_node_file filename fast-path + frontmatter fallback `crates/tempyr-core/src/ops.rs`	`find_node_file` fast-path matches `<node_id>.md` across any depth ≥ 2; on fast-path miss it scans markdown files, parses frontmatter, and returns the path whose parsed `id` equals `node_id`.
Two-phase reverse-edge repair implementation `crates/tempyr-core/src/ops.rs`	`repair_reverse_edges` accumulates pending target-node mutations in-memory keyed by `target_id` (using loaded `Node` objects with their `file_path`), merges multiple additions, then sorts edges, sets a single `updated` timestamp, and performs `atomic_write` per affected node.
Regression tests for resolution and repair `crates/tempyr-core/src/ops.rs`	Tests added for frontmatter-based resolution when filenames differ, for find_node_file resolving by parsed `id`, for repair_reverse_edges writing to the actual misaligned on-disk file (no phantom files), and for skipping dangling forward-edge targets missing from the loaded graph.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant repair_reverse_edges
  participant loaded_graph
  participant pending_map
  participant atomic_write
  Caller->>repair_reverse_edges: call with edges to repair
  repair_reverse_edges->>loaded_graph: resolve target nodes by id
  loaded_graph-->>repair_reverse_edges: Node instances (with file_path)
  repair_reverse_edges->>pending_map: accumulate/merge reverse-edge additions
  repair_reverse_edges->>repair_reverse_edges: sort edges, set single updated timestamp
  repair_reverse_edges->>atomic_write: atomic_write(Node.file_path, serialized Node)
  atomic_write-->>repair_reverse_edges: write complete
  repair_reverse_edges-->>Caller: return

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I hop through nodes and parse their hearts,
Filename first, then frontmatter parts,
I gather broken backlinks in a heap,
Commit them once — safe dreams and sleep,
No phantom files — the graph sings sweet. 🐇✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: fixing node resolution to use frontmatter id instead of filename, which is the core problem and solution described in the PR objectives.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/silly-davinci-55e5f2

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/tempyr-core/src/ops.rs (1)
547-589: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate the frontmatter-ID fallback into resolve_node_id as well.

This lookup is only half-fixed end to end. find_node_file now handles filename != id, but callers such as crates/tempyr-mcp/src/handler.rs:1339-1347 still call resolve_node_id first, and Lines 102-146 here still resolve exact IDs by filename/suffix only. For a node like features/README.md with id: feat-overview, graph_get_node("feat-overview") will still fail before this slow path runs. Please teach resolve_node_id the same frontmatter-ID fallback, or bypass it for exact-ID lookups.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/tempyr-core/src/ops.rs` around lines 547 - 589, resolve_node_id
currently only checks filename==id and misses files whose frontmatter id
differs; update resolve_node_id to reuse the two-phase lookup implemented in
find_node_file (or call find_node_file directly) so it first searches for
<id>.md in WalkDir::min_depth(2) and, if not found, scans .md files reading
content and calling parse_node(...) to match node.id() against the requested id;
this keeps behavior consistent with find_node_file and fixes lookups like
features/README.md with id: feat-overview.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/tempyr-core/src/ops.rs`:
- Around line 547-589: resolve_node_id currently only checks filename==id and
misses files whose frontmatter id differs; update resolve_node_id to reuse the
two-phase lookup implemented in find_node_file (or call find_node_file directly)
so it first searches for <id>.md in WalkDir::min_depth(2) and, if not found,
scans .md files reading content and calling parse_node(...) to match node.id()
against the requested id; this keeps behavior consistent with find_node_file and
fixes lookups like features/README.md with id: feat-overview.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52488d16-575f-4253-9080-74abfbe1a9de

📥 Commits

Reviewing files that changed from the base of the PR and between e8c4d14 and a0baccd.

📒 Files selected for processing (1)

crates/tempyr-core/src/ops.rs

Same shape of bug as find_node_file had: resolve_node_id only matched filename == `<query>.md` (and capped at depth 2), so MCP/CLI lookups against a misaligned file (`<dir>/README.md` with `id: feat-overview`) failed even though Graph could load the node. Add a slow-path frontmatter scan after the filename miss and drop the depth cap, mirroring find_node_file. The suffix-only path is left as-is since it's a separate concern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/tempyr-core/src/ops.rs (1)
144-146: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Inconsistent depth constraint in suffix-only match.

The fast/slow paths above now walk without max_depth, matching Graph::load_from_directory and find_node_file. However, the suffix-only match still has max_depth(2), meaning suffix resolution won't find files at depth > 2 while exact-id and frontmatter lookups will.

If nested directories are intentionally excluded for suffix matching, consider documenting this in the doc comment. Otherwise, remove max_depth(2) for consistency.
♻️ Suggested diff if suffix lookup should also be unbounded
     if id::is_valid_suffix(query) {
         let suffix_pattern = format!("-{query}.md");
         let mut matches = Vec::new();

         for entry in WalkDir::new(graph_dir)
             .min_depth(2)
-            .max_depth(2)
             .into_iter()
             .filter_map(|e| e.ok())
         {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/tempyr-core/src/ops.rs` around lines 144 - 146, The suffix-only lookup
loop uses WalkDir::new(graph_dir).min_depth(2).max_depth(2) which makes suffix
resolution inconsistent with the other paths (Graph::load_from_directory and
find_node_file) that walk unbounded; remove the .max_depth(2) call from the
WalkDir iterator in the suffix-only match so it walks the same depth range as
the other code paths (or if the intent was to restrict suffix matches, add a doc
comment on the suffix-only branch making that restriction explicit).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/tempyr-core/src/ops.rs`:
- Around line 144-146: The suffix-only lookup loop uses
WalkDir::new(graph_dir).min_depth(2).max_depth(2) which makes suffix resolution
inconsistent with the other paths (Graph::load_from_directory and
find_node_file) that walk unbounded; remove the .max_depth(2) call from the
WalkDir iterator in the suffix-only match so it walks the same depth range as
the other code paths (or if the intent was to restrict suffix matches, add a doc
comment on the suffix-only branch making that restriction explicit).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0d24e98b-6c65-4cc2-b214-1fdde0bcf8ea

📥 Commits

Reviewing files that changed from the base of the PR and between a0baccd and 8832ade.

📒 Files selected for processing (1)

crates/tempyr-core/src/ops.rs

The suffix-only branch of resolve_node_id still had .max_depth(2), inconsistent with Graph::load_from_directory, find_node_file, and the two phases above it — a node at e.g. `tasks/2026-q2/<id>.md` (depth 3) would load into the graph but be unfindable by suffix. Drop the depth cap and filter for `.md` extension explicitly. Adds a regression test that resolves a suffix against a node nested below the type directory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/tempyr-core/src/ops.rs (1)

143-158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't synthesize node IDs from suffix-matched filenames.

Line 157 rebuilds the resolved id from the basename. Once filename/frontmatter mismatches are valid, abc123 can resolve to the filename stem instead of the real node id. For example, tasks/overview-abc123.md with id: ship-it-abc123 would return overview-abc123, and downstream add-edge/status flows would then operate on a non-existent id. Parse the matched file and return node.id() after confirming the parsed id still carries that suffix. The exact-filename fast path above has the same canonicalization problem.

Proposed fix

         for entry in WalkDir::new(graph_dir)
             .min_depth(2)
             .into_iter()
             .filter_map(|e| e.ok())
         {
             let path = entry.path();
             if path.extension().is_none_or(|ext| ext != "md") {
                 continue;
             }
             let name = entry.file_name().to_string_lossy();
             if name.ends_with(&suffix_pattern) {
-                let stem = name.strip_suffix(".md").unwrap();
-                matches.push(stem.to_string());
+                let Ok(content) = std::fs::read_to_string(path) else {
+                    continue;
+                };
+                let Ok(node) = parse_node(&content, path.to_path_buf()) else {
+                    continue;
+                };
+                if id::parse_node_id(node.id())
+                    .is_some_and(|parsed| parsed.suffix == query)
+                {
+                    matches.push(node.id().to_string());
+                }
             }
         }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/tempyr-core/src/ops.rs` around lines 143 - 158, The current loop
builds candidate IDs by taking the filename stem (via name.strip_suffix(".md"))
which can diverge from the real node id in frontmatter; instead, open and parse
the matched file and use the parsed node.id() (after verifying it ends_with the
expected suffix_pattern) when pushing into matches rather than the stem; update
the same canonicalization logic in the earlier exact-filename fast path so both
branches canonicalize by parsing the file and using node.id() not the filename
stem; reference the WalkDir loop, the name.ends_with(&suffix_pattern) check, and
replace pushing stem.to_string() with the parsed node.id() after validation.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/tempyr-core/src/ops.rs`:
- Around line 143-158: The current loop builds candidate IDs by taking the
filename stem (via name.strip_suffix(".md")) which can diverge from the real
node id in frontmatter; instead, open and parse the matched file and use the
parsed node.id() (after verifying it ends_with the expected suffix_pattern) when
pushing into matches rather than the stem; update the same canonicalization
logic in the earlier exact-filename fast path so both branches canonicalize by
parsing the file and using node.id() not the filename stem; reference the
WalkDir loop, the name.ends_with(&suffix_pattern) check, and replace pushing
stem.to_string() with the parsed node.id() after validation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52401ea3-9850-484d-b11c-6715989c21d3

📥 Commits

Reviewing files that changed from the base of the PR and between 8832ade and 7c21b1b.

📒 Files selected for processing (1)

crates/tempyr-core/src/ops.rs

Both fast-path and suffix-only branches were trusting the on-disk filename. If a file's name has drifted from its frontmatter `id:` (the authoritative source), callers would get back a stem that no node actually answers to — exactly the misaligned-file shape this PR set out to handle. - Fast path: after a `{query}.md` filename hit, parse the file and return `node.id()` rather than `query`. - Suffix path: use the filename pattern as a cheap prefilter, then parse and push `node.id()` only when it also ends with `-{suffix}`. Two regression tests cover the divergence cases (`wrong-name.md` with `id: real-name`, and `foo-abc123.md` with `id: bar-abc123`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(validate): resolve nodes by frontmatter id, not filename#47

fix(validate): resolve nodes by frontmatter id, not filename#47
cleak wants to merge 4 commits into
masterfrom
claude/silly-davinci-55e5f2

cleak commented May 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cleak commented May 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Test plan

Bug-report items left for follow-ups

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cleak commented May 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading