fix(graph): cloud pull sha over stable fields + code-graph PRD by efenocchi · Pull Request #228 · activeloopai/hivemind

efenocchi · 2026-06-03T00:44:13Z

Summary

Two graph-scoped changes that belong together:

fix(graph): cloud pull sha verification — P1 bug. Push stored snapshot_sha256 computed over the stable fields only (excludes observation), but pull was hashing the entire payload (observation included) and comparing the two. Every real cloud row was rejected with snapshot_sha256 mismatch — team pull silently failed; only legacy rows without a populated sha column slipped through. Pull now parses the payload, validates nodes/links shape, and recomputes the hash via computeSnapshotSha256 (the same function push uses), comparing the stable-only hash against the column.
docs(graph): code-graph PRD — turns the planning notes into a PRD with problem / goals / non-goals / invariant / success metrics / dispatch instructions, plus the detailed A–H backlog (per-task acceptance, effort, files). LLM enrichment (block E) and embeddings (block F) are strictly opt-in, gated on OPENAI_API_KEY / OPENROUTER_API_KEY (or the local embeddings daemon for F). Default stays 100% AST, zero network, zero cost.

How the bug was found

Cross-read of our src/graph/ against graphify and Understand-Anything (cloned into a sibling dir), then a Codex review pass flagged the push/pull hash divergence. Confirmed by quoting deeplake-push.ts:96 (stable hash) vs deeplake-pull.ts:155 (full-payload hash, now fixed).

Tests

tests/shared/graph/deeplake-pull.test.ts: 23/23 green (+5 new regression tests).
- Row whose observation differs from build time → accepts (was the production failure).
- Tampered payload claiming the original sha → rejects.
- Non-JSON / shape-invalid payload → rejects.
- Asserts the column hash (stable) ≠ full-payload hash, so a future "fix" collapsing the two fails the test.
npm run build green (tsc + esbuild).
Full suite: pre-existing failures only in tests/cli/install-consent.test.ts / cli-index.test.ts (untouched by this diff).

Notes

A5 in the PRD is marked ✅ done — it's the fix in this PR.
Non-git pull path unchanged (pull is git-only today).

Session Context

Session transcript

Summary by CodeRabbit

Documentation
- Added comprehensive Code Graph feature roadmap documentation with architecture, goals, and implementation timeline
Bug Fixes
- Enhanced snapshot validation to detect and reject corrupted or tampered code graph snapshots
- Improved error handling for malformed snapshot data
Tests
- Added regression test suite for snapshot integrity verification

Push stores snapshot_sha256 computed over the stable fields only (excludes observation), but pull was hashing the entire payload bytes (observation included) and comparing the two. Every real cloud row was therefore rejected with 'snapshot_sha256 mismatch' — team pull silently failed and only legacy rows without a populated sha column slipped through. Pull now parses the payload, validates nodes/links shape, and recomputes the hash via computeSnapshotSha256 (the same function push uses), comparing the stable-only hash against the column. Adds a regression suite covering: a row whose observation differs accepts, a tampered payload rejects, and non-JSON / shape-invalid payloads reject.

Turns the planning notes into a PRD: problem, goals/non-goals, project invariant, success metrics, and dispatch instructions (one block = one unit of work an agent/team can pick up). Keeps the detailed A-H backlog with per-task acceptance/effort/files. LLM enrichment (block E) and embeddings (block F) are strictly opt-in, gated on OPENAI_API_KEY / OPENROUTER_API_KEY (or the local embeddings daemon for F). Default stays 100% AST, zero network, zero cost. A5 (cloud pull sha bug) is marked done — fixed in the preceding commit.

coderabbitai · 2026-06-03T00:44:23Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a91f32c8-6d9f-4e82-9a3e-eee8ea053b80

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR implements deterministic snapshot hash verification for cloud-based graph snapshots: pullSnapshot now parses and validates the cloud payload structure (including required nodes and links arrays) before computing a stable-field hash, rejecting malformed or incomplete payloads. The changes include updated test fixtures and comprehensive regression tests, alongside a complete Code Graph feature roadmap documenting the multi-block implementation strategy.

Changes

Code Graph Snapshot Verification and Roadmap

Layer / File(s)	Summary
Cloud Snapshot Hash Verification Logic `src/graph/deeplake-pull.ts`	Imports `computeSnapshotSha256` and `GraphSnapshot` type. `pullSnapshot` now parses `cloudPayload` as JSON, validates the presence of `nodes` and `links` arrays, computes the stable-field hash via `computeSnapshotSha256`, and returns error outcomes for parsing failures or missing required arrays instead of writing corrupt snapshots.
Hash Verification Test Fixtures and Regression Suite `tests/shared/graph/deeplake-pull.test.ts`	Imports `canonicalSnapshot` and `computeSnapshotSha256`. Test fixture is updated to derive `CLOUD_PAYLOAD` and `CLOUD_PAYLOAD_SHA` from a typed `GraphSnapshot` using canonical and hashing functions to enforce the stable-vs-full-payload hash contract. New regression test suite verifies stable-field hash divergence from raw payload hash, acceptance when column hash matches stable contract, rejection of tampered/malformed/incomplete payloads, and proper error handling for invalid JSON and missing arrays.
Code Graph Project Roadmap PRD `docs/GRAPH-ROADMAP.md`	Comprehensive PRD for the Code Graph initiative: project status, scope, objectives/non-objectives, invariants (AST-first; opt-in LLM/embeddings; no sync-build LLM), success metrics, and execution roadmap spanning Blocks A–H. Block A covers infrastructure, VFS interception, and Cursor parity (A5 "fix cloud pull hash" marked ✅ DONE). Block B specifies deterministic structural graph with cross-file `calls`/`imports`/`extends` and multi-language support. Blocks C–D cover VFS query surface (`query/<pattern>` one-shot expansion) and enhanced search (token OR/fuzzy ranking). Blocks E–F detail async opt-in LLM enrichment workers and optional hybrid embeddings with deterministic fallback. Blocks G–H cover team/memory integration and nice-to-have items. Includes baseline status, key files, environment variables, testing methodology, milestones (M1–M6), competitor reference table, and changelog.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

kaghni

Poem

🐰 A snapshot once flowed from the cloud unconfirmed,
But now our wee rabbit ensures payloads are learned—
We parse, we validate, hash stable and true,
With tests and a roadmap, the graph blooms anew!
Block A is complete, and the vision runs deep. 📊✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes both main changes: the cloud pull sha256 fix (P1 bug) over stable fields and the code-graph PRD documentation addition.
Description check	✅ Passed	The PR description comprehensively covers all required sections: Summary (detailed), How the bug was found, Tests (with specific results), and Notes. However, the Version Bump section from the template is not explicitly addressed.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/graph-pull-hash-mismatch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-03T00:45:58Z

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status	Category	Percentage	Covered / Total
🟢	Lines	94.89% (🎯 90%)	427 / 450
🟢	Statements	93.73% (🎯 90%)	478 / 510
🟢	Functions	90.00% (🎯 90%)	45 / 50
🔴	Branches	84.41% (🎯 90%)	341 / 404

File Coverage — 7 files changed

File	Stmts	Branches	Functions	Lines
`src/cli/install-cursor.ts`	🟢 96.7%	🔴 86.1%	🟢 100.0%	🟢 100.0%
`src/graph/deeplake-pull.ts`	🔴 81.9%	🔴 69.4%	🔴 77.8%	🔴 83.5%
`src/graph/graph-command.ts`	🟢 96.4%	🔴 87.0%	🟢 100.0%	🟢 100.0%
`src/hooks/codex/pre-tool-use.ts`	🟢 96.9%	🔴 86.4%	🔴 81.8%	🟢 98.0%
`src/hooks/cursor/pre-tool-use.ts`	🔴 89.5%	🟢 90.0%	🟢 100.0%	🔴 87.1%
`src/hooks/cursor/session-start.ts`	🟢 98.6%	🟢 91.7%	🔴 87.5%	🟢 100.0%
`src/hooks/hermes/pre-tool-use.ts`	🟢 91.9%	🟢 90.0%	🟢 100.0%	🔴 89.7%

_{Generated for commit 99741e5.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/graph/deeplake-pull.ts (1)
168-177: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Verify the payload's graph identity matches the selected row.

The SHA check only proves the payload is self-consistent; it does not prove it belongs to this head/repoKey. A mis-keyed row whose graph.commit_sha or graph.repo_key disagrees with the SELECT key will still be written to ${head}.json and recorded in .last-build.json, which poisons the local cache with a different snapshot. Add explicit equality checks against head and repoKey before the write path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/graph/deeplake-pull.ts` around lines 168 - 177, The SHA check only
ensures payload integrity but not that it belongs to the requested head/repoKey;
add explicit identity checks before any write path: after computing and
validating computedSha (using computeSnapshotSha256 and parsedSnapshot) verify
parsedSnapshot.graph.commit_sha === head and parsedSnapshot.graph.repo_key ===
repoKey (or their canonical forms) and if either mismatches return
errorOutcome("SELECT cloud row", new Error(...)) with a clear message indicating
which field mismatched (commit_sha or repo_key) and the expected vs actual
values so the mis-keyed row is rejected rather than written to `${head}.json` /
`.last-build.json`.

🧹 Nitpick comments (1)

docs/GRAPH-ROADMAP.md (1)

9-9: ⚡ Quick win

Replace machine-local /tmp reference with a durable repo/PR artifact link.

Line 9 points to /tmp/codex-graph-output.md, which is not shareable/reproducible for reviewers. Prefer a committed file path or a PR comment/link.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/GRAPH-ROADMAP.md` at line 9, In docs/GRAPH-ROADMAP.md update the table
entry that references `/tmp/codex-graph-output.md` (the line containing
"graphify e Understand-Anything in `_graph-comparison/`; review Codex in
`/tmp/codex-graph-output.md`") so it points to a durable, shareable artifact —
either commit and reference a file in the repo (e.g.,
`_graph-comparison/codex-graph-output.md`) or replace the `/tmp/...` path with a
PR comment/link or CI/artifact URL; ensure the new reference is accessible to
reviewers and update the table cell text accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/GRAPH-ROADMAP.md`:
- Around line 159-173: Add language identifiers to the two fenced code blocks
that currently open without a language (the block containing the roadmap diagram
"Blocco A ... Blocco G" and the block starting "Query: 'pushSnapshot' — 2
matches...") so markdownlint rule MD040 is satisfied; replace the opening triple
backticks with triple backticks plus a language (use "text") for both fenced
blocks around the roadmap diagram and the query/example block.

In `@src/graph/deeplake-pull.ts`:
- Around line 158-166: After JSON.parse you must guard against non-object
results (e.g., JSON null) before accessing properties; update the code around
parsedSnapshot (the variable produced in the try/catch) to first check that
typeof parsedSnapshot === "object" && parsedSnapshot !== null and return
errorOutcome("parse cloud snapshot", new Error("snapshot not an object")) if
that fails, then perform the existing Array.isArray checks on (parsedSnapshot as
{ nodes?: unknown }).nodes and .links; ensure no property access occurs until
the non-null object guard passes so pullSnapshot returns the documented error
outcome for invalid payloads.

---

Outside diff comments:
In `@src/graph/deeplake-pull.ts`:
- Around line 168-177: The SHA check only ensures payload integrity but not that
it belongs to the requested head/repoKey; add explicit identity checks before
any write path: after computing and validating computedSha (using
computeSnapshotSha256 and parsedSnapshot) verify parsedSnapshot.graph.commit_sha
=== head and parsedSnapshot.graph.repo_key === repoKey (or their canonical
forms) and if either mismatches return errorOutcome("SELECT cloud row", new
Error(...)) with a clear message indicating which field mismatched (commit_sha
or repo_key) and the expected vs actual values so the mis-keyed row is rejected
rather than written to `${head}.json` / `.last-build.json`.

---

Nitpick comments:
In `@docs/GRAPH-ROADMAP.md`:
- Line 9: In docs/GRAPH-ROADMAP.md update the table entry that references
`/tmp/codex-graph-output.md` (the line containing "graphify e
Understand-Anything in `_graph-comparison/`; review Codex in
`/tmp/codex-graph-output.md`") so it points to a durable, shareable artifact —
either commit and reference a file in the repo (e.g.,
`_graph-comparison/codex-graph-output.md`) or replace the `/tmp/...` path with a
PR comment/link or CI/artifact URL; ensure the new reference is accessible to
reviewers and update the table cell text accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5b2e3eb7-e710-4ef5-a809-74edffa3eec2

📥 Commits

Reviewing files that changed from the base of the PR and between 2bf77fb and daafa32.

📒 Files selected for processing (3)

docs/GRAPH-ROADMAP.md
src/graph/deeplake-pull.ts
tests/shared/graph/deeplake-pull.test.ts

coderabbitai · 2026-06-03T00:48:08Z

+```
+Blocco A  Infra (Cursor parity, skill, pull hash)     ← blocker operativo
+    ↓
+Blocco B  Cross-file calls TS                        ← valore core del grafo
+    ↓
+Blocco C  query/ one-shot (2-in-1) + VFS arricchito  ← UX agente
+    ↓
+Blocco D  Ricerca fuzzy su find/query                 ← search migliore
+    ↓
+Blocco E  Enrichment LLM (async, opt-in)             ← summary, tags, layers
+    ↓
+Blocco F  Embeddings + hybrid search                 ← NL query, ranking vector
+    ↓
+Blocco G  Team enriched sync + altre lingue          ← dopo core solido
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks to satisfy markdownlint.

Line 159 and Line 504 open fenced blocks without a language, which triggers MD040 and can fail docs lint pipelines.

Proposed patch

-``` +```text Blocco A Infra (Cursor parity, skill, pull hash) ← blocker operativo ↓ Blocco B Cross-file calls TS ← valore core del grafo @@ Blocco G Team enriched sync + altre lingue ← dopo core solido -``` +``` -``` +```text Query: "pushSnapshot" — 2 matches, expanded 1 hop MATCH [1] src/graph/deeplake-push.ts:pushSnapshot:function (exported, fan_out=3) @@ MATCH [2] ... -``` +```

Also applies to: 504-512

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 159-159: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/GRAPH-ROADMAP.md` around lines 159 - 173, Add language identifiers to the two fenced code blocks that currently open without a language (the block containing the roadmap diagram "Blocco A ... Blocco G" and the block starting "Query: 'pushSnapshot' — 2 matches...") so markdownlint rule MD040 is satisfied; replace the opening triple backticks with triple backticks plus a language (use "text") for both fenced blocks around the roadmap diagram and the query/example block.

coderabbitai · 2026-06-03T00:48:08Z

+  let parsedSnapshot: GraphSnapshot;
+  try {
+    parsedSnapshot = JSON.parse(cloudPayload) as GraphSnapshot;
+  } catch (err) {
+    return errorOutcome("parse cloud snapshot", err);
+  }
+  if (!Array.isArray((parsedSnapshot as { nodes?: unknown }).nodes) ||
+      !Array.isArray((parsedSnapshot as { links?: unknown }).links)) {
+    return errorOutcome("parse cloud snapshot", new Error("snapshot missing nodes/links arrays"));


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject JSON null and other non-object payloads before touching nodes/links.

JSON.parse("null") succeeds here, and the subsequent property access throws before pullSnapshot can return the documented error outcome. Please first guard that the parsed value is a non-null object, then validate the snapshot shape before hashing or writing.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/graph/deeplake-pull.ts` around lines 158 - 166, After JSON.parse you must guard against non-object results (e.g., JSON null) before accessing properties; update the code around parsedSnapshot (the variable produced in the try/catch) to first check that typeof parsedSnapshot === "object" && parsedSnapshot !== null and return errorOutcome("parse cloud snapshot", new Error("snapshot not an object")) if that fails, then perform the existing Array.isArray checks on (parsedSnapshot as { nodes?: unknown }).nodes and .links; ensure no property access occurs until the non-null object guard passes so pullSnapshot returns the documented error outcome for invalid payloads.

…t + codex) - F0: embeddings are explicitly opt-in and persisted (settings.json env toggle via update-config, or graph_embeddings flag in ~/.deeplake/config.json). Default OFF; missing backend falls back to lexical with no error. - Section 6 Definition of Done, applies to every block: tests, MANDATORY cross-agent verification (Cursor was silently broken and we didn't know), and Codex review via the gstack codex skill, capped at 2 invocations/block.

- Section 7 Progress: GitHub-renderable checklist of every block grouped by milestone; A5 ticked (done), rest open. - F1: note that the embeddings daemon already exists and today only embeds memory (summary/message), not the graph; this block reuses it.

…d wiring Closes the silent gap where the code graph only worked on Claude Code. - New shared parser src/graph/graph-command.ts (tryGraphRead + parseReadTargetPath): given a rewritten shell command, detects a cat/head/tail/ls on the /graph/* virtual subtree and synthesizes the body via handleGraphVfs. One source of truth so per-agent hooks stop drifting. - Wire it into Cursor, Codex, and Hermes pre-tool-use hooks (Claude Code already had inline handling). Each returns its agent-correct decision shape (cursor updated_input echo; codex/hermes {action:block}). - Cursor SessionStart now emits graphContextLine (was missing). - Cursor auto-build: add graph-on-stop to the cursor esbuild bundle (with tree-sitter externalized) and register it on stop + sessionEnd in install-cursor.ts buildHookConfig(). Hardened per two rounds of Codex review: only cat<path> or a single trailing |head/|tail is a graph read (a |grep falls through); reject .. path traversal; strip quotes and skip leading flags; require exactly one file operand (multi-file reads fall through). Codex hook uses input.cwd. Tests: tests/shared/graph/graph-command.test.ts (16) cover the parser edge cases and routing; full suite 3751 green.

Read-only skill teaching the model when/how to query the code graph via the memory mount (index.md / find/ / show/), with anti-patterns: Incoming (0) != dead code (intra-file calls only), the graph can be stale, find/ is lexical not semantic, and never run a build (hooks handle it). Cursor and pi have no skill loader and rely on the SessionStart inject instead.

A1-A4 ticked in the progress checklist; baseline table and a DoD 6.2 cross-agent status block record what works where (read intercept on CC/Cursor/Codex/Hermes; OpenClaw/pi gaps and Codex/Hermes auto-build deferred to G3) so the next session doesn't rediscover them by surprise.

efenocchi added 2 commits June 3, 2026 00:42

coderabbitai Bot requested a review from kaghni June 3, 2026 00:44

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

efenocchi added 5 commits June 3, 2026 00:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(graph): cloud pull sha over stable fields + code-graph PRD#228

fix(graph): cloud pull sha over stable fields + code-graph PRD#228
efenocchi wants to merge 7 commits into
mainfrom
fix/graph-pull-hash-mismatch

efenocchi commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Uh oh!

coderabbitai Bot Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

efenocchi commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How the bug was found

Tests

Notes

Session Context

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

efenocchi commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

github-actions Bot commented Jun 3, 2026 •

edited

Loading