sync-gbrain memory stage rm -rf'd my repo root: resume path trusts checkpoint.dir without verifying it's a gstack staging dir #1802

@andrefogelman

Summary

/sync-gbrain's memory-ingest stage recursively deleted my entire repo working directory (the equivalent of rm -rf on the repo root, including all source). The resume path trusted a stale ~/.gbrain/import-checkpoint.json whose dir field pointed at the repo root (cwd) rather than a gstack-owned staging dir, then ran the staging-dir cleanup, rmSync(dir, { recursive: true, force: true }), on that path.

I restored the repo from GitHub afterward (no commits lost), but untracked files were destroyed.

  • gstack version: 1.52.2.0
  • OS: macOS (darwin)
  • gbrain: 0.41.5.0, engine postgres (Supabase), MCP local-stdio, pooler transaction-mode

Root cause

bin/gstack-memory-ingest.ts stages prepared pages to a throwaway temp dir and, in a finally block, cleans it up:

// :1712
} finally {
  cleanupStagingDir(stagingDir);   // :1713  → rmSync(dir, { recursive: true, force: true })  (:1263)
  _activeStagingDir = null;
}

The resume feature (#1611) reuses a prior staging dir when a previous run was SIGTERM'd, reading the dir from gbrain's checkpoint:

// gstack-memory-ingest.ts:1493-1502
const resumeDir = process.env.GSTACK_INGEST_RESUME_DIR;
const resuming = !remoteHttpMode && typeof resumeDir === "string"
  && resumeDir.length > 0 && existsSync(resumeDir);
const stagingDir = resuming ? resumeDir! : ... makeStagingDir();

GSTACK_INGEST_RESUME_DIR is set by the orchestrator from the checkpoint (gstack-gbrain-sync.ts:892-898, decideResume() :169-190). decideResume() only checks that checkpoint.dir exists and is a directory — it never verifies the path is a gstack-owned staging dir (i.e. under ~/.gstack/.staging-ingest-*).

So when the checkpoint's dir is the repo root, the run "resumes" with stagingDir = <repo root>, skips writeStaged, runs gbrain import, then the finally block rmSync's the repo root.

How the checkpoint got poisoned

In my session the pooler was in transaction-mode and gbrain import kept failing with prepared statement "" does not exist (the GBRAIN_PREPARE=true issue). A run was interrupted (SIGTERM'd) while cwd was the repo root, and gbrain wrote import-checkpoint.json with dir = cwd = repo root. The next /sync-gbrain hit the resume path and deleted the repo root.

Log evidence from the runs:

[sync:memory] previous checkpoint stale (staging dir /Users/.../anf-nano gone), restaging from scratch
[sync:memory] resuming from gbrain checkpoint (0/0 files staged at /Users/.../anf-nano)

(The path is the repo root, not a .staging-ingest-* dir.)

Suggested fix

Guard decideResume() (and/or the resuming check in gstack-memory-ingest.ts) so a checkpoint dir is only honored when it is a gstack-owned staging path. Minimal:

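import { resolve, sep } from "path";

// Honor the checkpoint only when cp.dir is a gstack-owned staging path.
const stagingRoot = resolve(GSTACK_HOME);  // ~/.gstack
const d = resolve(cp.dir);
const transcriptsDir = resolve(stagingRoot, "transcripts"); // remote-http persistent dir
const isOwned = d.startsWith(resolve(stagingRoot, ".staging-ingest-"))
             || d === transcriptsDir
             || d.startsWith(transcriptsDir + sep); // avoid matching siblings like "transcripts-old"
// Anything else (e.g. a repo root) is treated as stale, so the run restages from scratch.
if (!isOwned) return { kind: "stale-staging-missing", stagingDir: cp.dir };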
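
A slightly fuller sketch of that ownership check, pulled out into a standalone function so it can be unit-tested. isOwnedStagingDir and its home parameter are hypothetical names, not existing gstack code, and I'm assuming throwaway staging dirs are direct children of ~/.gstack named .staging-ingest-<suffix>:

```typescript
import { homedir } from "node:os";
import { basename, dirname, join, resolve, sep } from "node:path";

// Assumption: the gstack home is ~/.gstack, as in the snippet above.
const DEFAULT_GSTACK_HOME = join(homedir(), ".gstack");
const STAGING_PREFIX = ".staging-ingest-";

/** Hypothetical helper: true only for paths that look like gstack-owned staging dirs. */
function isOwnedStagingDir(dir: string, home: string = DEFAULT_GSTACK_HOME): boolean {
  const d = resolve(dir); // normalizes "..", ".", and relative paths
  const root = resolve(home);

  // Throwaway staging dir: a direct child of the root named .staging-ingest-<suffix>.
  const name = basename(d);
  const isThrowaway =
    dirname(d) === root &&
    name.startsWith(STAGING_PREFIX) &&
    name.length > STAGING_PREFIX.length;

  // Persistent remote-http dir: <root>/transcripts itself or anything beneath it.
  const transcripts = join(root, "transcripts");
  const isTranscripts = d === transcripts || d.startsWith(transcripts + sep);

  return isThrowaway || isTranscripts;
}
```

With this shape, a checkpoint whose dir is the repo root, the gstack home itself, or a path that escapes via ".." is rejected. Note that resolve() only normalizes the string and does not follow symlinks, so a stricter version might compare fs.realpathSync() results instead.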

Defense-in-depth: cleanupStagingDir() itself should refuse to rmSync any path that isn't under the staging root, regardless of how it was derived. A deletion helper that can be pointed at an arbitrary path is the real footgun.
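
A minimal sketch of what such a guarded helper could look like. cleanupStagingDirSafe, its parameters, and the error message are hypothetical, not the current cleanupStagingDir() signature; the ~/.gstack root and the .staging-ingest- prefix are taken from the paths mentioned above. The remove callback is injectable only so the guard can be tested without touching disk:

```typescript
import { rmSync } from "node:fs";
import { homedir } from "node:os";
import { basename, dirname, join, resolve } from "node:path";

const STAGING_PREFIX = ".staging-ingest-";

/**
 * Hypothetical guarded cleanup: deletes `dir` only when it is a direct child of
 * `stagingRoot` named .staging-ingest-<suffix>. Returns true if a delete was issued.
 */
function cleanupStagingDirSafe(
  dir: string,
  stagingRoot: string = join(homedir(), ".gstack"), // assumption: ~/.gstack
  remove: (path: string) => void = (path) => rmSync(path, { recursive: true, force: true }),
): boolean {
  const d = resolve(dir);
  const name = basename(d);
  const owned =
    dirname(d) === resolve(stagingRoot) &&
    name.startsWith(STAGING_PREFIX) &&
    name.length > STAGING_PREFIX.length;

  if (!owned) {
    // Refuse rather than delete: the caller derived this path from untrusted state.
    console.error(`refusing to delete non-staging path: ${d}`);
    return false;
  }
  remove(d);
  return true;
}
```

Called from the finally block in place of the unguarded helper, a poisoned stagingDir such as the repo root would be logged and left alone instead of removed.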

Impact

A recursive, force: true delete of a user's repo (including untracked and uncommitted work) with no confirmation. High severity even though commits survive on the remote.
