fix: prevent silent hang in --delta sync when Local API ignores edited.since#85

Open
lavarius wants to merge 1 commit into jcfischer:main from lavarius:fix/bug-6-delta-sync-progress

@lavarius

TL;DR

supertag sync index --delta silently hangs for an hour or more on large workspaces. Root cause: the Tana Desktop Local API drops the edited.since filter on /nodes/search, so what's supposed to be an incremental sync walks the entire graph. This PR doesn't fix the upstream filter (that's on Tana). It makes the failure visible and bounded so users don't wait an hour to find out something's wrong, and it points them to the fast recovery path (supertag sync index, no --delta).

This is a safety patch. The delta-sync feature itself is only as good as the upstream filter; until that's fixed, full-refresh is the practical workflow.

Symptom

On a 739K-node workspace:

$ supertag sync index --delta
[INFO]  [tana-sync]  Workspace: main
...
(silence for 1h+)

There is no progress output and no error; the run eventually completes with most of the workspace re-merged. First-time users will assume the CLI is broken or hung.

Root cause (upstream, not this CLI)

Direct probe of the Tana Desktop Local API, bypassing this CLI entirely:

since=5 min ago    -> page1 100  page2 100  page3 100
since=1 hour ago   -> page1 100  page2 100  page3 100
since=1 day ago    -> page1 100  page2 100  page3 100
since=1 week ago   -> page1 100  page2 100  page3 100
since=1 year ago   -> page1 100  page2 100  page3 100
since=1 (epoch)    -> page1 100  page2 100  page3 100

Six since values, ranging from 5 minutes ago back to the epoch, all return identical paginated results. The filter is being silently dropped. This is a Tana Desktop bug, not a supertag-cli bug. To my knowledge, this is the first surfaced report (no open or closed issues on this repo mention edited.since, /nodes/search, or delta-sync pagination behavior).
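The comparison behind that conclusion can be automated once the probe results are collected. The following is a minimal sketch, not the actual test script: `ProbeResult` and `filterLooksIgnored` are hypothetical names, and the page sizes are hard-coded from the probe output above rather than fetched from the Local API.

```typescript
// Hypothetical shape for one probe run: the `since` label that was sent and
// the number of nodes returned on each of the first few pages.
interface ProbeResult {
  since: string;
  pageSizes: number[];
}

// Returns true when every probe returned the same page sizes, which suggests
// the server is not applying the `since` filter at all.
function filterLooksIgnored(results: ProbeResult[]): boolean {
  if (results.length < 2) return false; // nothing to compare against
  const first = results[0].pageSizes.join(",");
  return results.every((r) => r.pageSizes.join(",") === first);
}

// Values copied from the probe output above.
const probeRuns: ProbeResult[] = [
  { since: "5 min ago", pageSizes: [100, 100, 100] },
  { since: "1 hour ago", pageSizes: [100, 100, 100] },
  { since: "1 day ago", pageSizes: [100, 100, 100] },
  { since: "1 week ago", pageSizes: [100, 100, 100] },
  { since: "1 year ago", pageSizes: [100, 100, 100] },
  { since: "1 (epoch)", pageSizes: [100, 100, 100] },
];

console.log(filterLooksIgnored(probeRuns)); // true
```

Identical page sizes are only a heuristic; comparing the returned node IDs across windows would be stronger evidence.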

The wider context: full sync is much faster anyway

While debugging this, I measured all three sync transports end-to-end on the same 739K-node workspace:

| Path | Mechanism | Wall time |
| --- | --- | --- |
| supertag-export run main | Downloads Tana's server-cached snapshot (173 MB) | 9.1s |
| supertag sync index | Reads local snapshot JSON, bulk-inserts to SQLite | 4.3s |
| supertag sync index --delta (working) | Paginated HTTP to Local API, 100 nodes/page | minutes-to-hours, scales with changeset |
| supertag sync index --delta (broken, today) | Same path but server returns whole workspace | hangs indefinitely |

So a full refresh is ~13 seconds total on a 739K-node graph. Even if the upstream filter is fixed tomorrow, --delta only wins when the actual changeset is small enough that page-by-page HTTP roundtrips beat a 9-second cached-snapshot download. That's a narrower regime than the current docs imply.
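To make that break-even point concrete, here is a rough sketch of the arithmetic. The full-refresh time comes from the table above; the per-page round-trip cost was not measured in this PR, so `secondsPerPage` is an assumed input and the function names are hypothetical.

```typescript
// Measured in the table above: snapshot download plus local index.
const FULL_REFRESH_SECONDS = 9.1 + 4.3;
const NODES_PER_PAGE = 100;

// Estimated wall time for a working delta sync of `changedNodes` nodes.
// `secondsPerPage` is an assumed round-trip cost, not a measured value.
function estimateDeltaSeconds(changedNodes: number, secondsPerPage: number): number {
  return Math.ceil(changedNodes / NODES_PER_PAGE) * secondsPerPage;
}

// True when the estimated delta sync finishes before a full refresh would.
function deltaBeatsFullRefresh(changedNodes: number, secondsPerPage: number): boolean {
  return estimateDeltaSeconds(changedNodes, secondsPerPage) < FULL_REFRESH_SECONDS;
}

// With a hypothetical 0.5 s per page, delta wins at 2,600 changed nodes
// (26 pages, 13 s) but loses at 2,700 (27 pages, 13.5 s).
console.log(deltaBeatsFullRefresh(2600, 0.5)); // true
console.log(deltaBeatsFullRefresh(2700, 0.5)); // false
```

Under that assumed latency the crossover sits at a few thousand changed nodes, which is well under 1% of the 739K-node workspace.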

This PR does not advocate removing --delta; it has legitimate uses, such as long-running watch loops and environments where Playwright is unavailable. But the recovery hint in the error message points users at the path they probably wanted anyway.

Changes

src/services/delta-sync.ts

  • Per-page progress logging. Logs on page 1, then every 10th page. Turns the silent hang into observable progress.
  • Auto-scaled abort cap. Cap = ceil(sync_metadata.total_nodes * 0.25 / 100) pages. The broken-API failure mode returns ~100% of the workspace, so any sub-100% threshold catches it; 25% leaves headroom for legitimately large deltas while bounding wasted work. Falls back to 1000 pages if total_nodes is missing.
  • Preserves last_sync_timestamp on abort. When the cap fires, sync_metadata.last_sync_timestamp is not advanced. The next attempt replays from the same point and re-merges any rows that were already committed, idempotently. Error message tells the user to run supertag sync index (no --delta) for full resync.
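The three behaviours above fit together roughly as follows. This is a simplified sketch rather than the code in delta-sync.ts: it is synchronous for brevity (the real path makes HTTP calls), and `runBoundedDelta`, `PageFetcher`, `ProgressLogger`, `SyncState`, and the error wording are all illustrative assumptions.

```typescript
interface ProgressLogger { info(message: string): void }
interface SyncState { lastSyncTimestamp: number }

// Stand-in for one Local API page request; returns node IDs, empty when done.
type PageFetcher = (page: number) => string[];

function runBoundedDelta(
  fetchPage: PageFetcher,
  state: SyncState,
  maxPages: number,
  logger: ProgressLogger,
  now: number,
): number {
  let merged = 0;
  for (let page = 1; ; page++) {
    if (page > maxPages) {
      // Abort before touching state.lastSyncTimestamp so a retry replays
      // from the same point. The message text is illustrative only.
      throw new Error(
        `Delta sync exceeded ${maxPages} pages; run "supertag sync index" (no --delta) for a full resync`,
      );
    }
    const nodes = fetchPage(page);
    if (nodes.length === 0) break; // no more results
    merged += nodes.length;
    // Heartbeat: first page, then every 10th page.
    if (page === 1 || page % 10 === 0) {
      logger.info(`delta-sync page ${page}: ${merged} nodes merged so far`);
    }
  }
  state.lastSyncTimestamp = now; // only advanced on a clean finish
  return merged;
}

// A three-page run completes and advances the timestamp.
const demoState: SyncState = { lastSyncTimestamp: 0 };
const demoMerged = runBoundedDelta(
  (page) => (page <= 3 ? [`node-${page}`] : []),
  demoState,
  60,
  { info: (m) => console.log(m) },
  1234,
);
console.log(demoMerged, demoState.lastSyncTimestamp); // 3 1234
```

Whether the cap is checked before or after fetching the page at the boundary is an assumption here; the sketch aborts once more than `maxPages` pages would be requested.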

src/types/local-api.ts

  • Adds optional maxPages to DeltaSyncOptions so callers can override the auto-scaled cap.
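The precedence between the override, the auto-scaled value, and the fallback can be sketched as below. `MAX_PAGES_RATIO` is the constant named in this PR; `PAGE_SIZE`, `FALLBACK_MAX_PAGES`, and `resolveMaxPages` are hypothetical names, and the interface shows only the one field discussed here.

```typescript
const MAX_PAGES_RATIO = 0.25;    // share of the workspace allowed before abort
const PAGE_SIZE = 100;           // nodes per Local API page
const FALLBACK_MAX_PAGES = 1000; // used when total_nodes is unknown

// Only the field added by this PR; the real type has other members.
interface DeltaSyncOptions {
  maxPages?: number;
}

function resolveMaxPages(options: DeltaSyncOptions, totalNodes?: number | null): number {
  // 1. An explicit caller override always wins.
  if (options.maxPages !== undefined) return options.maxPages;
  // 2. Without a known workspace size, use the fixed fallback.
  if (totalNodes === undefined || totalNodes === null) return FALLBACK_MAX_PAGES;
  // 3. Otherwise scale with the workspace; no minimum floor is applied.
  return Math.ceil((totalNodes * MAX_PAGES_RATIO) / PAGE_SIZE);
}

console.log(resolveMaxPages({}, 739455));          // 1849
console.log(resolveMaxPages({}, 24000));           // 60
console.log(resolveMaxPages({}, 4000));            // 10
console.log(resolveMaxPages({}));                  // 1000
console.log(resolveMaxPages({ maxPages: 5 }, 24000)); // 5
```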

src/commands/sync.ts

  • Wires the existing logger through to DeltaSyncService in both the standalone runDeltaSync path and the watch path. Without this, the new progress lines defaulted to a no-op logger when invoked from the CLI and never surfaced. Drive-by but necessary for the fix to be visible to users.
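The failure mode being fixed here is a common one with optional loggers, illustrated below. This is a generic sketch of the pattern, not the DeltaSyncService constructor itself; `DeltaSyncSketch`, `SyncLogger`, and `noopLogger` are hypothetical names.

```typescript
interface SyncLogger { info(message: string): void }

// Default used when the caller passes no logger: every line is dropped.
const noopLogger: SyncLogger = { info: () => {} };

class DeltaSyncSketch {
  private readonly logger: SyncLogger;

  constructor(options: { logger?: SyncLogger } = {}) {
    this.logger = options.logger ?? noopLogger;
  }

  reportPage(page: number): void {
    this.logger.info(`delta-sync page ${page}`);
  }
}

const capturedLines: string[] = [];
const capturingLogger: SyncLogger = { info: (m) => { capturedLines.push(m); } };

// Without a logger the progress line is silently swallowed.
new DeltaSyncSketch().reportPage(1);
// With the logger wired through, the same call becomes visible.
new DeltaSyncSketch({ logger: capturingLogger }).reportPage(1);

console.log(capturedLines); // [ 'delta-sync page 1' ]
```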

tests/unit/delta-sync-pagination.test.ts

7 new tests:

  1. Progress line fires on page 1.
  2. Heartbeat fires at pages 1, 10, 20 over a 25-page run.
  3. Auto-scaled cap derives from total_nodes (24K nodes gives a 60-page cap).
  4. Auto-scaled cap scales down proportionally (4K nodes gives a 10-page cap, no minimum floor).
  5. Explicit maxPages override wins over the auto-scaled value.
  6. Abort throws with the documented error message plus supertag sync index hint.
  7. After abort, last_sync_timestamp is unchanged; merged rows are present (idempotent replay).

Fixture seed bumped: total_nodes raised from 1000 to 1000000 so the new auto-scaled cap doesn't trip unrelated pagination tests.

All 19 tests in delta-sync-pagination.test.ts pass. No other test files modified.

Verification

Live-tested against the 739K-node workspace that originally exhibited the hang:

  • Per-page progress lines surfaced as expected.
  • Auto-scaled cap fired at 1849 pages (ceil(739455 * 0.25 / 100)), aborted with the documented error.
  • Total runtime in the minutes range instead of 1h+.

Recommended follow-ups (out of scope for this PR)

  1. Confirm and file the upstream Tana bug. Probe evidence above is reproducible; happy to share the test script.
  2. Consider updating --delta docs to mention the full-sync alternative and its current speed advantage. (Not making doc changes in this PR to keep the scope tight.)
  3. Eventually: if Tana fixes the filter, the cap becomes a dormant safety net. If they don't, it might be worth deprecating --delta in favor of cached-snapshot refresh as the default sync path.

Notes for reviewers

  • The MAX_PAGES_RATIO = 0.25 constant is the main tunable. I considered 0.5 (more headroom) but settled on 0.25 because the broken-API failure mode is so distinctive (whole-workspace return) that false-positive aborts are unlikely at 25%, and lower thresholds reduce wasted work.
  • No floor on the auto-scaled cap on purpose: very small workspaces should also be protected proportionally. Test 4 covers this.
  • Holding last_sync_timestamp steady on abort is the subtle correctness property. It's what makes the abort safe to retry rather than corrupt-on-retry. Test 7 pins it.
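The retry-safety argument rests on the merge being an upsert keyed by node ID, so replaying already-committed rows changes nothing. A minimal illustration, with an in-memory `Map` standing in for the SQLite table and `NodeRow` and `mergeRows` as hypothetical names:

```typescript
interface NodeRow {
  id: string;
  name: string;
}

// Upsert by ID: inserting a row that already exists overwrites it in place.
function mergeRows(store: Map<string, NodeRow>, rows: NodeRow[]): void {
  for (const row of rows) {
    store.set(row.id, row);
  }
}

const nodeStore = new Map<string, NodeRow>();
const replayedPage: NodeRow[] = [
  { id: "a", name: "Alpha" },
  { id: "b", name: "Beta" },
];

mergeRows(nodeStore, replayedPage); // first attempt, aborted later by the cap
mergeRows(nodeStore, replayedPage); // retry replays the same page

console.log(nodeStore.size); // 2
```

If the timestamp were advanced on abort instead, rows on pages that were never fetched would fall outside the next delta window and be skipped.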

…es edited.since

The Tana Desktop Local API silently ignores the `edited.since` filter on
`/nodes/search` (verified via direct probe: six different `since` values
from 5 minutes through epoch all return identical paginated results).
On a 739K-node workspace this turns delta-sync into a 1h+ hang as it
pages through the entire graph as if everything changed.

This is an upstream Tana Desktop bug — `supertag-cli` sends the filter
correctly; the server discards it. Until that's fixed, mitigate from the
client:

- Per-page progress logging (page 1, then every 10th page) so the hang
  stops being silent.
- Auto-scaled abort cap: 25% of `sync_metadata.total_nodes`, with a
  1000-page fallback if total_nodes is missing. The broken-API failure
  mode returns ~100% of the workspace, so any sub-100% threshold trips
  it; 25% leaves plenty of headroom for legitimately large deltas while
  capping wasted work. Override via `DeltaSyncOptions.maxPages`.
- On abort, `sync_metadata.last_sync_timestamp` is NOT advanced — so the
  next delta-sync replays from the same point and re-merges the same
  rows idempotently. Error message points users to `supertag sync index`
  for a full resync.
- Wire `logger` through `commands/sync.ts` to `DeltaSyncService` so the
  new progress lines actually surface in the CLI (previously the service
  defaulted to a no-op logger when invoked from the command layer).

Tests: 7 new pagination tests covering progress logging, auto-scaling
from total_nodes, explicit override, and that last_sync_timestamp is
preserved on abort. Test fixture seed bumped from total_nodes=1000 to
1000000 so the new auto-scaled cap doesn't trip unrelated pagination
tests. All 19 tests in delta-sync-pagination green.

Verified live on the 739K-node workspace: cap fired at 1849 pages
(25% of total / 100), aborted cleanly with the documented error.