Skip to content

fix(d1-votes): prune orphan vote rows and fail verify on surplus keys#571

Open
jony376 wants to merge 3 commits into
JSONbored:mainfrom
jony376:fix/d1-vote-sync-prune-orphans
Open

fix(d1-votes): prune orphan vote rows and fail verify on surplus keys#571
jony376 wants to merge 3 commits into
JSONbored:mainfrom
jony376:fix/d1-vote-sync-prune-orphans

Conversation

@jony376
Copy link
Copy Markdown

@jony376 jony376 commented May 28, 2026

Summary

Prune orphan D1 vote rows during sync and fail verification when surplus keys remain, using a shared enumerator for content vote keys.

Related Issues

Closes #564

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Documentation
  • Other (describe below)

Testing

  • Tests added/updated
  • Manually tested

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Changes are documented (if applicable)

Co-authored-by: Cursor <cursoragent@cursor.com>
@jony376 jony376 requested a review from JSONbored as a code owner May 28, 2026 23:24
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Improved vote synchronization with centralized enumeration and validation of vote keys.
    • Added optional pruning mode to automatically remove orphaned vote entries from the database.
    • Enhanced verification process to detect and report orphaned records and inconsistencies.
  • Tests

    • Added test suite for vote key enumeration and orphan detection functionality.

Walkthrough

This PR introduces shared vote catalog utilities and refactors sync and verify scripts to detect and prune orphaned vote entries. A new enumerate-content-vote-keys.mjs library exports key enumeration and orphan-detection functions used by both scripts. The sync script now queries D1 for existing keys and conditionally deletes orphan rows. The verify script detects orphans and reports them as test failures.

Changes

Vote Sync and Verify Orphan Pruning

Layer / File(s) Summary
Shared vote catalog utilities: key enumeration and orphan detection
scripts/lib/enumerate-content-vote-keys.mjs
enumerateContentVoteKeys() traverses content/<category>/ directories and extracts category:slug keys from MDX frontmatter (with filename fallback), returning a sorted Set. findOrphanVoteKeys() filters actual keys to identify those missing from the expected set, returning a sorted array.
Sync script: use shared utilities, add D1 query, execute inserts and optional prune deletes
scripts/sync-votes-to-d1.mjs
Replaces inline content scanning with enumerateContentVoteKeys() for expected keys. Adds getVoteEntryKeys() to fetch existing entry keys from D1 via wrangler and parse the JSON response. Updated applyMode() batches INSERT statements and, when pruning is enabled, computes orphans and deletes them in chunks with proper quote escaping. Final log includes prune status.
Verify script: use shared utilities, detect orphans, extend failure conditions and reporting
scripts/verify-d1-votes.mjs
Refactors to use enumerateContentVoteKeys() instead of local directory scanning. Detects orphan rows via findOrphanVoteKeys(). Extended failure condition to mark the run as failed when orphan rows exist. Updated summary log and added conditional listing of first orphan rows (up to 20) when detected.
Test suite: orphan detection and content key enumeration
tests/d1-vote-sync.test.ts
New Vitest suite with unit tests for findOrphanVoteKeys() against a manual expected set and filesystem-based integration test for enumerateContentVoteKeys() validating non-empty results and category:slug key format.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

contributor:verified, pr:verified

Suggested reviewers

  • JSONbored

Poem

🗝️ Keys once lost now found and fixed,
Orphans pruned from D1's mix,
Content syncs without a hitch—
No stray votes in the ditch! 🎉

🚥 Pre-merge checks | ✅ 8
✅ Passed checks (8 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the primary changes: pruning orphan D1 vote rows during sync and failing verification on surplus keys.
Description check ✅ Passed The PR description covers the summary, related issues, type of change, testing, and key checklist items; however, it uses a minimal template that differs from the repository's detailed template structure.
Linked Issues check ✅ Passed The PR successfully implements all core coding requirements from #564: shared enumerateContentVoteKeys(), prune mode in sync, orphan detection in verify, and unit tests for orphan logic.
Out of Scope Changes check ✅ Passed All changes align with #564 scope: new helpers, sync/verify script updates, and tests. No UI components, generated artifacts, or unrelated refactoring are present.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Security Pattern Review ✅ Passed Code uses proper SQL escaping, safe execFileSync with array args, validates mode argument, uses path.join() safely, logs only catalog data (non-sensitive).
Client/Server Boundary Validation ✅ Passed PR modifies only Node.js scripts and tests; no React components, server components, or API routes present. Custom check applicability: N/A (out of scope).
Logging Standards Compliance ✅ Passed All changed files (scripts/lib, scripts/, tests/) are outside the logging standards check scope, which only applies to apps/web/src and packages/web-runtime/src.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jony376 jony376 force-pushed the fix/d1-vote-sync-prune-orphans branch from 7464061 to 82e5e06 Compare May 28, 2026 23:24
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
scripts/verify-d1-votes.mjs (2)

21-33: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Verify votes_by_client too, not just votes_entries.

This check only inspects votes_entries. If sync leaves orphaned keys in votes_by_client, verify still reports green even though the D1 vote state is out of sync with the catalog. Given the stated invariant for both tables, verification should compare orphan/missing keys across both datasets.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/verify-d1-votes.mjs` around lines 21 - 33, The current getRows
function only queries votes_entries, so update it to fetch both tables and
return both result sets for verification: modify the args assembled in getRows
(referencing getRows, d1Binding, runMode, and args) to either run two SQL
commands (e.g., two SELECTs: one for entry_key, upvote_count FROM votes_entries
and one for the relevant columns from votes_by_client such as client_id,
entry_key) or a single script that returns both result sets, then adjust the
caller to consume both results so the verifier can compare orphan/missing keys
across votes_entries and votes_by_client. Ensure the remote/local flag logic
using runMode and d1Binding remains unchanged.

48-50: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail on surplus rows, not only shortages.

actual deduplicates repeated entry_keys, so a duplicate row for an expected key won't show up in orphans. With the current totalRows < totalExpected check, that case passes verification even though D1 still has extra rows. This should fail on any row-count mismatch, not just missing rows.

Minimal fix
   if (
     result.missing.length > 0 ||
     result.negativeCounts.length > 0 ||
-    result.totalRows < result.totalExpected ||
+    result.totalRows !== result.totalExpected ||
     result.orphans.length > 0
   ) {
     failed = true;
   }

Also applies to: 83-87

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/verify-d1-votes.mjs` around lines 48 - 50, The verification currently
only fails when totalRows < totalExpected (allowing surplus duplicate rows to
slip through); change the check to fail on any row-count mismatch by replacing
the strict "less than" check with a strict inequality (totalRows !==
totalExpected) using the raw rows.length (variable totalRows) and update the
associated error/exit path accordingly; apply the same change to the similar
block that appears at the later lines (the other totalRows/totalExpected
comparison) so any surplus or shortage causes verification to fail even though
the deduplicated Map actual may hide duplicates.
🧹 Nitpick comments (1)
tests/d1-vote-sync.test.ts (1)

8-25: 🏗️ Heavy lift

Add one regression test for the actual sync/verify invariants.

These tests are good for the helper layer, but they stop short of the behavior this PR is fixing. A script-level test that seeds extra keys and asserts prune/verify handle them would catch regressions like orphaned votes_by_client rows or duplicate-row false negatives.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/d1-vote-sync.test.ts` around lines 8 - 25, Add a regression test to
tests/d1-vote-sync.test.ts that goes beyond unit helpers (findOrphanVoteKeys,
enumerateContentVoteKeys) by seeding extra vote keys (including an orphan like
"mcp:old-slug" and duplicated rows) into the test DB/store, then invoking the
script-level sync/verify entrypoint (the module that performs prune/verify;
e.g., the project’s runVoteSync/pruneOrphanVotes/syncAndVerify function) and
asserting post-run that orphaned keys were removed and duplicates
flagged/handled; use findOrphanVoteKeys and enumerateContentVoteKeys to build
expected sets and assertions so the test would catch regressions around orphaned
votes_by_client rows and duplicate-row false negatives.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/sync-votes-to-d1.mjs`:
- Around line 89-108: The prune routine only deletes orphan rows from
votes_entries, leaving orphaned rows in votes_by_client; update the loop that
builds the orphan key set (using getVoteEntryKeys and orphans) so the
runWrangler DELETE covers both tables: execute DELETE FROM votes_entries WHERE
entry_key IN (...) and DELETE FROM votes_by_client WHERE entry_key IN (...), or
run a single transaction/command that deletes from both tables using the same
inList; ensure the same quoting logic (key.replaceAll("'", "''")) and the same
runWrangler invocation are reused so both tables are reconciled for runMode
remote/local.

---

Outside diff comments:
In `@scripts/verify-d1-votes.mjs`:
- Around line 21-33: The current getRows function only queries votes_entries, so
update it to fetch both tables and return both result sets for verification:
modify the args assembled in getRows (referencing getRows, d1Binding, runMode,
and args) to either run two SQL commands (e.g., two SELECTs: one for entry_key,
upvote_count FROM votes_entries and one for the relevant columns from
votes_by_client such as client_id, entry_key) or a single script that returns
both result sets, then adjust the caller to consume both results so the verifier
can compare orphan/missing keys across votes_entries and votes_by_client. Ensure
the remote/local flag logic using runMode and d1Binding remains unchanged.
- Around line 48-50: The verification currently only fails when totalRows <
totalExpected (allowing surplus duplicate rows to slip through); change the
check to fail on any row-count mismatch by replacing the strict "less than"
check with a strict inequality (totalRows !== totalExpected) using the raw
rows.length (variable totalRows) and update the associated error/exit path
accordingly; apply the same change to the similar block that appears at the
later lines (the other totalRows/totalExpected comparison) so any surplus or
shortage causes verification to fail even though the deduplicated Map actual may
hide duplicates.

---

Nitpick comments:
In `@tests/d1-vote-sync.test.ts`:
- Around line 8-25: Add a regression test to tests/d1-vote-sync.test.ts that
goes beyond unit helpers (findOrphanVoteKeys, enumerateContentVoteKeys) by
seeding extra vote keys (including an orphan like "mcp:old-slug" and duplicated
rows) into the test DB/store, then invoking the script-level sync/verify
entrypoint (the module that performs prune/verify; e.g., the project’s
runVoteSync/pruneOrphanVotes/syncAndVerify function) and asserting post-run that
orphaned keys were removed and duplicates flagged/handled; use
findOrphanVoteKeys and enumerateContentVoteKeys to build expected sets and
assertions so the test would catch regressions around orphaned votes_by_client
rows and duplicate-row false negatives.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: ad4d8943-139d-4a94-9812-f0285c563847

📥 Commits

Reviewing files that changed from the base of the PR and between 6ed7681 and 82e5e06.

📒 Files selected for processing (4)
  • scripts/lib/enumerate-content-vote-keys.mjs
  • scripts/sync-votes-to-d1.mjs
  • scripts/verify-d1-votes.mjs
  • tests/d1-vote-sync.test.ts
📜 Review details
🔇 Additional comments (1)
scripts/lib/enumerate-content-vote-keys.mjs (1)

11-37: LGTM!

Also applies to: 44-52

Comment on lines +89 to +108
const actualKeys = getVoteEntryKeys(runMode);
const orphans = actualKeys.filter((key) => !expected.has(key));
if (orphans.length === 0) {
console.log(`${runMode}: no orphan vote rows to prune`);
return;
}

for (let index = 0; index < orphans.length; index += chunkSize) {
const chunk = orphans.slice(index, index + chunkSize);
const inList = chunk
.map((key) => `'${key.replaceAll("'", "''")}'`)
.join(", ");
runWrangler([
"d1",
"execute",
d1Binding,
runMode === "remote" ? "--remote" : "--local",
"--command",
`DELETE FROM votes_entries WHERE entry_key IN (${inList});`,
]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Prune both vote tables during reconciliation.

Right now the prune path only deletes from votes_entries. Any orphaned votes_by_client.entry_key rows survive a slug rename/delete, so D1 is still inconsistent and the old vote data stays stranded even after sync. The fix here needs to reconcile both tables off the same orphan key set.

Possible direction
-  const actualKeys = getVoteEntryKeys(runMode);
+  const actualKeys = getVoteEntryKeys(runMode);
   const orphans = actualKeys.filter((key) => !expected.has(key));
   if (orphans.length === 0) {
     console.log(`${runMode}: no orphan vote rows to prune`);
     return;
   }

   for (let index = 0; index < orphans.length; index += chunkSize) {
     const chunk = orphans.slice(index, index + chunkSize);
     const inList = chunk
       .map((key) => `'${key.replaceAll("'", "''")}'`)
       .join(", ");
     runWrangler([
       "d1",
       "execute",
       d1Binding,
       runMode === "remote" ? "--remote" : "--local",
       "--command",
-      `DELETE FROM votes_entries WHERE entry_key IN (${inList});`,
+      [
+        `DELETE FROM votes_by_client WHERE entry_key IN (${inList});`,
+        `DELETE FROM votes_entries WHERE entry_key IN (${inList});`,
+      ].join(" "),
     ]);
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const actualKeys = getVoteEntryKeys(runMode);
const orphans = actualKeys.filter((key) => !expected.has(key));
if (orphans.length === 0) {
console.log(`${runMode}: no orphan vote rows to prune`);
return;
}
for (let index = 0; index < orphans.length; index += chunkSize) {
const chunk = orphans.slice(index, index + chunkSize);
const inList = chunk
.map((key) => `'${key.replaceAll("'", "''")}'`)
.join(", ");
runWrangler([
"d1",
"execute",
d1Binding,
runMode === "remote" ? "--remote" : "--local",
"--command",
`DELETE FROM votes_entries WHERE entry_key IN (${inList});`,
]);
const actualKeys = getVoteEntryKeys(runMode);
const orphans = actualKeys.filter((key) => !expected.has(key));
if (orphans.length === 0) {
console.log(`${runMode}: no orphan vote rows to prune`);
return;
}
for (let index = 0; index < orphans.length; index += chunkSize) {
const chunk = orphans.slice(index, index + chunkSize);
const inList = chunk
.map((key) => `'${key.replaceAll("'", "''")}'`)
.join(", ");
runWrangler([
"d1",
"execute",
d1Binding,
runMode === "remote" ? "--remote" : "--local",
"--command",
[
`DELETE FROM votes_by_client WHERE entry_key IN (${inList});`,
`DELETE FROM votes_entries WHERE entry_key IN (${inList});`,
].join(" "),
]);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/sync-votes-to-d1.mjs` around lines 89 - 108, The prune routine only
deletes orphan rows from votes_entries, leaving orphaned rows in
votes_by_client; update the loop that builds the orphan key set (using
getVoteEntryKeys and orphans) so the runWrangler DELETE covers both tables:
execute DELETE FROM votes_entries WHERE entry_key IN (...) and DELETE FROM
votes_by_client WHERE entry_key IN (...), or run a single transaction/command
that deletes from both tables using the same inList; ensure the same quoting
logic (key.replaceAll("'", "''")) and the same runWrangler invocation are reused
so both tables are reconciled for runMode remote/local.

@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 30, 2026
Copy link
Copy Markdown
Owner

@JSONbored JSONbored left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jony376 this fixes only part of the vote-orphan problem.

  • Prune orphan rows from both votes_entries and votes_by_client.
  • Do not rely only on ON DELETE CASCADE; make the script explicitly reconcile votes_by_client.entry_key too.
  • Update verify-d1-votes.mjs so surplus keys in votes_by_client fail verification as well as surplus keys in votes_entries.
  • Add tests covering orphan client-vote rows after a slug rename or deletion.
  • Update the branch against current main and rerun the D1 vote sync/verify tests.
  • Expected validation: focused D1 vote tests, pnpm test, and git diff --check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(d1-votes): vote sync never prunes orphaned D1 rows; verify ignores surplus keys

2 participants