Handle cases when dreyfus checkpoint is out-of-sync with the index #5931

Merged
nickva merged 1 commit into main from handle-purge-seq-and-checkpoint-oddness-better
Mar 20, 2026

@nickva nickva commented Mar 20, 2026

Currently, there are two places where the index purge seq is tracked: in the index and in the db local doc checkpoints. Purge sequence folding should never start below the value in the checkpoint document, as that could raise an `invalid_start_purge_seq` error. Normally both sequences should match, but if they don't, be explicit about what should happen:

  • Index pseq > checkpoint pseq. Index somehow got ahead of the checkpoint. Use the checkpoint seq and re-process some purges through the index. This will do extra work but should be safe.

  • Index pseq < checkpoint pseq. Index somehow got behind the checkpoint and it looks like it could have skipped purges. For views we reset the index, and arguably that's the most correct solution. However, we never really had a reset facility for clouseau, so instead we emit an error log and let the user intervene manually, but otherwise keep updating the index.
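
The two cases above can be sketched as a small decision function. This is only an illustration in Python, not the actual dreyfus code: the function name, the status labels, and the logger are hypothetical. It also assumes that in the "index behind" case the fold still starts at the checkpoint value, since folding must never start below it.

```python
import logging

logger = logging.getLogger("purge_seq_sketch")  # hypothetical logger name


def pick_start_purge_seq(index_pseq: int, checkpoint_pseq: int) -> tuple[int, str]:
    """Return (purge seq to start folding from, status label).

    Hypothetical helper: the names and status strings are illustrative only.
    """
    if index_pseq > checkpoint_pseq:
        # Index is ahead of the checkpoint: fall back to the checkpoint seq
        # and re-process some purges. Extra work, but should be safe.
        return checkpoint_pseq, "reprocess"
    if index_pseq < checkpoint_pseq:
        # Index is behind the checkpoint and may have skipped purges.
        # There is no reset facility, so log an error and keep updating.
        # Assumption: start at the checkpoint, never below it.
        logger.error(
            "index purge seq %d is behind checkpoint purge seq %d",
            index_pseq,
            checkpoint_pseq,
        )
        return checkpoint_pseq, "error_logged"
    # Normal case: both sequences match.
    return index_pseq, "in_sync"
```

In every branch the returned start value is never below the checkpoint, which is the invariant that avoids `invalid_start_purge_seq`.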

When updating the purge sequence in clouseau, save an rpc call if we're not advancing clouseau's purge sequence. Clouseau recently gained a check that returns `ok` right away if the new purge_seq is less than or equal to the current one, but it's still nice not to have to do an extra round-trip.
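
The client-side short-circuit might look roughly like the following. Again a Python sketch under assumptions: `maybe_set_purge_seq` and the `rpc_set` callback are hypothetical stand-ins for the real call into clouseau.

```python
from typing import Callable


def maybe_set_purge_seq(
    current_pseq: int,
    new_pseq: int,
    rpc_set: Callable[[int], None],  # hypothetical stand-in for the clouseau rpc
) -> bool:
    """Call rpc_set only when the purge seq actually advances.

    Returns True if the rpc was made, False if it was skipped.
    """
    if new_pseq <= current_pseq:
        # Not advancing: the remote side would return ok right away anyway,
        # so skip the round-trip entirely.
        return False
    rpc_set(new_pseq)
    return True
```

For example, `maybe_set_purge_seq(10, 10, send)` returns `False` without calling `send`, while `maybe_set_purge_seq(10, 11, send)` calls `send(11)` once.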

It was a bit surprising to discover that we had a bunch of nice dreyfus eunit purge tests around but they never actually ran. The test functions there were not discoverable by EUnit. Switching them to be discoverable still wouldn't work, as the test suite would need clouseau running during EUnit tests. Since we don't really have a framework for that, let's switch them to Elixir tests and run them alongside the other search tests.

@nickva nickva merged commit e273f70 into main Mar 20, 2026
60 checks passed
@nickva nickva deleted the handle-purge-seq-and-checkpoint-oddness-better branch March 20, 2026 15:07