Implement auto-repair #107

Merged
albe merged 26 commits into main from auto-repair on Mar 22, 2026

@albe
Owner

@albe albe commented May 31, 2020

Implements automatic recovery from unfinished commits and ensures that indexes do not contain entries for documents that were lost.
Also detects whether the primary index has fallen behind the stored documents. A repair for that case will be added after #254 is merged.

Related to #31

@albe albe mentioned this pull request Jun 27, 2020
Comment thread src/EventStore.js Outdated
* @private
*/
checkTornWrite() {
let position = this.storage.length;
Owner Author

To fix torn writes (which should be a storage-level concern), we need to infer the position from the actual documents in the storage rather than from the index. Hence the storage needs to read the last document of each partition and return the highest sequenceNumber found in the document headers.
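
As a rough illustration of that idea, here is a minimal sketch over an in-memory model. The partition layout (arrays of documents) and the function name `highestSequenceNumber` are assumptions made for the example, not the library's actual storage API; only the `sequenceNumber` header field is taken from the comment above.

```javascript
// Hypothetical model: each partition is an array of documents in write order,
// and every document header carries a global sequenceNumber.
function highestSequenceNumber(partitions) {
  let highest = -1; // -1 signals that no partition contains a document
  for (const documents of partitions) {
    if (documents.length === 0) continue;
    // Only the last document of each partition has to be inspected.
    const last = documents[documents.length - 1];
    highest = Math.max(highest, last.header.sequenceNumber);
  }
  return highest;
}

const partitions = [
  [{ header: { sequenceNumber: 0 } }, { header: { sequenceNumber: 3 } }],
  [{ header: { sequenceNumber: 1 } }, { header: { sequenceNumber: 2 } }],
  [],
];
console.log(highestSequenceNumber(partitions)); // 3
```

The point of the sketch is that the position comes from the documents themselves, so it stays correct even when the index is stale.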

Owner Author

See #155

Comment thread src/EventStore.js Outdated
const indexes = [];

let leastConsistentEntry = this.storage.index.lastEntry;
this.storage.forEachSecondaryIndex(index => {
Owner Author

Note: Updating secondary indexes is problematic because they can fall behind the primary index. However, we cannot tell whether an index has actually fallen behind or whether it simply did not match any later documents. In the worst case, a secondary index matches only the first document ever written and nothing after it, which would mean the whole storage needs to be reindexed.
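
To make the worst case concrete, here is a small sketch. It assumes each index can report the sequence number of its last entry; the helper name `reindexStart` and the sample numbers are hypothetical and only illustrate the reasoning in this comment.

```javascript
// Hypothetical helper: given the last entry number of the primary index and
// of every secondary index, return the earliest point from which documents
// would have to be rescanned to be sure no secondary index missed anything.
function reindexStart(primaryLast, secondaryLasts) {
  return Math.min(primaryLast, ...secondaryLasts);
}

const primaryLast = 1000;
// The third index only ever matched the first document. From its last entry
// alone we cannot tell whether it lags or is simply very selective.
const secondaryLasts = [1000, 997, 1];

const start = reindexStart(primaryLast, secondaryLasts);
console.log(start);               // 1
console.log(primaryLast - start); // 999 documents to rescan
```

A single highly selective index is enough to push the rescan start back to the beginning of the storage.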

Comment thread src/EventStore.js Outdated
};
for (let event of events) {
- const eventMetadata = Object.assign({ commitId, committedAt }, metadata, { commitVersion, streamVersion });
+ const eventMetadata = Object.assign({ commitId, committedAt }, metadata, { commitVersion, commitSize, streamVersion });
Owner Author

This is already implemented as of #159

@albe
Owner Author

albe commented May 16, 2021

This needs to be rewritten to deal only with unfinished commits, since torn writes are already covered by #155.

@albe
Owner Author

albe commented Mar 20, 2026

@copilot continue working on this. Take the previous comments into consideration and check why the tests are currently not finishing

Contributor

Copilot AI commented Mar 20, 2026

@albe I've opened a new pull request, #251, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits March 20, 2026 20:09
Fix auto-repair: remove broken incomplete code causing infinite loop in checkUnfinishedCommits
@coveralls

coveralls commented Mar 20, 2026

Coverage Status

coverage: 97.958% (+0.1%) from 97.826%
when pulling 967ba2d on auto-repair
into 28d4e34 on main.

@albe
Owner Author

albe commented Mar 20, 2026

@copilot add tests that check that a repair (torn write/commit) will not lose any previously committed data.

Contributor

Copilot AI commented Mar 20, 2026

@albe I've opened a new pull request, #252, to work on those changes. Once the pull request is ready, I'll request review from you.

@albe
Owner Author

albe commented Mar 20, 2026

@copilot find uncovered lines in this change and improve test coverage, since coveralls is still reporting -0.3% overall coverage on this PR

Contributor

Copilot AI commented Mar 20, 2026

@albe I've opened a new pull request, #253, to work on those changes. Once the pull request is ready, I'll request review from you.

…remove dead createStorage, add stale-index test

Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/3c271323-430d-4355-a4ff-66d289bf2e48
Comment thread src/Partition/WritablePartition.js Outdated
try {
this.readFrom(after);
// copy all truncated documents to some delete log
this.branchOff('truncated-' + Date.now(), after);
Owner Author

@copilot actually, putting this into the try was wrong. It must be guaranteed that whenever the partition file is truncated, the truncated part is stored in a new file (one that the library is not supposed to read, but that is available for manual recovery attempts). This is important when recovery finds corrupted data: the event store needs to return to a consistent state, but the malformed data might still contain valuable information.
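
A minimal sketch of that ordering, using only Node's `fs` module. The function name `truncateKeepingTail` and the backup file naming are assumptions for illustration; this is not the actual `WritablePartition` code. The tail is copied as raw bytes, without parsing documents, so it is preserved even when the data is corrupted.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: copy everything after byte offset `after` into a
// separate backup file, then truncate the original. The copy happens outside
// of any document parsing, so malformed data cannot prevent it.
function truncateKeepingTail(fileName, after) {
  const size = fs.statSync(fileName).size;
  if (after >= size) return null; // nothing to truncate
  const backupName = fileName + '.truncated-' + Date.now();
  const tail = Buffer.alloc(size - after);
  const fd = fs.openSync(fileName, 'r');
  let read = 0;
  try {
    // readSync may return fewer bytes than requested, so loop until done.
    while (read < tail.length) {
      const n = fs.readSync(fd, tail, read, tail.length - read, after + read);
      if (n === 0) break;
      read += n;
    }
  } finally {
    fs.closeSync(fd);
  }
  fs.writeFileSync(backupName, tail.subarray(0, read));
  // Truncate only after the backup has been written.
  fs.truncateSync(fileName, after);
  return backupName;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'truncate-demo-'));
const file = path.join(dir, 'partition');
fs.writeFileSync(file, 'good-data|torn-or-corrupt');
const backup = truncateKeepingTail(file, 9);
console.log(fs.readFileSync(file, 'utf8'));   // good-data
console.log(fs.readFileSync(backup, 'utf8')); // |torn-or-corrupt
```

If writing the backup throws, the error propagates before `truncateSync` runs, so the original file is left untouched.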

Contributor

Copilot AI commented Mar 21, 2026

@albe I've opened a new pull request, #259, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits March 21, 2026 15:05
Guarantee truncated data is saved before partition truncation, even when corrupted
@albe
Owner Author

albe commented Mar 21, 2026

@copilot take a look at the failing test

Contributor

Copilot AI commented Mar 21, 2026

@albe I've opened a new pull request, #260, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits March 21, 2026 15:31
Fix bench-index: use per-call unique filenames to avoid consistency error
@albe
Owner Author

albe commented Mar 21, 2026

@copilot there is still a failing test:

  1) EventStore
       preRead
         aborts the read when the hook throws:
     Uncaught Error: Not authorized to read
      at WritableStorage.<anonymous> (test/EventStore.spec.js:1290:27)
      at WritableStorage.emit (node:events:524:28)
      at WritableStorage.readFrom (src/Storage/ReadableStorage.js:87:173)
      at WritableStorage.read (src/Storage/ReadableStorage.js:94:545)
      at EventStore.checkUnfinishedCommits (src/EventStore.js:27:302)
      at /home/runner/work/node-event-storage/node-event-storage/src/EventStore.js:23:854
      at /home/runner/work/node-event-storage/node-event-storage/src/EventStore.js:36:507
      at FSReqCallback.oncomplete (node:fs:187:23)

Contributor

Copilot AI commented Mar 21, 2026

@albe I've opened a new pull request, #261, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits March 21, 2026 15:51
Fix uncaught error when preRead hook throws during checkUnfinishedCommits
@albe albe changed the title Start implementing auto-repair Implement auto-repair Mar 21, 2026
@albe albe mentioned this pull request Mar 21, 2026
@albe
Owner Author

albe commented Mar 22, 2026

@copilot there is one more thing to consider: is there some scenario in which the primary index can run short of the actual documents stored in the storage? If so, how can we detect this status efficiently at startup time? When we do detect this condition, with #254 we will then be able to also repair the primary index. But in this PR we just want to detect this situation if it's even realistic, not yet resolve it.
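
One cheap way such a check could look, sketched under stated assumptions: for each partition we know the byte offset at which the last indexed document ends, and we know the size of the partition's document data (excluding any file header). Both inputs and the name `partitionsAheadOfIndex` are hypothetical; this is not the detection that was eventually merged.

```javascript
// Hypothetical check: a partition holds data the primary index does not know
// about if its document data extends past the end of the last indexed document.
// indexedEnd:     Map<partitionId, byte offset where the last indexed document ends>
// partitionSizes: Map<partitionId, size in bytes of the partition's document data>
function partitionsAheadOfIndex(indexedEnd, partitionSizes) {
  const lagging = [];
  for (const [partitionId, size] of partitionSizes) {
    const end = indexedEnd.get(partitionId) ?? 0; // never indexed: end is 0
    if (size > end) lagging.push(partitionId);
  }
  return lagging;
}

const indexedEnd = new Map([['a', 120], ['b', 300]]);
const partitionSizes = new Map([['a', 120], ['b', 364], ['c', 0]]);
console.log(partitionsAheadOfIndex(indexedEnd, partitionSizes)); // [ 'b' ]
```

This only compares sizes, so it needs no document reads at startup; telling valid unindexed documents apart from a torn write would still require reading the trailing data.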

Contributor

Copilot AI commented Mar 22, 2026

@albe I've opened a new pull request, #264, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 4 commits March 22, 2026 10:39
…Header in checkTornWrite()

Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/cd6d455a-7dd3-462e-a3be-61c6eccf8112
Detect primary index lagging behind partition data at startup
@albe albe merged commit bf45844 into main Mar 22, 2026
10 checks passed
@albe albe deleted the auto-repair branch March 22, 2026 11:40
3 participants