Skip to content

implement snapshot retention based on trigger buffer age #383

Open
snowp wants to merge 24 commits intomainfrom
retention-handle-eviction
Open

implement snapshot retention based on trigger buffer age #383
snowp wants to merge 24 commits intomainfrom
retention-handle-eviction

Conversation

@snowp
Copy link
Contributor

@snowp snowp commented Jan 28, 2026

  • Adds a callback that can be installed onto trigger buffers that allow us to update the snapshot retention time based on the age of the buffer entries
  • Adds a mechanism to peek the next entry to read - this allows configuring the initial retention handle value without having to wait for an eviction
  • Adds a runtime flag that controls how many snapshots we'll keep at any given time, this provides a way for us to control the number of snapshot files being left on disk.

Fixes BIT-7270

Copy link
Contributor

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool, some small comments. We definitely have to do some aggressive manual QA on all of this.

let record_start = (next_read.start + guard.extra_bytes_per_record) as usize;
let record_end = record_start + next_read_size as usize;
let record_data = if record_end <= guard.memory().len() {
guard.memory()[record_start .. record_end].to_vec()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to copy here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this working by 1) adding a const_memory that allows non-mut access to the data and 2) move reading the slice to after we advance the read cursor. I think this is correct but some eyes on it would be helpful

Comment on lines 960 to 962
// Controls the maximum number of snapshots to retain for persistent state.
// A value of 0 disables count-based cleanup.
int_feature_flag!(MaxSnapshotCount, "state.max_snapshot_count", 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should default this on? Otherwise won't we get unbounded growth?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let me set this to some sane default for now and we can handle in general. This is only there in cases where we end up rotating a lot and creating a lot of snapshots within the time period permitted by the retention handles, so it wouldn't necessarily grow unbounded but a default makes sense. With the 1mb snapshot size you'd have to write a ton of state updates before actually running into this limit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to leave this at 0 by default so we can turn off snapshot retention completely by default which seems like the safer option given how we handle default runtime values as part of crash loop detection

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I didn't realize what this meant. I thought it meant that it just doesn't clean anything up which is what the comment sounds like. Update the comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants