implement snapshot retention based on trigger buffer age #383
implement snapshot retention based on trigger buffer age #383
Conversation
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
mattklein123
left a comment
There was a problem hiding this comment.
Very cool, some small comments. We definitely have to do some aggressive manual QA on all of this.
| let record_start = (next_read.start + guard.extra_bytes_per_record) as usize; | ||
| let record_end = record_start + next_read_size as usize; | ||
| let record_data = if record_end <= guard.memory().len() { | ||
| guard.memory()[record_start .. record_end].to_vec() |
There was a problem hiding this comment.
Why do we need to copy here?
There was a problem hiding this comment.
I got this working by 1) adding a const_memory that allows non-mut access to the data and 2) move reading the slice to after we advance the read cursor. I think this is correct but some eyes on it would be helpful
| // Controls the maximum number of snapshots to retain for persistent state. | ||
| // A value of 0 disables count-based cleanup. | ||
| int_feature_flag!(MaxSnapshotCount, "state.max_snapshot_count", 0); |
There was a problem hiding this comment.
Seems like we should default this on? Otherwise won't we get unbounded growth?
There was a problem hiding this comment.
Yeah let me set this to some sane default for now and we can handle in general. This is only there in cases where we end up rotating a lot and creating a lot of snapshots within the time period permitted by the retention handles, so it wouldn't necessarily grow unbounded but a default makes sense. With the 1mb snapshot size you'd have to write a ton of state updates before actually running into this limit
There was a problem hiding this comment.
Going to leave this at 0 by default so we can turn off snapshot retention completely by default which seems like the safer option given how we handle default runtime values as part of crash loop detection
There was a problem hiding this comment.
Oh sorry I didn't realize what this meant. I thought it meant that it just doesn't clean anything up which is what the comment sounds like. Update the comment?
Fixes BIT-7270