Skip to content

[WIP] fix: rollout recorder not adding turns to rollout after an error until next restart#15095

Draft
JaviSoto wants to merge 1 commit intomainfrom
dev/javi/rollout-recorder-stop-on-failure
Draft

[WIP] fix: rollout recorder not adding turns to rollout after an error until next restart#15095
JaviSoto wants to merge 1 commit intomainfrom
dev/javi/rollout-recorder-stop-on-failure

Conversation

@JaviSoto
Copy link
Contributor

I have been chasing this bug for quite some time. I noticed that sometimes when I closed the Codex app, or I tried to resume a CLI thread, that a bunch of messages were missing.

Turns out that this can happen if the rollout recorder hits an error (for example if you’re out of disk space). After that happens, it will stop writing to disk.

I’m not confident about the Codex-written fix, but I’m sharing it as-is at least to highlight the problem and start the conversation!

@JaviSoto JaviSoto requested a review from jif-oai March 18, 2026 21:23
…r until next restart

I have been chasing this bug for quite some time. I noticed that
sometimes when I closed the Codex app, or I tried to resume a CLI
thread, that a bunch of messages were missing.

Turns out that this can happen if the rollout recorder hits an error
(for example if you’re out of disk space). After that happens, it will
stop writing to disk.

I’m not confident about the Codex-written fix, but I’m sharing it as-is
at least to highlight the problem and start the conversation!
#[cfg(test)]
fail_next_writes: Option<Arc<AtomicUsize>>,
#[cfg(test)]
fail_next_flushes: Option<Arc<AtomicUsize>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid adding test-only fields to a prod type? Or is there a reason we need this for tests?

}

async fn write_rollout_items(&mut self, rollout_items: &[RolloutItem]) -> std::io::Result<()> {
#[cfg(test)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the prod code a bit hard to read, do we have to nest test-only code in a prod function like this?


let result = async {
self.file.write_all(json.as_bytes()).await?;
#[cfg(test)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants