Skip to content

perf(recording): reduce in-recording memory pressure hotspots#1828

Open
JYZ-LESLIE wants to merge 2 commits into
CapSoftware:mainfrom
JYZ-LESLIE:codex/algora-109-memory-guard
Open

perf(recording): reduce in-recording memory pressure hotspots#1828
JYZ-LESLIE wants to merge 2 commits into
CapSoftware:mainfrom
JYZ-LESLIE:codex/algora-109-memory-guard

Conversation

@JYZ-LESLIE
Copy link
Copy Markdown

@JYZ-LESLIE JYZ-LESLIE commented May 16, 2026

/claim #109

Summary

This PR takes a scoped pass on issue #109 (memory/CPU pressure during recording), focusing on two low-risk hotspots that scale with recording duration.

1) Cursor incremental flush path: remove full-vector clone/string allocation

  • Reworked cursor flush serialization in crates/recording/src/cursor.rs to write directly to file via serde_json::to_writer.
  • Removed the periodic to_vec() + to_string_pretty() full-copy path for cursor events.
  • Added flush change-detection so periodic snapshots only write when event counts changed.

2) Lower default instant muxer queue depth + per-mode override

  • Reduced DEFAULT_MP4_MUXER_BUFFER_SIZE_INSTANT from 240 to 96 in crates/recording/src/output_pipeline/macos.rs.
  • Added CAP_MP4_MUXER_BUFFER_SIZE_INSTANT env override for instant mode.
  • Kept CAP_MP4_MUXER_BUFFER_SIZE as the global fallback.
  • Added/updated tests for precedence + fallback behavior.

Why this helps

  • The cursor path previously re-allocated/serialized the full event history every flush interval as recordings grew longer.
  • Large instant-mode queue capacity can amplify resident memory under sustained encode/upload backpressure.

Validation

  • Environment limitation in this runner: Rust toolchain is unavailable (cargo/rustc not installed), so I could not execute Rust tests locally here.
  • I did run git diff --check to ensure clean patch formatting.

Greptile Summary

This PR reduces in-recording memory pressure by (1) rewriting cursor flush serialization to stream directly to disk via BufWriter/serde_json::to_writer instead of cloning the full event vector and pretty-printing it on every interval, and (2) cutting the instant-mode MP4 muxer queue default from 240 to 96 frames with a new per-mode env override.

  • cursor.rs: The streaming serialization path avoids the O(n) allocation on every 5-second tick; change-detection tracking (last_flushed_cursor_counts) correctly skips no-op flushes when no new events have arrived.
  • macos.rs: The env-var precedence logic (CAP_MP4_MUXER_BUFFER_SIZE_INSTANT \u2192 CAP_MP4_MUXER_BUFFER_SIZE \u2192 default) is clean, and the new tests cover the key precedence/fallback combinations.

Confidence Score: 3/5

The muxer queue change is safe to merge; the cursor serialization rewrite has a real silent-data-loss gap in the BufWriter flush path that should be addressed before shipping.

The cursor flush path drops the BufWriter without an explicit flush call — on a disk-full or I/O error, the final buffered bytes are silently discarded and the cursor file is left truncated with no log message. This can happen during any flush in a long recording and would produce corrupted cursor replay data.

crates/recording/src/cursor.rs — specifically the BufWriter drop path in flush_cursor_data.

Important Files Changed

Filename Overview
crates/recording/src/cursor.rs Replaces full-copy serialization with BufWriter-based streaming write and adds change-detection to skip no-op flushes; BufWriter is not explicitly flushed, risking silent truncation on disk errors.
crates/recording/src/output_pipeline/macos.rs Reduces instant-mode muxer queue default from 240 to 96 and adds a per-mode env override; new tests correctly cover precedence but are not isolated from parallel test execution.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/recording/src/cursor.rs:66-73
Silent flush failure on BufWriter drop. `serde_json::to_writer` serializes in chunks via `Write::write_all`, so BufWriter's internal buffer typically still holds the last few KB when serialization finishes. Rust's `BufWriter` drop calls `flush()` but discards the error — if it fails (e.g. disk full), the cursor file is silently truncated. An explicit `writer.flush()` before the block ends captures and logs that error.

```suggestion
            let mut writer = BufWriter::new(file);
            if let Err(e) = serde_json::to_writer(&mut writer, &events) {
                tracing::error!(
                    "Failed to serialize cursor data to {}: {}",
                    output_path.display(),
                    e
                );
            } else if let Err(e) = writer.flush() {
                tracing::error!(
                    "Failed to flush cursor data to {}: {}",
                    output_path.display(),
                    e
                );
            }
```

### Issue 2 of 2
crates/recording/src/output_pipeline/macos.rs:1237-1281
Env-var tests are not safe to run in parallel. Both new tests (`instant_env_override_takes_precedence_over_global_override` and `invalid_instant_override_falls_back_to_global_override`) mutate process-wide env vars (`CAP_MP4_MUXER_BUFFER_SIZE` and `CAP_MP4_MUXER_BUFFER_SIZE_INSTANT`) without any locking. Because `cargo test` runs unit tests on multiple threads by default, these tests — and the pre-existing ones in this module — can observe each other's `set_var`/`remove_var` calls, producing spurious failures or incorrect passing results. The `serial_test` crate (or a shared `Mutex` in a `lazy_static`) is the idiomatic fix.

Reviews (1): Last reviewed commit: "perf(recording): reduce cursor flush ove..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

@superagent-security superagent-security Bot added the contributor:verified Contributor passed trust analysis. label May 16, 2026
@JYZ-LESLIE
Copy link
Copy Markdown
Author

/claim #109

@superagent-security superagent-security Bot added the pr:verified PR passed security analysis. label May 16, 2026
Comment on lines +66 to +73
let mut writer = BufWriter::new(file);
if let Err(e) = serde_json::to_writer(&mut writer, &events) {
tracing::error!(
"Failed to serialize cursor data to {}: {}",
output_path.display(),
e
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Silent flush failure on BufWriter drop. serde_json::to_writer serializes in chunks via Write::write_all, so BufWriter's internal buffer typically still holds the last few KB when serialization finishes. Rust's BufWriter drop calls flush() but discards the error — if it fails (e.g. disk full), the cursor file is silently truncated. An explicit writer.flush() before the block ends captures and logs that error.

Suggested change
let mut writer = BufWriter::new(file);
if let Err(e) = serde_json::to_writer(&mut writer, &events) {
tracing::error!(
"Failed to serialize cursor data to {}: {}",
output_path.display(),
e
);
}
let mut writer = BufWriter::new(file);
if let Err(e) = serde_json::to_writer(&mut writer, &events) {
tracing::error!(
"Failed to serialize cursor data to {}: {}",
output_path.display(),
e
);
} else if let Err(e) = writer.flush() {
tracing::error!(
"Failed to flush cursor data to {}: {}",
output_path.display(),
e
);
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/recording/src/cursor.rs
Line: 66-73

Comment:
Silent flush failure on BufWriter drop. `serde_json::to_writer` serializes in chunks via `Write::write_all`, so BufWriter's internal buffer typically still holds the last few KB when serialization finishes. Rust's `BufWriter` drop calls `flush()` but discards the error — if it fails (e.g. disk full), the cursor file is silently truncated. An explicit `writer.flush()` before the block ends captures and logs that error.

```suggestion
            let mut writer = BufWriter::new(file);
            if let Err(e) = serde_json::to_writer(&mut writer, &events) {
                tracing::error!(
                    "Failed to serialize cursor data to {}: {}",
                    output_path.display(),
                    e
                );
            } else if let Err(e) = writer.flush() {
                tracing::error!(
                    "Failed to flush cursor data to {}: {}",
                    output_path.display(),
                    e
                );
            }
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 1237 to +1281
@@ -1246,6 +1263,22 @@ mod tests {
assert_eq!(normal, DEFAULT_MP4_MUXER_BUFFER_SIZE);
assert_eq!(instant, DEFAULT_MP4_MUXER_BUFFER_SIZE_INSTANT);
}

#[test]
fn invalid_instant_override_falls_back_to_global_override() {
unsafe {
std::env::set_var("CAP_MP4_MUXER_BUFFER_SIZE", "80");
std::env::set_var("CAP_MP4_MUXER_BUFFER_SIZE_INSTANT", "not_a_number");
}
let normal = get_mp4_muxer_buffer_size(false);
let instant = get_mp4_muxer_buffer_size(true);
unsafe {
std::env::remove_var("CAP_MP4_MUXER_BUFFER_SIZE");
std::env::remove_var("CAP_MP4_MUXER_BUFFER_SIZE_INSTANT");
}
assert_eq!(normal, 80);
assert_eq!(instant, 80);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Env-var tests are not safe to run in parallel. Both new tests (instant_env_override_takes_precedence_over_global_override and invalid_instant_override_falls_back_to_global_override) mutate process-wide env vars (CAP_MP4_MUXER_BUFFER_SIZE and CAP_MP4_MUXER_BUFFER_SIZE_INSTANT) without any locking. Because cargo test runs unit tests on multiple threads by default, these tests — and the pre-existing ones in this module — can observe each other's set_var/remove_var calls, producing spurious failures or incorrect passing results. The serial_test crate (or a shared Mutex in a lazy_static) is the idiomatic fix.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/recording/src/output_pipeline/macos.rs
Line: 1237-1281

Comment:
Env-var tests are not safe to run in parallel. Both new tests (`instant_env_override_takes_precedence_over_global_override` and `invalid_instant_override_falls_back_to_global_override`) mutate process-wide env vars (`CAP_MP4_MUXER_BUFFER_SIZE` and `CAP_MP4_MUXER_BUFFER_SIZE_INSTANT`) without any locking. Because `cargo test` runs unit tests on multiple threads by default, these tests — and the pre-existing ones in this module — can observe each other's `set_var`/`remove_var` calls, producing spurious failures or incorrect passing results. The `serial_test` crate (or a shared `Mutex` in a `lazy_static`) is the idiomatic fix.

How can I resolve this? If you propose a fix, please make it concise.

@JYZ-LESLIE
Copy link
Copy Markdown
Author

Addressed both Greptile findings in 97f8b7c:

  • crates/recording/src/cursor.rs: added explicit writer.flush() + error logging after serde_json::to_writer so cursor snapshot flush failures are surfaced (no silent truncation on drop).
  • crates/recording/src/output_pipeline/macos.rs: wrapped all env-mutating muxer buffer tests with a shared mutex guard and cleanup helper to prevent parallel test interference.

Validation note: Rust toolchain is not available in this runner, so I could not execute cargo test locally here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🙋 Bounty claim contributor:verified Contributor passed trust analysis. pr:verified PR passed security analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant