feat(base): disable proactive memory compaction in guest #2553
kcompactd's background migrations dirty destination 2 MiB host hugepages without producing a snapshot-aligned benefit, inflating memfile diffs. We trigger compaction explicitly pre-pause instead.
PR Summary: Medium Risk. Reviewed by Cursor Bugbot for commit 479a399.
LGTM, simple sysctl tweak with clear rationale tied to the pre-pause compaction trigger.
Extended reasoning...
Overview
Single-line addition to packages/orchestrator/pkg/template/build/phases/base/provision.sh that appends vm.compaction_proactiveness=0 to /etc/sysctl.conf in the base template, disabling kcompactd background page migration in the guest.
Security risks
None. This is a kernel tunable inside the guest VM that controls when the kernel does background memory compaction. It does not affect auth, isolation, or any host-facing surface.
Level of scrutiny
Low. The change is purely a guest-side performance/snapshot-size optimization with a well-explained rationale: with 2 MiB hugepage backing, kcompactd-driven page migrations dirty destination hugepages from the host UFFD perspective and bloat memfile diffs without workload benefit. The PR description references the companion PR (#2551) that adds an explicit pre-pause compact_memory write so compaction still happens deterministically at the right moment.
Other factors
The pattern matches the existing fs.inotify.max_user_watches line directly above it, the comment explains the why clearly, and existing templates inherit the change only on rebuild so rollout is gradual. No bug reports were filed.
matthewlouisbrockman left a comment
lgtm, we need to make sure 2551 goes out at same time/first?
Adds an opt-in pre-pause step that runs `sync`, `drop_caches`,
`compact_memory`, and `fstrim -av` on the live VM via envd's Process
service to shrink the memfile/rootfs diff. Each step is wrapped in
`timeout -s KILL` with its own cap, so a stuck step (most realistically
a slow `sync` on a large dirty backlog) cannot starve the rest — and a
killed step does not abort the chain (`;`-separated, not `&&`).
Pausing FC is unaffected by an in-flight guest `sync` we time out: FC
only drains in-flight virtio I/O before completing the pause; any
unflushed dirty pages stay in the memfile snapshot and converge on
resume. Per-step timeouts trade reclaim payoff, never correctness —
`drop_caches` is documented non-destructive, `fstrim` consults FS
allocation metadata not pagecache, and a partial `compact_memory` is
just less-compacted.
Disabled by default — the LD flag's null default leaves every step at 0
(skipped). Missing keys, zero, negative, and wrong-type values all
collapse to "skip". The orchestrator skips the envd call entirely when
the chain is empty. The outer `Connect-Timeout-Ms` is the sum of
per-step caps plus a small slack.
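The outer timeout rule can be illustrated with a small sketch. The function name `outer_timeout_ms` and the 250 ms slack are hypothetical; the PR only says "a small slack". Returning `0` for an empty chain is this sketch's way of signalling "skip the envd call".

```shell
#!/bin/sh
# Sum the per-step caps (ms) that are positive integers, then add slack.
# $1 = slack in ms, remaining args = per-step caps in ms.
outer_timeout_ms() {
    slack="$1"
    shift
    total=0
    for cap in "$@"; do
        # Invalid, zero, or negative caps contribute nothing (step is skipped).
        [ "$cap" -gt 0 ] 2>/dev/null || continue
        total=$(( total + cap ))
    done
    if [ "$total" -eq 0 ]; then
        echo 0            # empty chain: nothing to run
    else
        echo $(( total + slack ))
    fi
}

outer_timeout_ms 250 500 200 1000 500   # prints 2450
outer_timeout_ms 250 0 -5 abc           # prints 0
```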
Single LD flag, one rule per cohort:
- `guest-pause-reclaim` (JSON) — per-step caps in milliseconds keyed by
step name, evaluated against sandbox / team / template LD contexts so
targeting is configured in LaunchDarkly.
Example value:
```json
{"sync":500,"drop_caches":200,"compact_memory":1000,"fstrim":500}
```
`resume-build` exposes `-reclaim` to inject the example values into the
offline LD store for local testing.
Pairs cleanly with #2553 (disable proactive compaction in the guest base
image), but is independent of it and of FPH (#2552). Split out from
#2550.
Adds `vm.compaction_proactiveness=0` to the base template's `/etc/sysctl.conf` so kcompactd no longer runs background page migrations in the guest.

With 2 MiB host-side hugepage backing of guest RAM, every migration dirties a destination hugepage from the host UFFD's perspective and lands in the next memfile diff, with no snapshot-aligned benefit. The pre-pause `compact_memory` write (#2551) does the work deterministically right before we capture state.

Existing templates inherit the change on rebuild.
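As a rough illustration of the provisioning change, the sketch below appends the setting to a sysctl config file only if it is not already present. The helper `append_sysctl` is hypothetical and is not the actual line in `provision.sh`; the demo writes to a temporary file rather than `/etc/sysctl.conf` so it can be run safely anywhere.

```shell
#!/bin/sh
# Append a sysctl setting to a config file unless the exact line already exists.
# $1 = config file path, $2 = setting in key=value form.
append_sysctl() {
    # -x matches whole lines only, -F treats the setting as a literal string.
    if ! grep -qxF "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Demo against a temp file; a real base image would target /etc/sysctl.conf.
conf="$(mktemp)"
append_sysctl "$conf" "vm.compaction_proactiveness=0"
append_sysctl "$conf" "vm.compaction_proactiveness=0"   # second call is a no-op
cat "$conf"
```

Inside a guest built from the updated template, the effective value should be readable from `/proc/sys/vm/compaction_proactiveness` once the sysctl configuration has been applied.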