feat(base): disable proactive memory compaction in guest #2553

Merged: ValentaTomas merged 1 commit into main from feat/disable-proactive-compaction-base on May 4, 2026
Conversation

@ValentaTomas
Member

Adds vm.compaction_proactiveness=0 to the base template's /etc/sysctl.conf so kcompactd no longer runs background page migrations in the guest.

With 2 MiB host-side hugepage backing of guest RAM, every migration dirties a destination hugepage from the host UFFD's perspective and lands in the next memfile diff — with no snapshot-aligned benefit. The pre-pause compact_memory write (#2551) does the work deterministically right before we capture state.

Existing templates inherit the change on rebuild.
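
As a rough illustration of the kind of append described above, the sketch below adds the setting to a config file only if it is not already present. The `append_sysctl` helper and the scratch file are hypothetical and are not taken from `provision.sh`; the real script may simply append the line unconditionally.

```shell
#!/bin/sh
# Hypothetical helper (not from provision.sh): append a sysctl setting
# to a config file unless an identical line already exists.
append_sysctl() {
  conf_file="$1"
  setting="$2"
  # -x matches the whole line, -F treats the setting as a fixed string.
  if ! grep -qxF "$setting" "$conf_file" 2>/dev/null; then
    printf '%s\n' "$setting" >> "$conf_file"
  fi
}

# Demo against a scratch file rather than the real /etc/sysctl.conf.
SYSCTL_DEMO_CONF="$(mktemp)"
append_sysctl "$SYSCTL_DEMO_CONF" "vm.compaction_proactiveness=0"
append_sysctl "$SYSCTL_DEMO_CONF" "vm.compaction_proactiveness=0"  # no-op the second time
```

On a booted guest, `sysctl vm.compaction_proactiveness` should then report `0` once the file has been applied.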

kcompactd's background migrations dirty destination 2 MiB host hugepages
without producing a snapshot-aligned benefit, inflating memfile diffs.
We trigger compaction explicitly pre-pause instead.
@cursor

cursor Bot commented May 4, 2026

PR Summary

Medium Risk
Kernel memory-tuning change applied to all rebuilt base templates could increase memory fragmentation or allocation latency under certain workloads; no direct security impact.

Overview
Updates the base template provisioning to append vm.compaction_proactiveness=0 to /etc/sysctl.conf, disabling kcompactd’s background memory compaction so snapshot diffs aren’t inflated by host-visible page migrations and compaction can be triggered deterministically before pause.

Reviewed by Cursor Bugbot for commit 479a399. Bugbot is set up for automated code reviews on this repo.


@claude claude Bot left a comment


LGTM, simple sysctl tweak with clear rationale tied to the pre-pause compaction trigger.

Extended reasoning...

Overview

Single-line addition to packages/orchestrator/pkg/template/build/phases/base/provision.sh that appends vm.compaction_proactiveness=0 to /etc/sysctl.conf in the base template, disabling kcompactd background page migration in the guest.

Security risks

None. This is a kernel tunable inside the guest VM that controls when the kernel does background memory compaction. It does not affect auth, isolation, or any host-facing surface.

Level of scrutiny

Low. The change is purely a guest-side performance/snapshot-size optimization with a well-explained rationale: with 2 MiB hugepage backing, kcompactd-driven page migrations dirty destination hugepages from the host UFFD perspective and bloat memfile diffs without workload benefit. The PR description references the companion PR (#2551) that adds an explicit pre-pause compact_memory write so compaction still happens deterministically at the right moment.

Other factors

The pattern matches the existing fs.inotify.max_user_watches line directly above it, the comment explains the why clearly, and existing templates inherit the change only on rebuild so rollout is gradual. No bug reports were filed.

@ValentaTomas ValentaTomas enabled auto-merge (squash) May 4, 2026 01:44
@ValentaTomas ValentaTomas requested a review from arkamar May 4, 2026 08:52
Contributor

@matthewlouisbrockman matthewlouisbrockman left a comment


lgtm, we need to make sure 2551 goes out at same time/first?

@ValentaTomas ValentaTomas merged commit 0892051 into main May 4, 2026
48 checks passed
@ValentaTomas ValentaTomas deleted the feat/disable-proactive-compaction-base branch May 4, 2026 18:07
ValentaTomas added a commit that referenced this pull request May 8, 2026
Adds an opt-in pre-pause step that runs `sync`, `drop_caches`,
`compact_memory`, and `fstrim -av` on the live VM via envd's Process
service to shrink the memfile/rootfs diff. Each step is wrapped in
`timeout -s KILL` with its own cap, so a stuck step (most realistically
a slow `sync` on a large dirty backlog) cannot starve the rest — and a
killed step does not abort the chain (`;`-separated, not `&&`).
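
The chain described above might be assembled roughly as follows. This is a sketch under stated assumptions, not the orchestrator's actual code: `build_reclaim_chain` and its `step:cap_ms` argument format are hypothetical, the `drop_caches` value of `3` is assumed (the commit message does not say which value is written), and fractional durations rely on GNU coreutils `timeout`.

```shell
#!/bin/sh
# Sketch only: build a ';'-separated reclaim chain from "step:cap_ms" pairs.
# Assumes GNU coreutils `timeout`, which accepts fractional seconds.
build_reclaim_chain() {
  chain=""
  for pair in "$@"; do
    step="${pair%%:*}"
    ms="${pair#*:}"
    # Skip empty, zero (or zero-prefixed), negative, and non-numeric caps.
    case "$ms" in
      ''|0*|*[!0-9]*) continue ;;
    esac
    case "$step" in
      sync)           cmd="sync" ;;
      # The value 3 is an assumption; the source does not state it.
      drop_caches)    cmd="sh -c 'echo 3 > /proc/sys/vm/drop_caches'" ;;
      compact_memory) cmd="sh -c 'echo 1 > /proc/sys/vm/compact_memory'" ;;
      fstrim)         cmd="fstrim -av" ;;
      *)              continue ;;  # unknown step names are ignored
    esac
    # Convert milliseconds to a seconds string such as 0.500.
    secs="$(printf '%d.%03d' $((ms / 1000)) $((ms % 1000)))"
    # ';' rather than '&&' so a killed step does not abort later steps.
    chain="${chain:+$chain; }timeout -s KILL $secs $cmd"
  done
  printf '%s\n' "$chain"
}

# Prints the chain for two enabled steps; drop_caches is skipped (cap 0).
build_reclaim_chain sync:500 drop_caches:0 compact_memory:1000
```

Joining with `;` means each step's exit status is ignored by the next one, which is what lets a `KILL`ed `sync` coexist with a later `fstrim`.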

Pausing FC is unaffected by an in-flight guest `sync` that we time out:
FC only drains in-flight virtio I/O before completing the pause; any
unflushed dirty pages stay in the memfile snapshot and converge on
resume. Per-step timeouts trade reclaim payoff, never correctness:
`drop_caches` is documented as non-destructive, `fstrim` consults FS
allocation metadata rather than the pagecache, and a partial
`compact_memory` just leaves memory less compacted.

Disabled by default — the LD flag's null default leaves every step at 0
(skipped). Missing keys, zero, negative, and wrong-type values all
collapse to "skip". The orchestrator skips the envd call entirely when
the chain is empty. The outer `Connect-Timeout-Ms` is the sum of
per-step caps plus a small slack.
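
The outer timeout arithmetic can be sketched as below. The function name `reclaim_outer_timeout_ms` and the 250 ms slack are hypothetical; the commit message only says the slack is "small". Invalid caps are dropped the same way the text describes, and an empty chain yields `0` to signal that no envd call is needed.

```shell
#!/bin/sh
# Sketch only: sum the valid per-step caps and add a slack value.
# Usage: reclaim_outer_timeout_ms SLACK_MS CAP_MS...
reclaim_outer_timeout_ms() {
  slack_ms="$1"
  shift
  total_ms=0
  for cap_ms in "$@"; do
    # Missing, zero, negative, and non-numeric caps all mean "skip".
    case "$cap_ms" in
      ''|0*|*[!0-9]*) continue ;;
    esac
    total_ms=$((total_ms + cap_ms))
  done
  # Empty chain: report 0 so the caller can skip the call entirely.
  if [ "$total_ms" -eq 0 ]; then
    echo 0
    return 0
  fi
  echo $((total_ms + slack_ms))
}

# Caps for sync, drop_caches, compact_memory, fstrim with an assumed 250 ms slack.
reclaim_outer_timeout_ms 250 500 200 1000 500
```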

Single LD flag, one rule per cohort:

- `guest-pause-reclaim` (JSON) — per-step caps in milliseconds keyed by
step name, evaluated against sandbox / team / template LD contexts so
targeting is configured in LaunchDarkly.

Example value:

```json
{"sync":500,"drop_caches":200,"compact_memory":1000,"fstrim":500}
```

`resume-build` exposes `-reclaim` to inject the example values into the
offline LD store for local testing.

Pairs cleanly with #2553 (disable proactive compaction in the guest base
image), but is independent of it and of FPH (#2552). Split out from
#2550.

4 participants