feat(ext4): disable journal on rootfs mkfs #2565
Add ^has_journal to the mkfs feature list. The ext4 journal commits metadata every 5 s (jbd2/commit timer) regardless of guest activity, which dirties journal blocks on a fixed cadence and inflates every snapshot diff. Removing the journal also reclaims ~32 MiB of disk space at the front of the rootfs. Snapshots are taken from a paused guest, so all in-flight writes have already drained — the journal's crash-recovery guarantee across a clean pause/resume isn't needed. A guest kernel panic between snapshots can leave the rootfs inconsistent; resume always loads the previous good snapshot, so the blast radius is bounded to one in-flight customer interaction. mkfs-time only; affects newly-built bases.
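As a rough illustration of the change described above, the sketch below shows how a feature list containing `^has_journal` could be turned into the comma-separated value that `mkfs.ext4` accepts after `-O` (a leading `^` disables the named feature). The function names `featureArg` and `mkfsArgs` and the image path `rootfs.ext4` are hypothetical; the repository's actual builder code is not shown in this PR page.

```go
package main

import (
	"fmt"
	"strings"
)

// featureArg joins ext4 feature names into the comma-separated value that
// mkfs.ext4 takes after -O. A leading '^' on a name disables that feature.
func featureArg(features []string) string {
	return strings.Join(features, ",")
}

// mkfsArgs builds the argv for a hypothetical rootfs build step.
// The image path is an assumption made for this example.
func mkfsArgs(features []string, imagePath string) []string {
	return []string{"mkfs.ext4", "-O", featureArg(features), imagePath}
}

func main() {
	// Only "^has_journal" comes from this PR; the rest of the real
	// feature list is not shown here and is deliberately left out.
	features := []string{"^has_journal"}
	fmt.Println(strings.Join(mkfsArgs(features, "rootfs.ext4"), " "))
}
```

Because the option is applied at mkfs time, a sketch like this only affects images built after the change, which matches the "affects newly-built bases" note.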
PR Summary: High Risk. Reviewed by Cursor Bugbot for commit e84b9c8.
❌ 3 tests failed; the test analytics report lists 3 ❄️ flaky tests.
// between snapshots can leave the rootfs inconsistent; resume always
// loads the previous good snapshot, so blast radius is bounded to
// one in-flight customer interaction.
"^has_journal",
Disabling the journal causes resize2fs (used in Resize and Shrink) to fail on filesystems that are not perfectly clean. Since snapshots of running VMs are crash-consistent but not cleanly unmounted, they are marked as dirty. Without a journal for automatic recovery, resize2fs will refuse to run, breaking disk scaling for resumed sandboxes.
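One way to check the concern above on a real image is to read the superblock fields that `tune2fs -l` (or `dumpe2fs -h`) prints. The sketch below parses the `Filesystem state` and `Filesystem features` lines from such output. The type `superblockInfo`, the functions `parseSuperblock` and `needsManualFsck`, and the policy they encode are assumptions for illustration; they restate the reviewer's claim rather than verified `resize2fs` behavior.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// superblockInfo holds the two superblock fields relevant to the review comment.
type superblockInfo struct {
	State      string // e.g. "clean" or "not clean"
	HasJournal bool   // true if "has_journal" appears in the feature list
}

// parseSuperblock extracts "Filesystem state" and "Filesystem features" from
// text in the format printed by `tune2fs -l` or `dumpe2fs -h`.
func parseSuperblock(out string) superblockInfo {
	var info superblockInfo
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		val = strings.TrimSpace(val)
		switch strings.TrimSpace(key) {
		case "Filesystem state":
			info.State = val
		case "Filesystem features":
			for _, f := range strings.Fields(val) {
				if f == "has_journal" {
					info.HasJournal = true
				}
			}
		}
	}
	return info
}

// needsManualFsck encodes the reviewer's concern as a hypothetical policy:
// a filesystem that is not marked clean and has no journal to replay would
// need an explicit check before a resize is attempted.
func needsManualFsck(info superblockInfo) bool {
	return info.State != "clean" && !info.HasJournal
}

func main() {
	// Sample text shaped like tune2fs -l output; the values are made up.
	sample := "Filesystem features:      ext_attr resize_inode dir_index filetype extent\n" +
		"Filesystem state:         not clean\n"
	info := parseSuperblock(sample)
	fmt.Printf("state=%q journal=%v needsManualFsck=%v\n",
		info.State, info.HasJournal, needsManualFsck(info))
}
```

Running a check like this against a snapshot taken from a journal-less rootfs would show whether the dirty-state scenario actually occurs before deciding how Resize and Shrink should handle it.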
CI Feedback 🧐: A test triggered by this PR failed.
Closing for now; will reopen after measuring.
Drop has_journal from the rootfs mkfs feature list. The jbd2 commit timer dirties journal blocks every 5 s regardless of guest activity, inflating every snapshot diff. Also reclaims ~32 MiB at the front of the rootfs.
Snapshots come from a paused guest (no in-flight writes), so the journal's crash-recovery role across pause/resume is unused.
Risk: a guest kernel panic between snapshots can leave the rootfs inconsistent. Resume always loads the previous good snapshot, so blast radius is bounded to one in-flight customer interaction. Want to discuss before flipping universally; a feature flag is an easy follow-up.
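The feature-flag follow-up mentioned above could look roughly like the sketch below, which appends `^has_journal` to the mkfs feature list only when a flag is set. The function `rootfsFeatures`, its `disableJournal` parameter, and the base feature `extent` used in the example are hypothetical; how the flag would be sourced in this repository is not specified in the PR.

```go
package main

import "fmt"

// rootfsFeatures returns the mkfs feature list for a new base image.
// "^has_journal" is appended only when disableJournal is true, so the
// existing behavior stays the default until the flag is flipped.
func rootfsFeatures(base []string, disableJournal bool) []string {
	// Copy first so the caller's slice is never modified.
	out := append([]string(nil), base...)
	if disableJournal {
		out = append(out, "^has_journal")
	}
	return out
}

func main() {
	base := []string{"extent"} // placeholder base feature for the example
	fmt.Println(rootfsFeatures(base, false))
	fmt.Println(rootfsFeatures(base, true))
}
```

Gating the change this way would let snapshot-diff sizes be measured with and without the journal before it is enabled universally.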