Both fork paths (CoW shm and legacy IPC byte-copy) silently broke MAP_SHARED visibility across fork: the child mapped the slab MAP_PRIVATE or got a fresh byte copy, so writes from either side stayed local and never reached the kernel page cache the parent shared with the file. MAP_SHARED|MAP_ANONYMOUS, the standard parent-child IPC primitive used by Postgres and other multi-process daemons, was equally broken.

Three pieces close the gap:

1. Parent-side conversion (mmap_fork_prepare_anon_shared, with commit/abort wrappers). While siblings are quiesced the fork thread walks live regions, promotes each MAP_SHARED|MAP_ANONYMOUS region without a backing fd into a memfd-style overlay (mkstemp+unlink+ftruncate, pwrite-seed from host_base, host MAP_FIXED|MAP_SHARED via the new hvf_apply_file_overlay_quiesced helper, mark_overlay_metadata_range), and pre-stages per-region dup() fds so a transient EMFILE rolls back cleanly. The candidate filter skips regions whose host-page-rounded tail would alias a neighbor mapping. The transactional commit/abort wrappers let the fork-IPC failure path roll back the in-place conversion (overlay teardown plus region metadata restore) before resuming siblings; abort validates every captured snapshot before tearing down so a sibling-drift past the quiesce timeout does not leave host VA out of sync with semantic state. forkipc.c logs a warning when abort returns a partial failure so the parent's stale state is visible in post-mortem.

2. Child-side restoration (mmap_fork_restore_overlays). The recv path now snapshots parent overlay_active/start/end (and a new parent_had_fd[] mirror) before clearing inherited state, then re-runs hvf_apply_file_overlay against the saved overlay span once SCM_RIGHTS delivers the backing fds. The inner quiesce is a no-op since no worker vCPUs exist yet.

3. Pre-existing fork-IPC alignment bug. The old recv_backing_fds filter (!MAP_ANONYMOUS && offset != -1) matched the shim region (LINUX_MAP_PRIVATE, offset 0) and ELF text segments and silently stole incoming SCM_RIGHTS fds, leaving the actual file-backed regions with backing_fd=-1. The receiver now uses parent_had_fd[] as the filter so its iteration order matches the sender's "backing_fd >= 0" filter exactly. Unassigned fds are closed instead of leaked.

hvf_apply_file_overlay and hvf_remove_file_overlay are split into a public variant that handles thread_quiesce_siblings and a _quiesced inner that the parent fork-prep / abort paths call without a nested barrier.

Locked in by tests/test-cross-fork-mapshared.c (3 cases: file-backed mkstemp, MAP_SHARED|MAP_ANONYMOUS, /dev/shm via shm_open). Each case verifies pre-fork seed visibility, child-write-visible-to-parent, parent-write-visible-to-child, and on-disk reconciliation. All three pass against Linux ground truth via tests/qemu-runner.sh.
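The fd-ordering fix can be illustrated with a small standalone sketch. It is not the project's code: NREGIONS, union ctl_buf, the two helper functions, and demo_fd_order are hypothetical names, and the parent_had_fd[] mirror is passed directly rather than serialized as part of fork state. The point it shows is that the receiver assigns incoming SCM_RIGHTS fds using the same predicate the sender used to pick them, and closes any leftovers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define NREGIONS 4 /* hypothetical fixed-size region table */

/* Control buffer large enough for NREGIONS fds, aligned for cmsghdr. */
union ctl_buf {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int) * NREGIONS)];
};

/* Sender: pass every backing_fd >= 0, in table order, in one message. */
static int send_backing_fds(int sock, const int backing_fd[NREGIONS])
{
    int out[NREGIONS], n = 0;
    for (int i = 0; i < NREGIONS; i++)
        if (backing_fd[i] >= 0)
            out[n++] = backing_fd[i];

    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union ctl_buf ctl;
    memset(&ctl, 0, sizeof ctl);
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    if (n > 0) {
        msg.msg_control = ctl.buf;
        msg.msg_controllen = CMSG_SPACE(sizeof(int) * n);
        struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
        c->cmsg_level = SOL_SOCKET;
        c->cmsg_type = SCM_RIGHTS;
        c->cmsg_len = CMSG_LEN(sizeof(int) * n);
        memcpy(CMSG_DATA(c), out, sizeof(int) * n);
    }
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receiver: hand fds only to slots where parent_had_fd[] is set, so the
 * iteration order matches the sender's filter. Returns fds assigned. */
static int recv_backing_fds(int sock, const int parent_had_fd[NREGIONS],
                            int backing_fd[NREGIONS])
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union ctl_buf ctl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.buf,
                          .msg_controllen = sizeof ctl.buf };
    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1; /* truncated or missing payload */

    int in[NREGIONS], n = 0;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c && c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
        n = (int)((c->cmsg_len - CMSG_LEN(0)) / sizeof(int));
        if (n > NREGIONS)
            return -1;
        memcpy(in, CMSG_DATA(c), sizeof(int) * (size_t)n);
    }
    int next = 0;
    for (int i = 0; i < NREGIONS; i++)
        backing_fd[i] = (parent_had_fd[i] && next < n) ? in[next++] : -1;
    int assigned = next;
    while (next < n)
        close(in[next++]); /* close unassigned fds instead of leaking */
    return assigned;
}

/* Slots 0 and 2 stand in for fd-less regions (shim, ELF text). */
int demo_fd_order(void)
{
    int sv[2], p1[2], p2[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) || pipe(p1) || pipe(p2))
        return -1;
    int parent_fd[NREGIONS] = { -1, p1[1], -1, p2[1] };
    int had[NREGIONS], child_fd[NREGIONS];
    for (int i = 0; i < NREGIONS; i++)
        had[i] = parent_fd[i] >= 0;
    if (send_backing_fds(sv[0], parent_fd) != 0 ||
        recv_backing_fds(sv[1], had, child_fd) != 2)
        return -1;
    if (child_fd[0] != -1 || child_fd[2] != -1)
        return -1;
    /* Each received fd must refer to the pipe its slot had in the parent. */
    char ch = 0;
    if (write(child_fd[1], "y", 1) != 1 || read(p1[0], &ch, 1) != 1 || ch != 'y')
        return -1;
    if (write(child_fd[3], "x", 1) != 1 || read(p2[0], &ch, 1) != 1 || ch != 'x')
        return -1;
    return 0;
}
```

Calling demo_fd_order() returns 0 when slots 1 and 3 receive the fds that belong to them and slots 0 and 2 stay at -1. A receiver that filtered on a different predicate (for example, region flags) would hand the first fd to slot 0, which is the failure mode described in item 3.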
Summary by cubic
Preserves MAP_SHARED coherence across fork for file-backed and anonymous shared mappings. Converts anonymous shared regions to memfd-backed overlays in the parent and re-applies them in the child so both processes see each other’s writes and on-disk state stays correct.
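As a rough illustration of the promotion step, the sketch below replaces an anonymous page with a MAP_SHARED mapping of an unlinked temp file at the same address, seeded with the old contents, and then checks that a child's write is visible to the parent. It runs natively rather than inside the emulator, so the child simply inherits the mapping instead of re-applying it from a received fd. promote_to_shared_overlay and demo_cross_fork_visibility are hypothetical names, and the /tmp template path is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: back [base, base+len) with an unlinked temp file
 * mapped MAP_SHARED|MAP_FIXED, preserving the current bytes.
 * Returns the backing fd, or -1 on failure. */
static int promote_to_shared_overlay(void *base, size_t len)
{
    char path[] = "/tmp/overlay-XXXXXX"; /* assumed location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file lives only as long as fds/mappings do */

    /* Size the file, seed it from the live pages, then map it in place. */
    if (ftruncate(fd, (off_t)len) != 0 ||
        pwrite(fd, base, len, 0) != (ssize_t)len ||
        mmap(base, len, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
        close(fd);
        return -1;
    }
    return fd;
}

int demo_cross_fork_visibility(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    /* Start from private pages, which would NOT propagate child writes. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    strcpy(p, "seed");

    int fd = promote_to_shared_overlay(p, len);
    if (fd < 0 || strcmp(p, "seed") != 0) /* seed must survive promotion */
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: check the pre-fork seed, then write through the mapping. */
        int seen = strcmp(p, "seed") == 0;
        strcpy(p, "child");
        _exit(seen ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    /* Parent: the child's write must be visible through the overlay. */
    int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
             strcmp(p, "child") == 0;
    munmap(p, len);
    close(fd);
    return ok ? 0 : -1;
}
```

demo_cross_fork_visibility() returns 0 when both the pre-fork seed and the child's write are observed. Skipping the promotion call leaves the page private, so the parent would still read "seed" after the child exits.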
Bug Fixes
- Promote MAP_SHARED|MAP_ANONYMOUS regions without a backing fd into memfd overlays; seed bytes; install MAP_SHARED|MAP_FIXED via new _quiesced helper; pre-stage per-region dup() fds; keep siblings quiesced through SCM_RIGHTS send; transactional commit/abort with rollback validation; skip host-page-tail alias cases.
- Snapshot parent overlay state and parent_had_fd[], receive fds in the same order, then re-install overlays before worker vCPUs exist; per-region failures fall back to snapshot semantics.
- Filter received fds by parent_had_fd[]; close unassigned fds; add strict checks for truncated/missing SCM_RIGHTS payloads.
- Add test-cross-fork-mapshared covering file-backed, MAP_SHARED|MAP_ANONYMOUS, and /dev/shm; verifies parent↔child visibility and on-disk reconciliation.

Refactors
- Split hvf_apply_file_overlay/hvf_remove_file_overlay into public and _quiesced variants for safe use during fork.

Written for commit 1140b13.