[debian/rules]: atomic symlink replacement for /lib/modules/<KVER>/{build,source} #39
Open
bhouse-nexthop wants to merge 1 commit into
Replace the racy `rm` + `ln -s` pair with `ln -sfn`. The two-step
form leaves a window of tens-to-hundreds of milliseconds where
/lib/modules/$(KVER_ARCH)/build and /lib/modules/$(KVER_ARCH)/source
do not exist on disk. With recent GNU coreutils, `ln -sfn` on an
existing destination creates the new symlink under a hidden temp
path and moves it into place with a single rename(2), so the target
name is never absent from a concurrent observer's point of view.

This race fails sibling platform-modules-* recipes in
sonic-buildimage that build concurrently. Each of those recipes
iterates over its MODULE_DIRS calling

    make -C /lib/modules/$(KVER_ARCH)/build M=... clean

per platform. If a single iteration lands in the window between
this recipe's `rm` and `ln -s`, GNU make's chdir fails with
"No such file or directory" on the kernel build dir, aborting the
whole recipe. Observed in a sonic-buildimage Broadcom build:

    make[3]: *** /lib/modules/6.12.41+deb13-sonic-amd64/build: \
        No such file or directory. Stop.
    [...]
    make: *** [slave.mk:915: target/debs/trixie/\
        platform-modules-z9100_1.1_amd64.deb] Error 1

The dell platform-modules iteration succeeded for the first six
platforms (s6000 .. s5224f) and failed on the seventh (s5232f),
exactly the kind of stochastic window-strike that a non-atomic
symlink replacement produces.

The non-DNX variant `saibcm-modules/debian/rules` already uses
this idiom (lines 92-93 at HEAD of sdk-6.5.32-gpl):

    cd /; sudo ln -sfn /usr/src/linux-headers-$(KVER_COMMON)/ \
        /lib/modules/$(KVER_ARCH)/source
    cd /; sudo ln -sfn /usr/src/linux-headers-$(KVER_ARCH)/ \
        /lib/modules/$(KVER_ARCH)/build

so this is a delta the DNX branches missed. Behaviorally identical
when no concurrent observer is racing; race-free when one is.

Signed-off-by: Brad House <bhouse@nexthop.ai>
/azp run
No pipelines are associated with this pull request.
Summary

Replace the racy `rm` + `ln -s` pair in `debian/rules` with `ln -sfn` to make the `/lib/modules/$(KVER_ARCH)/{build,source}` symlink replacement atomic.

Why this matters

The current pattern at `debian/rules:92-95` (lines from sdk-6.5.35-dnx) leaves a window of tens to hundreds of milliseconds where `/lib/modules/$(KVER_ARCH)/build` and `/lib/modules/$(KVER_ARCH)/source` do not exist on disk between the `rm` and the `ln -s`. With recent GNU coreutils, `ln -sfn` on an existing destination creates the new symlink under a hidden temp path and moves it into place with a single `rename(2)` syscall, so the target name is never absent from a concurrent observer's point of view.

Why this is observable

This `debian/rules` runs inside sonic-buildimage as part of the saibcm-modules-dnx submodule. sonic-buildimage drives parallel package builds (`SONIC_BUILD_JOBS=N`), and many sibling platform-modules-* recipes (Dell, Delta, Ingrasys, Inventec, Mitac, Accton, Centec, Marvell-Teralynx, Nephos, etc.) iterate over their `MODULE_DIRS` calling

    make -C /lib/modules/$(KVER_ARCH)/build M=... clean

per platform. If a single iteration lands in the window between this recipe's `rm` and `ln -s`, GNU make's `chdir(/lib/modules/.../build)` returns `ENOENT` and it aborts with "No such file or directory", failing the whole sibling recipe. Observed during a sonic-buildimage Broadcom build (`SONIC_BUILD_JOBS=12`), where the Dell platform-modules recipe's post-build clean iteration succeeded for the first six platforms (s6000..s5224f) and failed on the seventh (s5232f):

    make[3]: *** /lib/modules/6.12.41+deb13-sonic-amd64/build: \
        No such file or directory. Stop.
    [...]
    make: *** [slave.mk:915: target/debs/trixie/\
        platform-modules-z9100_1.1_amd64.deb] Error 1

That's exactly the kind of stochastic window-strike a non-atomic symlink replacement produces: the first six iterations pass because the window happened to be quiet, then one iteration lands during a saibcm-modules-dnx (or opennsl-modules-dnx) build-arch-stamp re-link.

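The difference between the two forms can be made concrete in a throwaway directory. The sketch below is illustrative only: the `headers-old` and `headers-new` directories and all variable names are made up for the demo, nothing under `/lib/modules` is touched, and the temp-name + rename step is spelled out by hand with GNU `mv -T` rather than relying on how a particular `ln` implements `-f`.

```shell
#!/bin/sh
# Illustrative demo; every path here is hypothetical and lives in a temp dir.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/headers-old" "$demo_dir/headers-new"
link="$demo_dir/build"
expected_target="$demo_dir/headers-new"

# Two-step form: after rm and before ln -s, the name does not exist.
ln -s "$demo_dir/headers-old" "$link"
rm -f "$link"
if [ -L "$link" ]; then two_step_gap=no; else two_step_gap=yes; fi
ln -s "$demo_dir/headers-new" "$link"

# Reset the link to the old target for the second half of the demo.
ln -sfn "$demo_dir/headers-old" "$link"

# Temp-name + rename form: the new symlink is created beside the old one,
# so the old name is still present right up to the rename.
ln -s "$demo_dir/headers-new" "$link.tmp.$$"
if [ -L "$link" ]; then atomic_gap=no; else atomic_gap=yes; fi
# mv -T (GNU coreutils) treats "$link" as the destination name itself
# instead of descending into the directory the old symlink points at.
mv -Tf "$link.tmp.$$" "$link"

final_target=$(readlink "$link")
rm -rf "$demo_dir"
echo "two_step_gap=$two_step_gap atomic_gap=$atomic_gap"
```

A concurrent `make -C` that resolves the name between `rm` and `ln -s` corresponds to the `two_step_gap=yes` case; with the rename form there is no such point, because rename(2) replaces the destination name in one step.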
Cross-reference: non-DNX variant already has this fix

The sibling `saibcm-modules/debian/rules` (non-DNX) already uses `ln -sfn`:

    # saibcm-modules/debian/rules, lines 92-93:
    cd /; sudo ln -sfn /usr/src/linux-headers-$(KVER_COMMON)/ /lib/modules/$(KVER_ARCH)/source
    cd /; sudo ln -sfn /usr/src/linux-headers-$(KVER_ARCH)/ /lib/modules/$(KVER_ARCH)/build

So this is a delta the DNX branches missed when copy-pasting the recipe. The fix is behaviorally identical when no concurrent observer is racing, and race-free when one is.

Affected branches

The same `rm` + `ln -s` pattern is present in (at least):

- `sdk-6.5.35-dnx` (this PR's base)
- `sdk-6.5.34-dnx-gpl`
- `sdk-6.5.32-dnx-gpl-trixie`
- `sdk-6.5.32-dnx-gpl`

This PR targets `sdk-6.5.35-dnx` since the buildimage submodule currently floats commits forward through there. The patch is trivial to cherry-pick to the other active DNX branches; happy to file follow-ups if maintainers prefer.

Test plan

- `saibcm-modules/debian/rules` already uses `ln -sfn`; behavior is otherwise identical.
- `SONIC_BUILD_JOBS=12` Broadcom build, both pre-patch (fails) and that the only difference at fault is this non-atomic symlink replacement.
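
A full sonic-buildimage run is an expensive way to exercise this, so a small local stress loop may be a useful complement. The following is a hypothetical sketch, not part of the PR: a writer flips a symlink between two made-up targets while an observer counts how often the name is missing. The writer uses the explicit temp-name + rename pattern so the result does not depend on the installed `ln`; swapping the body of `relink` for `ln -sfn "$1" "$link"` or for `rm -f "$link"; ln -s "$1" "$link"` would probe those forms instead, with results that depend on the `ln` implementation and on timing.

```shell
#!/bin/sh
# Hypothetical stress check; all names and paths are illustrative.
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
link="$work/build"
ln -s "$work/a" "$link"

# Writer step: create the new symlink under a temp name, then rename it
# over the live name (mv -T is a GNU coreutils option).
relink() {
    ln -s "$1" "$link.new" && mv -Tf "$link.new" "$link"
}

# Writer: flip the link back and forth, then drop a sentinel file.
(
    i=0
    while [ "$i" -lt 200 ]; do
        relink "$work/a"
        relink "$work/b"
        i=$((i + 1))
    done
    : > "$work/done"
) &
writer=$!

# Observer: poll the name until the writer signals completion.
misses=0
checks=0
while [ ! -e "$work/done" ]; do
    [ -L "$link" ] || misses=$((misses + 1))
    checks=$((checks + 1))
done
wait "$writer"
rm -rf "$work"
echo "checks=$checks misses=$misses"
```

With the rename-based writer, `misses` should stay at zero regardless of scheduling. A non-zero count after swapping in another form would indicate that form leaves the name absent at some point.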