
SRE-3755 ci: Add NFS mount retry to handle transient server readiness#18244

Open
ryon-jensen wants to merge 1 commit into release/2.8 from ryon-jensen/2.8/SRE-3775_mount_nfs


@ryon-jensen
Contributor

The NFS mount in test_main_prep_node.sh can fail with "access denied" when the NFS server on FIRST_NODE hasn't fully registered its exports before client nodes attempt to mount. This is a race between setup_nfs.sh completing on the server and clush launching test_main_prep_node.sh on all nodes simultaneously.

Add a retry loop (3 attempts, 5s apart) around the mount call, and on final failure print showmount/getent diagnostics to aid debugging. Also tighten the /proc/mounts grep to avoid false substring matches.
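The retry-and-diagnose flow described above could look roughly like the following bash sketch. It is not the code from this PR: the function names (`is_mounted`, `retry`, `nfs_diagnostics`, `mount_nfs_with_retry`), the plain `mount -t nfs` invocation, and the exact `showmount`/`getent` arguments are assumptions for illustration. Only the 3 attempts, the 5 s delay, and the idea of an exact `/proc/mounts` match come from the description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an NFS mount with retries; not the actual
# test_main_prep_node.sh change. Assumes it runs as root.

# Succeed only if $1 is an exact mount-point field (2nd column) in the
# mounts table ($2, default /proc/mounts). A substring grep for
# /mnt/share would also match /mnt/share2.
is_mounted() {
    local mount_point="$1" table="${2:-/proc/mounts}"
    awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$table"
}

# Run "$@" up to $1 times, sleeping $2 seconds between failed attempts.
retry() {
    local attempts="$1" delay="$2" i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        if ((i < attempts)); then
            echo "attempt $i/$attempts failed: $*; retrying in ${delay}s" >&2
            sleep "$delay"
        fi
    done
    return 1
}

# Print what the client can see of the server; never fails.
nfs_diagnostics() {
    local server="$1"
    echo "--- showmount -e $server ---" >&2
    showmount -e "$server" >&2 || true
    echo "--- getent hosts $server ---" >&2
    getent hosts "$server" >&2 || true
}

# Mount server:export_dir on mount_point, retrying 3 times 5s apart.
mount_nfs_with_retry() {
    local server="$1" export_dir="$2" mount_point="$3"
    if is_mounted "$mount_point"; then
        return 0
    fi
    if ! retry 3 5 mount -t nfs "${server}:${export_dir}" "$mount_point"; then
        echo "NFS mount of ${server}:${export_dir} failed after 3 attempts" >&2
        nfs_diagnostics "$server"
        return 1
    fi
}

# Only mount when executed directly with exactly three arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 3 ]]; then
    mount_nfs_with_retry "$@"
fi
```

For example, a client node might call `mount_nfs_with_retry "$FIRST_NODE" /export/share /mnt/share` (paths hypothetical). Comparing the second field of `/proc/mounts` for equality, instead of grepping for a substring, is one way to avoid the false matches the description mentions. Note that `/proc/mounts` escapes spaces in paths as `\040`, so a mount point containing spaces would need extra handling.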

backport of … #18118

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

…#18118)


Signed-off-by: Ryon Jensen <ryon.jensen@hpe.com>
@github-actions

Errors are Unable to load ticket data
https://daosio.atlassian.net/browse/SRE-3755
