Docs - Add documentation for --no-docker parameter requirements (#715) #783
NJX-njx wants to merge 5 commits into
- Add new section 'Using --no-docker on Remote Nodes' in run-superbench.md
- Document that sb binary and dependencies must be pre-installed on each remote host
- Describe deployment options (extract container, install from source, etc.)
- Note environment variables and use cases
- Update --no-docker description in cli.md with link to detailed docs

Fixes microsoft#715

Made-with: Cursor
Pull request overview
This PR updates SuperBench documentation to clarify the requirements for running `sb run --no-docker` against remote nodes, addressing failures where remote hosts don’t have the `sb` CLI installed.
Changes:
- Added a new “Using `--no-docker` on Remote Nodes” section to describe prerequisites and deployment approaches.
- Expanded the `--no-docker` CLI flag description to call out remote-node requirements and link to the detailed guide.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/getting-started/run-superbench.md | Adds a dedicated section documenting remote-node prerequisites and guidance for --no-docker. |
| docs/cli.md | Updates --no-docker help text with a brief prerequisite note and link to detailed docs. |
| `--host-password` | `None` | Host password or key passphrase if needed. |
| `--host-username` | `None` | Host username if needed. |
| `--no-docker` | `False` | Run on host directly without Docker. |
| `--no-docker` | `False` | Run on host directly without Docker. When using remote nodes, SuperBench (`sb` binary and dependencies) must be pre-installed on each target host; otherwise `command not found` will occur. See [Run SuperBench - Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes) for details. |
For readability in this table cell, consider formatting the literal error text as code (e.g., `command not found`) and/or shortening the row by moving the longer explanation into the linked getting-started section. Very long table cells can make the markdown harder to maintain and review.
Suggested change:
| `--no-docker` | `False` | Run on host directly without Docker. See [Run SuperBench - Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes) for details on using this option with remote nodes. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
polarG left a comment
Multi-perspective code review for PR #783
2 files, +16 / −1 (docs-only).
Dimensions reviewed: Correctness (×2 reviewers) + Maintainability (×2 reviewers). Performance / Security / Testing skipped — not exercised by a docs-only change.
Summary
| # | Severity | Title | Location |
|---|---|---|---|
| 1 | BLOCKER | (Correctness) Option C documents an unsupported "extract container filesystem" workflow | docs/getting-started/run-superbench.md L60 |
| 2 | SHOULD-FIX | (Correctness) Option B's `third_party/` + "build instructions" pointer is unactionable | docs/getting-started/run-superbench.md L59 |
| 3 | SHOULD-FIX | (Correctness) docs/cli.md cell omits the actual failing binary name (`sb exec`/`sb`) | docs/cli.md L378 |
| 4 | SHOULD-FIX | (Correctness + Maintainability) docs/cli.md table cell is ~280 chars vs ≤80 in siblings | docs/cli.md L378 |
| 5 | SHOULD-FIX | (Correctness) Cross-link omits the section anchor | docs/cli.md L378 |
| 6 | SHOULD-FIX | (Maintainability) Cross-link label uses non-standard "Page - Section" format with `--` | docs/cli.md L378 |
| 7 | SHOULD-FIX | (Maintainability) New section ignores file's paragraph + bash-fence + admonition style | docs/getting-started/run-superbench.md L51 |
| 8 | SHOULD-FIX | (Maintainability) New section should be H3 under `## Run`, not a peer H2 | docs/getting-started/run-superbench.md L51 |
| 9 | NON-BLOCKING | (Maintainability) No back-link from run-superbench.md to cli.md | docs/getting-started/run-superbench.md L64 |
| 10 | NON-BLOCKING | (Maintainability) No cross-reference between pre-existing `:::tip TIP` and new section | docs/getting-started/run-superbench.md L51 |
| 11 | NON-BLOCKING | (Maintainability) "execute `sb exec` directly" leaks an internal subcommand | docs/getting-started/run-superbench.md L55 |
| 12 | NOTED | (Correctness) Ansible `sb exec` exit-127 claim verified against runner.py:127 / runner.py:498 | n/a |
| 13 | NOTED | (Correctness) `SB_MICRO_PATH` semantics verified against micro_base.py:182 / runner.py:94 | n/a |
Verdict
BLOCKED — The PR correctly identifies and documents the rc=127 / command not found pitfall for sb run --no-docker against remote hosts, but Option C in the new section prescribes a workflow ("manually extract the container filesystem to the host") that is not supported anywhere in the repo, contradicts the section's own use-case statement, and would lead users into the exact failure mode the PR is trying to fix. Removing Option C plus addressing the SHOULD-FIX items (especially the deep-link anchor and the oversized cli.md table cell) is sufficient to merge.
2. **Deployment options:**
   - **Option A:** Extract the contents of the `superbench/superbench` Docker image onto each node (e.g., copy binaries, Python environment, and micro-benchmark executables to a consistent path), then ensure `sb` is in PATH.
   - **Option B:** Install SuperBench from source or pip on each node, and build/install the required micro-benchmark binaries (see `third_party/` and build instructions).
   - **Option C (requires Docker on remote nodes):** If Docker is available on the remote nodes for deployment but you still want to execute benchmarks without containers, you can first use `sb deploy` to pull the image and prepare the container, then manually extract the container filesystem to the host and run subsequent `sb run --no-docker` commands against that host installation.
[BLOCKER] (Correctness) Option C documents an unsupported "extract container filesystem to host" workflow
Issue (verified facts):
- "Manually extract the container filesystem to the host" describes a workflow that does not exist in this repository. A grep for `docker export`, `docker cp`, `extract.*container`, `container filesystem` returns only this new doc. `sb deploy` (`superbench/runner/playbooks/deploy.yaml`) creates and runs the `sb-workspace` container and keeps the rootfs inside it. There is no script, Makefile target, test, or other doc covering a rootfs extraction step.
- Option C is internally contradictory with item #4 ("for standard deployments, prefer `sb deploy` + `sb run` without `--no-docker`"): it sits inside a section titled "Using `--no-docker` on Remote Nodes" yet starts with "requires Docker on remote nodes".
Impact: Readers who try to follow Option C will hand-roll fragile `docker export` / `docker cp` flows whose layouts will not match the `SB_WORKSPACE` / `SB_MICRO_PATH` expectations baked into `superbench/runner/runner.py`, which is exactly the failure class this PR is trying to prevent.
Recommendation: Delete Option C. If the underlying intent is "Docker is available but we cannot nest containers", point readers to the existing `:::tip TIP` block at lines 43–49 (privileged container + `sb run --no-docker -l localhost`). Only re-introduce a rootfs-extraction recipe once it is a supported, tested workflow shipped with the repo.
Agreement: 3/3 reviewers (severity 1 BLOCKER / 2 SHOULD-FIX, escalated to the highest per orchestrator policy).
[SHOULD-FIX] (Correctness) third_party/ + "build instructions" pointer is unactionable
Issue: third_party/Makefile does exist (targets cuda, rocm, common, cuda_cutlass, …, keyed off SB_MICRO_PATH, MPI_HOME, HIP_HOME, CUDA_VER), but there is no user-facing "build third_party on the host" guide in docs/. docs/getting-started/installation.mdx documents only the control-node build (pip install . && make postinstall). A user following "see third_party/ and build instructions" lands on a Makefile with no surrounding guidance and no required env-var documentation.
Impact: Option B cannot be reproduced from this doc alone, undermining the PR's goal of preventing rc=127.
Recommendation: Replace with a concrete pointer + required vars, e.g.:
- **Option B:** On each node, install SuperBench (see
[installation](installation.mdx)) and then build the native
micro-benchmark binaries with the project Makefile:
\`\`\`bash
export SB_MICRO_PATH=/opt/superbench # must match the value used at runtime
cd third_party && make -j cuda # or `make rocm` on AMD
\`\`\`
The supported variables (`SB_MICRO_PATH`, `MPI_HOME`, `HIP_HOME`,
`CUDA_VER`, …) are defined at the top of `third_party/Makefile`.
Agreement: 2/3 reviewers.
| `--host-password` | `None` | Host password or key passphrase if needed. |
| `--host-username` | `None` | Host username if needed. |
| `--no-docker` | `False` | Run on host directly without Docker. |
| `--no-docker` | `False` | Run on host directly without Docker. When using remote nodes, SuperBench (`sb` binary and dependencies) must be pre-installed on each target host; otherwise `command not found` will occur. See [Run SuperBench - Using --no-docker on Remote Nodes](getting-started/run-superbench.md) for details. |
[SHOULD-FIX] (Correctness) Cell omits the actual failing binary name (sb exec / sb)
Issue: The user types `sb run --no-docker …` on the control node, but the command Ansible actually runs on each remote host is `sb exec …` (`superbench/runner/runner.py:127`, wrapped via `bash -c '... && cd $SB_WORKSPACE && {command}'` at `runner.py:494-498` when `self._docker_config.skip` is `True`). The shell error on a non-prepared host is therefore `sb: command not found` (rc 127). A user grepping logs for `sb run: command not found` will not find it.
Impact: Diagnostic ambiguity for the exact failure mode the PR is trying to document (fixes #715, which is precisely this command not found / rc=127 confusion).
Recommendation: Reword the cell (combine with the anchor and label fixes — see other comments on this line):
| `--no-docker` | `False` | Run on host directly without Docker. On remote nodes, the `sb` CLI (which Ansible invokes as `sb exec`) and its dependencies must be pre-installed on every target host; otherwise the remote shell exits with `sb: command not found` (rc 127). See [Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes). |
Agreement: 1/3 reviewers (cross-references finding 11 below).
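To see the exit code this comment describes without touching a real cluster, the sketch below approximates a remote host that lacks SuperBench: it runs a bash wrapper shaped like the one the review attributes to `runner.py:494-498` (`cd $SB_WORKSPACE && {command}`) under a `PATH` that cannot contain `sb`. The function name `simulate_remote_no_sb`, the `/tmp` workspace, and the bare `sb exec` invocation are illustrative assumptions, not the real Ansible task.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction of the rc-127 failure mode; not SuperBench code.
# The inner command mimics the "cd $SB_WORKSPACE && {command}" wrapper described
# in the review, but runs with a PATH that cannot resolve the `sb` binary.
simulate_remote_no_sb() {
  local bash_bin
  bash_bin=$(command -v bash)   # resolve bash before PATH is replaced
  env PATH=/nonexistent SB_WORKSPACE=/tmp \
    "$bash_bin" -c 'cd "$SB_WORKSPACE" && sb exec'
}

rc=0
simulate_remote_no_sb || rc=$?
echo "exit code: ${rc}"
```

Bash reports a missing command with exit status 127, so the sketch prints a `sb: command not found` message on stderr followed by `exit code: 127`. That is why the error string, not the `sb run` command the user typed, is the thing to grep for in remote logs.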
[SHOULD-FIX] (Correctness + Maintainability) Table cell is ~280 chars vs ≤80 in every sibling row
Issue: Valid Markdown, but ~280 chars in one cell forces the rendered Description column to balloon dramatically, breaks the visual alignment of every sibling row, and duplicates content that already lives at the link target. Every other Description cell in this table is one short sentence.
Impact: Source diffing / future column-width edits become painful. Readers scanning the flag table get a wall of text in one cell instead of a uniform reference table.
Recommendation: Reduce to a single-line caveat + deep link (also fixes the anchor and label findings):
| `--no-docker` | `False` | Run on host directly without Docker. On remote nodes, `sb` must be pre-installed on every host — see [Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes). |
Agreement: 2/3 reviewers (1 NON-BLOCKING / 1 SHOULD-FIX, escalated).
[SHOULD-FIX] (Correctness) Cross-link omits the section anchor
Issue: `run-superbench.md` is now 64 lines with the new section as its last block. A no-anchor link lands the reader at `# Run SuperBench` and forces them to scroll through `## Deploy`, `## Run`, and the existing `:::tip TIP` before reaching the content the link promises. Existing precedent in this docs tree deep-links to in-page sections:
docs/user-tutorial/system-config.md:31:[Deploy SuperBench](../getting-started/run-superbench.md#deploy)
docs/user-tutorial/system-config.md:33:[Ansible Inventory](../getting-started/configuration.md#ansible-inventory)
Impact: Worse UX (extra scrolling) and inconsistent with the established #deploy / #ansible-inventory deep-link convention.
Recommendation: Append the Docusaurus-slugified fragment (verify the slug against a local `yarn build` once the heading is final):
[Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes)
Agreement: 3/3 reviewers.
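The triple hyphen in `#using---no-docker-on-remote-nodes` is expected, assuming Docusaurus derives heading IDs with GitHub-style slugging: the heading is lowercased, punctuation such as backticks is dropped, existing hyphens are kept, and each space becomes a hyphen, so the spaces on either side of `--no-docker` produce the extra hyphens. The shell function below is a simplified, ASCII-only approximation written for this check (`slugify` is a hypothetical helper, not the real slugger); the local build suggested above remains the authoritative test.

```bash
#!/usr/bin/env bash
# Simplified, ASCII-only approximation of GitHub-style heading slugs.
# Hypothetical helper: the real slugger also handles Unicode and duplicate headings.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9 _-]//g' -e 's/ /-/g'   # drop punctuation, spaces -> hyphens
}

slugify 'Using `--no-docker` on Remote Nodes'; echo
slugify 'Ansible Inventory'; echo
```

The second call reproduces the existing `#ansible-inventory` anchor cited above, which is a quick sanity check that the approximation agrees with the established deep links.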
:::

## Using `--no-docker` on Remote Nodes
[SHOULD-FIX] (Maintainability) New section ignores the file's paragraph + bash-fence + admonition style
Issue: Every other prose unit in this file follows "1 short paragraph → ```bash fenced example → optional :::note / :::tip admonition" (## Deploy, ## Run). Confirmed by git grep -nE ':::(tip|note|caution|warning|info)' -- docs/:
docs/getting-started/installation.mdx:17::::tip Tips
docs/getting-started/installation.mdx:32::::note
docs/getting-started/installation.mdx:61::::note Note
docs/getting-started/run-superbench.md:27::::note Note
docs/getting-started/run-superbench.md:44::::tip TIP
docs/user-tutorial/baseline-generation.md:31::::tip Tips
docs/user-tutorial/result-summary.md:31::::tip Tips
The new section uses none — it is one dense 4-item bold-led numbered list, zero runnable code blocks, no admonitions. Items 1 (command not found, exit code 127) and 4 (HPC clusters with restricted container runtimes) are textbook :::caution / :::note material; Options A/B/C are command-driven yet show no commands.
Impact: Future editors will see one section that looks foreign to the rest of the page, increasing drift over time.
Recommendation: Restructure as paragraphs + 1–2 bash code fences + admonitions, e.g.:
### Using `--no-docker` on Remote Nodes
When you run `sb run --no-docker` against remote hosts (via `--host-file` or
`--host-list`), Ansible SSHes into each node and invokes the `sb` binary
directly, so SuperBench must already be installed on every target host.
\`\`\`bash
sb run --no-docker -f remote.ini -c resnet.yaml \\
--config-override superbench.env.SB_MICRO_PATH=/opt/superbench
\`\`\`
:::caution
If `sb` is not on `PATH` on a remote host, the run fails with
`sb: command not found` (exit code 127).
:::
:::note
Set `SB_MICRO_PATH` (env var or `superbench.env.SB_MICRO_PATH` via
`--config-override`) to the on-host install path of the micro-benchmark
binaries.
:::
Agreement: 2/3 reviewers.
[SHOULD-FIX] (Maintainability) New section should be H3 under ## Run, not a peer H2
Issue: This document's existing H2 structure is the top-level workflow narrative (## Deploy → ## Run). The new section documents requirements for one specific variant of the sb run step and is logically a Run-time caveat — the existing :::tip TIP for the same flag is already correctly nested under ## Run. Adding it as a third peer H2 implies it is a separate workflow stage and splits --no-docker guidance across two sibling H2s.
Impact: Readers and the Docusaurus sidebar/TOC will surface "Using --no-docker on Remote Nodes" as a peer to Deploy/Run.
Recommendation: Demote to H3 under ## Run, immediately after (or merged into) the existing :::tip TIP:
## Run
...
:::tip TIP
... (existing local privileged-container note) ...
:::
### Using `--no-docker` on Remote Nodes
...
Agreement: 1/3 reviewers.
[NON-BLOCKING] (Maintainability) No cross-reference between the pre-existing :::tip TIP and the new section
Issue: Two adjacent blocks on the same flag with no narrative linkage (tip = local privileged container; new section = remote hosts). Not strictly contradictory, but future maintainers may update one and forget the other.
Recommendation: After demoting the new section to H3 under ## Run (sibling finding), add a leading sentence: "The tip above covers running --no-docker locally inside a privileged container. The requirements below apply when --no-docker is used against remote hosts."
Agreement: 1/3 reviewers.
When running `sb run` with `--no-docker` on **remote nodes** (via `--host-file` or `--host-list`), the following requirements apply:

1. **SuperBench must be pre-installed on each remote node.** The `sb` CLI binary and its dependencies must be available in the PATH on every target host. Running without Docker means Ansible will SSH into each node and execute `sb exec` directly; if `sb` is not installed, you will see `command not found` (exit code 127).
[NON-BLOCKING] (Maintainability) "execute sb exec directly" leaks an internal subcommand
Issue: sb exec is the actual remote command (superbench/runner/runner.py:127, superbench/runner/playbooks/cleanup.yaml:5-7) but it is not documented as a user-facing command in docs/cli.md (which lists sb deploy, sb run, sb result …, etc.). Exposing it by name without explanation couples user docs to an implementation detail of runner.py — a future rename will silently rot this line.
Note: This conflicts with the suggestion to name sb exec in cli.md for diagnostic clarity (see the docs/cli.md line 378 comment about the failing binary). Reconcile by: in cli.md, say "the failing binary is sb"; in run-superbench.md, drop the sb exec reference and say "invokes the sb binary directly".
Recommendation: Rephrase, e.g.: "Ansible SSHes into each node and invokes the sb binary directly; if sb is not on PATH you will see sb: command not found (exit code 127)."
Agreement: 1/3 reviewers.
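A cheap way to catch the failure described in item 1 before launching a run is to check, on each target host, that `sb` resolves on `PATH`. The function below is a hypothetical preflight helper (`check_sb_on_path` is not part of SuperBench); it returns 127 when `sb` is missing so its status lines up with what the remote shell would report during `sb run --no-docker`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check, meant to run on each remote host before
# `sb run --no-docker`; not part of SuperBench itself.
check_sb_on_path() {
  local sb_path
  if sb_path=$(command -v sb); then
    echo "ok: sb found at ${sb_path}"
    return 0
  fi
  echo "missing: sb is not on PATH; the remote run would exit with 127" >&2
  return 127
}
```

How the helper reaches each host is left open. It only checks `PATH`; it says nothing about `SB_MICRO_PATH` or whether the micro-benchmark binaries are installed.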
3. **Environment configuration:** Ensure the `SB_MICRO_PATH` environment variable is set on each remote node so that it matches the on-host installation path of SuperBench micro-benchmark binaries when using `--no-docker`. Alternatively, you can set the config key `superbench.env.SB_MICRO_PATH` via `--config-override` so that SuperBench exports this environment variable for remote executions.

4. **Use case:** `--no-docker` is intended for environments where Docker-in-Docker or nested containers are not supported (e.g., certain Kubernetes setups, HPC clusters with restricted container runtimes). For standard deployments, prefer `sb deploy` + `sb run` without `--no-docker`.
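Item 3 asks that `SB_MICRO_PATH` on each remote node match where the micro-benchmark binaries are actually installed. A hypothetical host-side check (`check_sb_micro_path` is illustrative, not a SuperBench command) can at least confirm that the variable is set and names an existing directory; it does not verify which binaries that directory contains.

```bash
#!/usr/bin/env bash
# Hypothetical host-side sanity check for SB_MICRO_PATH; not part of SuperBench.
# It only confirms the variable is set and names an existing directory; it does
# not inspect which micro-benchmark binaries live there.
check_sb_micro_path() {
  if [ -z "${SB_MICRO_PATH:-}" ]; then
    echo "SB_MICRO_PATH is not set" >&2
    return 1
  fi
  if [ ! -d "${SB_MICRO_PATH}" ]; then
    echo "SB_MICRO_PATH=${SB_MICRO_PATH} is not a directory on this host" >&2
    return 1
  fi
  echo "ok: SB_MICRO_PATH=${SB_MICRO_PATH}"
}

check_sb_micro_path || echo "fix SB_MICRO_PATH before sb run --no-docker" >&2
```

Run it in the environment the remote `sb` process will see, since a value exported only in an interactive shell profile may not reach a non-interactive SSH session.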
[NON-BLOCKING] (Maintainability) No back-link from run-superbench.md to cli.md
Issue: cli.md now links forward to this new section, but the new section never references cli.md. Other tutorial pages routinely back-link to the canonical flag table:
docs/user-tutorial/baseline-generation.md:17: ... [SuperBench CLI](../cli.md).
docs/user-tutorial/result-summary.md:17: ... [SuperBench CLI](../cli.md).
docs/user-tutorial/data-diagnosis.md:17: ... [SuperBench CLI](../cli.md).
Readers landing on the new section have no pointer back to --host-file, --host-list, --config-override (all referenced by name).
Recommendation: Add a one-liner at the top or bottom of the new section:
For the full list of flags accepted by `sb run`, see [SuperBench CLI](../cli.md#sb-run).
Agreement: 2/3 reviewers.
Summary
Fixes #715 - Documents the requirements and expectations when using `--no-docker` on remote nodes.
Problem
Users running `sb run --no-docker` on remote nodes encountered `command not found` (rc=127) because the documentation did not explain that SuperBench must be pre-installed on each target host.
Changes
- `sb` CLI and dependencies must be pre-installed on each remote node
- … `SB_MICRO_PATH`)
- … `--no-docker` description with brief requirements and link to detailed docs