2 changes: 1 addition & 1 deletion docs/cli.md
@@ -375,7 +375,7 @@ sb run [--config-file]
| `--host-list` `-l` | `None` | Comma separated host list. |
| `--host-password` | `None` | Host password or key passphrase if needed. |
| `--host-username` | `None` | Host username if needed. |
| `--no-docker` | `False` | Run on host directly without Docker. |
| `--no-docker` | `False` | Run on host directly without Docker. When using remote nodes, SuperBench (`sb` binary and dependencies) must be pre-installed on each target host; otherwise `command not found` will occur. See [Run SuperBench - Using --no-docker on Remote Nodes](getting-started/run-superbench.md) for details. |

[SHOULD-FIX] (Correctness) Cell omits the actual failing binary name (sb exec / sb)

Issue: The user types sb run --no-docker … on the control node, but the command Ansible actually runs on each remote host is sb exec … (superbench/runner/runner.py:127, wrapped via bash -c '... && cd $SB_WORKSPACE && {command}' at runner.py:494-498 when self._docker_config.skip is True). The shell error on a non-prepared host is therefore sb: command not found (rc 127). A user grepping logs for sb run: command not found will not find it.

Impact: Diagnostic ambiguity for the exact failure mode the PR is trying to document (fixes #715, which is precisely this command not found / rc=127 confusion).

Recommendation: Reword the cell (combine with the anchor and label fixes — see other comments on this line):

| `--no-docker` | `False` | Run on host directly without Docker. On remote nodes, the `sb` CLI (which Ansible invokes as `sb exec`) and its dependencies must be pre-installed on every target host; otherwise the remote shell exits with `sb: command not found` (rc 127). See [Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes). |

Agreement: 1/3 reviewers (cross-references finding 11 below).
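
The exit status described in this comment can be reproduced locally. The following is a minimal sketch, not SuperBench's actual wrapper: `sb_missing_cmd_for_demo` is a hypothetical placeholder standing in for `sb` on an unprepared host, and `cd /tmp` stands in for `cd $SB_WORKSPACE`.

```shell
# Hypothetical stand-in for `sb` on a host where it is not installed.
missing_cmd="sb_missing_cmd_for_demo"

# Mimic the `bash -c '... && cd <dir> && {command}'` shape quoted above.
rc=0
bash -c "cd /tmp && ${missing_cmd} exec" 2>/dev/null || rc=$?

# bash reports an unresolvable command name with exit status 127.
echo "exit code: ${rc}"
```

The shell names only the first word (`sb`) in its error message, which is why the subcommand does not appear in `command not found` output.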

[SHOULD-FIX] (Correctness + Maintainability) Table cell is ~280 chars vs ≤80 in every sibling row

Issue: Valid Markdown, but ~280 chars in one cell forces the rendered Description column to balloon dramatically, breaks the visual alignment of every sibling row, and duplicates content that already lives at the link target. Every other Description cell in this table is one short sentence.

Impact: Source diffing / future column-width edits become painful. Readers scanning the flag table get a wall of text in one cell instead of a uniform reference table.

Recommendation: Reduce to a single-line caveat + deep link (also fixes the anchor and label findings):

| `--no-docker` | `False` | Run on host directly without Docker. On remote nodes, `sb` must be pre-installed on every host — see [Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes). |

Agreement: 2/3 reviewers (1 NON-BLOCKING / 1 SHOULD-FIX, escalated).
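
A check like the one below could flag oversized table rows before review. It is only a sketch: `long_table_rows` is a hypothetical helper name, and the default threshold of 200 characters is an assumption, not a project rule.

```shell
# Print "<line number>: <length>" for every Markdown table row
# (a line starting with "|") that is longer than max_len characters.
long_table_rows() {
  local file="$1" max_len="${2:-200}"
  awk -v max="$max_len" '/^\|/ && length($0) > max { print NR ": " length($0) }' "$file"
}

# Example invocation (path taken from this PR):
#   long_table_rows docs/cli.md 200
```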

[SHOULD-FIX] (Correctness) Cross-link omits the section anchor

Issue: run-superbench.md is now 64 lines with the new section as its last block. A no-anchor link lands the reader at # Run SuperBench and forces them to scroll through ## Deploy, ## Run, and the existing :::tip TIP before reaching the content the link promises. Existing precedent in this docs tree deep-links to in-page sections:

  • docs/user-tutorial/system-config.md:31: [Deploy SuperBench](../getting-started/run-superbench.md#deploy)
  • docs/user-tutorial/system-config.md:33: [Ansible Inventory](../getting-started/configuration.md#ansible-inventory)

Impact: Worse UX (extra scrolling) and inconsistent with the established #deploy / #ansible-inventory deep-link convention.

Recommendation: Append the Docusaurus-slugified fragment (verify the slug against a local yarn build once the heading is final):

[Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes)

Agreement: 3/3 reviewers.
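
To see where the triple hyphen in the fragment comes from, the slug can be approximated in the shell. This is a rough sketch that assumes ASCII-only headings and GitHub-style slug rules (lowercase, drop punctuation, spaces become hyphens). `slugify_heading` is a hypothetical helper, not a Docusaurus tool, so the local build remains the authoritative check.

```shell
# Approximate a heading slug: lowercase, strip everything except
# letters, digits, spaces, underscores and hyphens, then turn spaces into hyphens.
slugify_heading() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9 _-]//g' -e 's/ /-/g'
}

slugify_heading 'Using `--no-docker` on Remote Nodes'; echo
# prints: using---no-docker-on-remote-nodes
```

The space before the flag becomes one hyphen and the flag keeps its own two, which gives `using---no-docker`.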

[SHOULD-FIX] (Maintainability) Cross-link label uses non-standard "Page - Section" format

Issue: The new link is the only "Page - Section" composite label in docs/. Every other inter-doc link uses a short, single-page label:

  • docs/user-tutorial/baseline-generation.md:17: [SuperBench CLI](../cli.md)
  • docs/user-tutorial/system-config.md:31: [Deploy SuperBench](../getting-started/run-superbench.md#deploy)
  • docs/user-tutorial/benchmarks/model-benchmarks.md:133: [Result Summary](../result-summary.md)
  • docs/superbench-config.mdx:176: [micro-benchmark](./user-tutorial/benchmarks/micro-benchmarks.md)
  • docs/cli.md:109 (this same file): [here](./user-tutorial/container-images.mdx)

Also, Docusaurus / typography pipelines may render the literal -- in the label as an en-dash, visually mangling the flag name.

Impact: Inconsistent navigation labels; future editors will copy whichever style they see first. Also a real risk that --no-docker no longer renders as the actual flag spelling.

Recommendation: Adopt the established short-label form (combined with the anchor fix above):

See [Using --no-docker on Remote Nodes](getting-started/run-superbench.md#using---no-docker-on-remote-nodes).

Agreement: 2/3 reviewers.

| `--output-dir` | `None` | Path to output directory, outputs/{datetime} will be used if not specified. |
| `--private-key` | `None` | Path to private key if needed. |

15 changes: 15 additions & 0 deletions docs/getting-started/run-superbench.md
@@ -47,3 +47,18 @@ You can create a privileged container with `superbench/superbench` image, skip `
`sb run --no-docker -l localhost -c resnet.yaml`.

:::

## Using `--no-docker` on Remote Nodes

[SHOULD-FIX] (Maintainability) New section ignores the file's paragraph + bash-fence + admonition style

Issue: Every other prose unit in this file follows "1 short paragraph → ```bash fenced example → optional :::note / :::tip admonition" (## Deploy, ## Run). Confirmed by git grep -nE ':::(tip|note|caution|warning|info)' -- docs/:

docs/getting-started/installation.mdx:17::::tip Tips
docs/getting-started/installation.mdx:32::::note
docs/getting-started/installation.mdx:61::::note Note
docs/getting-started/run-superbench.md:27::::note Note
docs/getting-started/run-superbench.md:44::::tip TIP
docs/user-tutorial/baseline-generation.md:31::::tip Tips
docs/user-tutorial/result-summary.md:31::::tip Tips

The new section uses none — it is one dense 4-item bold-led numbered list, zero runnable code blocks, no admonitions. Items 1 (command not found, exit code 127) and 4 (HPC clusters with restricted container runtimes) are textbook :::caution / :::note material; Options A/B/C are command-driven yet show no commands.

Impact: Future editors will see one section that looks foreign to the rest of the page, increasing drift over time.

Recommendation: Restructure as paragraphs + 1–2 bash code fences + admonitions, e.g.:

### Using `--no-docker` on Remote Nodes

When you run `sb run --no-docker` against remote hosts (via `--host-file` or
`--host-list`), Ansible SSHes into each node and invokes the `sb` binary
directly, so SuperBench must already be installed on every target host.

```bash
sb run --no-docker -f remote.ini -c resnet.yaml \
  --config-override superbench.env.SB_MICRO_PATH=/opt/superbench
```

:::caution
If `sb` is not on `PATH` on a remote host, the run fails with
`sb: command not found` (exit code 127).
:::

:::note
Set `SB_MICRO_PATH` (env var or `superbench.env.SB_MICRO_PATH` via
`--config-override`) to the on-host install path of the micro-benchmark
binaries.
:::

Agreement: 2/3 reviewers.

[SHOULD-FIX] (Maintainability) New section should be H3 under ## Run, not a peer H2

Issue: This document's existing H2 structure is the top-level workflow narrative (## Deploy → ## Run). The new section documents requirements for one specific variant of the sb run step and is logically a Run-time caveat. The existing :::tip TIP for the same flag is already correctly nested under ## Run. Adding the new section as a third peer H2 implies it is a separate workflow stage and splits --no-docker guidance across two sibling H2s.

Impact: Readers and the Docusaurus sidebar/TOC will surface "Using --no-docker on Remote Nodes" as a peer to Deploy/Run.

Recommendation: Demote to H3 under ## Run, immediately after (or merged into) the existing :::tip TIP:

## Run
...
:::tip TIP
... (existing local privileged-container note) ...
:::

### Using `--no-docker` on Remote Nodes
...

Agreement: 1/3 reviewers.

[NON-BLOCKING] (Maintainability) No cross-reference between the pre-existing :::tip TIP and the new section

Issue: Two adjacent blocks on the same flag with no narrative linkage (tip = local privileged container; new section = remote hosts). Not strictly contradictory, but future maintainers may update one and forget the other.

Recommendation: After demoting the new section to H3 under ## Run (sibling finding), add a leading sentence: "The tip above covers running --no-docker locally inside a privileged container. The requirements below apply when --no-docker is used against remote hosts."

Agreement: 1/3 reviewers.


When running `sb run` with `--no-docker` on **remote nodes** (via `--host-file` or `--host-list`), the following requirements apply:

1. **SuperBench must be pre-installed on each remote node.** The `sb` CLI binary and its dependencies must be available in the PATH on every target host. Running without Docker means Ansible will SSH into each node and execute `sb exec` directly; if `sb` is not installed, you will see `command not found` (exit code 127).

[NON-BLOCKING] (Maintainability) "execute sb exec directly" leaks an internal subcommand

Issue: sb exec is the actual remote command (superbench/runner/runner.py:127, superbench/runner/playbooks/cleanup.yaml:5-7) but it is not documented as a user-facing command in docs/cli.md (which lists sb deploy, sb run, sb result …, etc.). Exposing it by name without explanation couples user docs to an implementation detail of runner.py — a future rename will silently rot this line.

Note: This conflicts with the suggestion to name sb exec in cli.md for diagnostic clarity (see the docs/cli.md line 378 comment about the failing binary). Reconcile by: in cli.md, say "the failing binary is sb"; in run-superbench.md, drop the sb exec reference and say "invokes the sb binary directly".

Recommendation: Rephrase, e.g.: "Ansible SSHes into each node and invokes the sb binary directly; if sb is not on PATH you will see sb: command not found (exit code 127)."

Agreement: 1/3 reviewers.
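
The doc could also give readers a way to confirm the prerequisite before a run. A minimal sketch follows: `has_command` is a hypothetical helper, and the SSH form in the comment uses a placeholder `$host`.

```shell
# Return 0 if the named command resolves on PATH in the current shell, else 1.
has_command() {
  command -v "$1" >/dev/null 2>&1
}

# Local check; on a remote node the same test could be run over SSH, e.g.
#   ssh "$host" 'command -v sb'    # "$host" is a placeholder
if has_command sb; then
  echo "sb found on PATH"
else
  echo "sb not found on PATH; sb run --no-docker would fail with exit code 127"
fi
```

A non-interactive SSH session may see a different `PATH` than a login shell, so the check is most useful when run the same way the tooling connects.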


2. **Deployment options:**
- **Option A:** Extract the contents of the `superbench/superbench` Docker image onto each node (e.g., copy binaries, Python environment, and micro-benchmark executables to a consistent path), then ensure `sb` is in PATH.
- **Option B:** Install SuperBench from source or pip on each node, and build/install the required micro-benchmark binaries (see `third_party/` and build instructions).

[SHOULD-FIX] (Correctness) third_party/ + "build instructions" pointer is unactionable

Issue: third_party/Makefile does exist (targets cuda, rocm, common, cuda_cutlass, …, keyed off SB_MICRO_PATH, MPI_HOME, HIP_HOME, CUDA_VER), but there is no user-facing "build third_party on the host" guide in docs/. docs/getting-started/installation.mdx documents only the control-node build (pip install . && make postinstall). A user following "see third_party/ and build instructions" lands on a Makefile with no surrounding guidance and no required env-var documentation.

Impact: Option B cannot be reproduced from this doc alone, undermining the PR's goal of preventing rc=127.

Recommendation: Replace with a concrete pointer + required vars, e.g.:

   - **Option B:** On each node, install SuperBench (see
     [installation](installation.mdx)) and then build the native
     micro-benchmark binaries with the project Makefile:
     ```bash
     export SB_MICRO_PATH=/opt/superbench  # must match the value used at runtime
     cd third_party && make -j cuda   # or `make rocm` on AMD
     ```
     The supported variables (`SB_MICRO_PATH`, `MPI_HOME`, `HIP_HOME`,
     `CUDA_VER`, …) are defined at the top of `third_party/Makefile`.

Agreement: 2/3 reviewers.
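
A small sanity check could accompany the build step so a mismatched path is caught before `sb run`. This is a sketch under assumptions: `check_micro_path` is a hypothetical helper, and it verifies only that the directory exists, since the expected layout under `SB_MICRO_PATH` is not described here.

```shell
# Verify that SB_MICRO_PATH (or an explicit argument) is set and is a directory.
check_micro_path() {
  local path="${1:-${SB_MICRO_PATH:-}}"
  if [ -z "$path" ]; then
    echo "SB_MICRO_PATH is not set" >&2
    return 1
  fi
  if [ ! -d "$path" ]; then
    echo "SB_MICRO_PATH does not exist or is not a directory: $path" >&2
    return 1
  fi
  echo "SB_MICRO_PATH looks usable: $path"
}

# Example invocation on a node (path is the illustrative one used above):
#   SB_MICRO_PATH=/opt/superbench check_micro_path
```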

- **Option C (requires Docker on remote nodes):** If Docker is available on the remote nodes for deployment but you still want to execute benchmarks without containers, you can first use `sb deploy` to pull the image and prepare the container, then manually extract the container filesystem to the host and run subsequent `sb run --no-docker` commands against that host installation.

[BLOCKER] (Correctness) Option C documents an unsupported "extract container filesystem to host" workflow

Issue (verified facts):

  • "Manually extract the container filesystem to the host" describes a workflow that does not exist in this repository. A grep for docker export, docker cp, extract.*container, container filesystem returns only this new doc.
  • sb deploy (superbench/runner/playbooks/deploy.yaml) creates and runs the sb-workspace container and keeps the rootfs inside it. There is no script, Makefile target, test, or other doc covering a rootfs extraction step.
  • Option C is internally contradictory with item #4 ("for standard deployments, prefer sb deploy + sb run without --no-docker"): it sits inside a section titled "Using --no-docker on Remote Nodes" yet starts with "requires Docker on remote nodes".

Impact: Readers who try to follow Option C will hand-roll fragile docker export / docker cp / flows whose layouts will not match the SB_WORKSPACE / SB_MICRO_PATH expectations baked into superbench/runner/runner.py — exactly the failure class this PR is trying to prevent.

Recommendation: Delete Option C. If the underlying intent is "Docker is available but we cannot nest containers", point readers to the existing :::tip TIP block at lines 43–49 (privileged container + sb run --no-docker -l localhost). Only re-introduce a rootfs-extraction recipe once it is a supported, tested workflow shipped with the repo.

Agreement: 3/3 reviewers (severity 1 BLOCKER / 2 SHOULD-FIX, escalated to the highest per orchestrator policy).


3. **Environment configuration:** Ensure the `SB_MICRO_PATH` environment variable is set on each remote node so that it matches the on-host installation path of SuperBench micro-benchmark binaries when using `--no-docker`. Alternatively, you can set the config key `superbench.env.SB_MICRO_PATH` via `--config-override` so that SuperBench exports this environment variable for remote executions.

4. **Use case:** `--no-docker` is intended for environments where Docker-in-Docker or nested containers are not supported (e.g., certain Kubernetes setups, HPC clusters with restricted container runtimes). For standard deployments, prefer `sb deploy` + `sb run` without `--no-docker`.

[NON-BLOCKING] (Maintainability) No back-link from run-superbench.md to cli.md

Issue: cli.md now links forward to this new section, but the new section never references cli.md. Other tutorial pages routinely back-link to the canonical flag table:

docs/user-tutorial/baseline-generation.md:17: ... [SuperBench CLI](../cli.md).
docs/user-tutorial/result-summary.md:17:   ... [SuperBench CLI](../cli.md).
docs/user-tutorial/data-diagnosis.md:17:   ... [SuperBench CLI](../cli.md).

Readers landing on the new section have no pointer back to --host-file, --host-list, --config-override (all referenced by name).

Recommendation: Add a one-liner at the top or bottom of the new section:

For the full list of flags accepted by `sb run`, see [SuperBench CLI](../cli.md#sb-run).

Agreement: 2/3 reviewers.
