
Fix isvc docs and add cli create isvc #108

Open
typhoonzero wants to merge 2 commits into master from fix_isvc_docs_and_add_cli_create_isvc

Conversation

@typhoonzero
Contributor

@typhoonzero typhoonzero commented Feb 26, 2026

Summary by CodeRabbit

  • Documentation
    • Reworded scheduling guidance to focus on "Scheduling Inference Services based on the CUDA version" and clarified conditional parsing of CUDA-related labels for NVIDIA accelerators.
    • Added a new CLI guide for creating inference services with GPU affinity, including step‑by‑step instructions, example configurations, startup validation steps, and API testing examples.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Walkthrough

Documentation edits: title and narrative changes removing "accurately" from CUDA scheduling guidance, added CUDA label parsing and nodeAffinity examples; plus a new CLI how-to that shows creating a KServe InferenceService (vLLM) with GPU checks, model-path resolution, startup script, YAML example, and test instructions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **CUDA Scheduling Documentation**<br>`docs/en/model_inference/inference_service/how_to/accurately_schedule.mdx` | Renamed title/subheaders to remove "accurately"; updated intro wording; added guidance to parse `cpaas.io/accelerator-type` and `cpaas.io/cuda-version`; added nodeAffinity YAML example showing CUDA runtime major/minor usage and a note about the k8s operator Gt limitation and the "rt version minus one" workaround. |
| **CLI InferenceService Guide (new)**<br>`docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx` | New how-to documenting prerequisites and steps to create a KServe InferenceService via CLI; includes a qwen-2-vllm.yaml example with GPU affinity, resource/runtime/storageUri/securityContext, a startup shell script for GPU validation and GGUF vs. directory model-path resolution, vLLM launch flags (e.g., chunked prefill, eager enforcement), kubectl apply/check, and OpenAI-compatible curl test examples. |
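The scheduling changes above mention nodeAffinity over the `cpaas.io/accelerator-type` and `cpaas.io/cuda-version` labels. A minimal illustrative sketch follows; only the label keys come from the walkthrough, while the operator choices and values here (`nvidia-gpu`, the Gt threshold of `11`) are assumptions, not taken from the doc itself:

```yaml
# Illustrative only: label keys are from the PR, values are assumptions.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cpaas.io/accelerator-type
              operator: In
              values: ["nvidia-gpu"]
            - key: cpaas.io/cuda-version
              operator: Gt
              # Gt/Lt only compare integer-valued labels, which is likely
              # why the doc describes a "version minus one" workaround.
              values: ["11"]
```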

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes
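The walkthrough mentions OpenAI-compatible curl tests in the new guide. As a rough sketch of what such a request looks like (`HOST` and `MODEL` are placeholders, not values from this PR), the request can be assembled like this:

```shell
# Placeholder values: substitute your InferenceService URL and model name.
HOST="qwen-2-vllm.example.com"
MODEL="qwen-2"
# OpenAI-style chat-completions payload.
PAYLOAD='{"model": "'"${MODEL}"'", "messages": [{"role": "user", "content": "Hello"}]}'
# Print the curl invocation rather than running it against a live service.
REQUEST="curl -s http://${HOST}/v1/chat/completions -H 'Content-Type: application/json' -d '${PAYLOAD}'"
echo "${REQUEST}"
```

The `/v1/chat/completions` path is the standard OpenAI-compatible endpoint that vLLM serves.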

Suggested reviewers

  • zhaomingkun1030

Poem

🐇 I hopped through docs with nimble feet,
Tucked "accurately" out where it couldn't compete,
Added CUDA labels, affinity delight,
A CLI guide to start vLLM at night,
Hooray for clearer paths and GPU light! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title mentions fixing ISVC docs and adding CLI create ISVC, which aligns with the actual changes: documentation updates to the scheduling guide and a new CLI creation guide. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix_isvc_docs_and_add_cli_create_isvc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx (1)

184-186: Minor grammar: "OpenAI-compatible" should be hyphenated.

When "OpenAI compatible" is used as a compound adjective modifying "API", it should be hyphenated.

📝 Suggested fix
```diff
-## Test the InferenceService using OpenAI Compatible API
+## Test the InferenceService using OpenAI-Compatible API

-Once the InferenceService is ready, you can test it using the OpenAI compatible API.
+Once the InferenceService is ready, you can test it using the OpenAI-compatible API.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
around lines 184 - 186, The heading "## Test the InferenceService using OpenAI
Compatible API" uses "OpenAI compatible" as a compound adjective and should be
hyphenated; update that heading (and any other occurrences in this file) to read
"OpenAI-compatible API" so the compound modifier is grammatically correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`:
- Around line 161-165: The kubectl example shows a conflicting namespace: the
YAML already sets "namespace: mlops-demo-ai-test" but the command uses "-n
<your-namespace>"; pick one approach to avoid confusion—prefer keeping the
namespace inside the manifest: remove the "-n <your-namespace>" flag from the
example kubectl command `kubectl apply -f qwen-2-vllm.yaml -n <your-namespace>`
so the command becomes `kubectl apply -f qwen-2-vllm.yaml`, and add a short note
that the namespace is defined in the YAML via "namespace: mlops-demo-ai-test".
- Around line 125-126: Remove the stray "- bash" token that appears after the
script content (it becomes an unintended $@ in the script referenced on line
124); edit the CLI example so the container command remains `bash -c "<script>"`
without the extra `- bash`, or place any intended shell args correctly under the
env/args block so no stray `- bash` is passed to the command.

---

Nitpick comments:
In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`:
- Around line 184-186: The heading "## Test the InferenceService using OpenAI
Compatible API" uses "OpenAI compatible" as a compound adjective and should be
hyphenated; update that heading (and any other occurrences in this file) to read
"OpenAI-compatible API" so the compound modifier is grammatically correct.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 498a273 and 801ca4b.

📒 Files selected for processing (2)
  • docs/en/model_inference/inference_service/how_to/accurately_schedule.mdx
  • docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx

Comment on lines 125 to 126
```yaml
        - bash
        env:
```

⚠️ Potential issue | 🟠 Major

Stray - bash argument appears misplaced.

Line 125 contains - bash after the script content ends. This would be passed as an additional argument to the container command, which is likely unintended. The command structure shows bash -c <script>, and this extra - bash would become $@ in the script (referenced on line 124).

🐛 Proposed fix
```diff
           ${PARAM_ENFORCE_EAGER} ${PARAM_ENABLE_CHUNKED_PREFILL} $@
-       - bash
        env:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
around lines 125 - 126, Remove the stray "- bash" token that appears after the
script content (it becomes an unintended $@ in the script referenced on line
124); edit the CLI example so the container command remains `bash -c "<script>"`
without the extra `- bash`, or place any intended shell args correctly under the
env/args block so no stray `- bash` is passed to the command.

Comment on lines 161 to 165
2. Apply the YAML file to your cluster using `kubectl`:

```bash
kubectl apply -f qwen-2-vllm.yaml -n <your-namespace>
```

⚠️ Potential issue | 🟡 Minor

Namespace specification is redundant and potentially confusing.

The YAML file already specifies namespace: mlops-demo-ai-test on line 47, but the kubectl command instructs users to provide -n <your-namespace>. This could confuse users about which namespace takes precedence.

Consider either:

  1. Removing the namespace from the YAML and keeping -n <your-namespace> in the command, or
  2. Keeping the namespace in the YAML and removing -n <your-namespace> from the command
📝 Option 1: Remove namespace flag from kubectl command
```diff
-   kubectl apply -f qwen-2-vllm.yaml -n <your-namespace>
+   kubectl apply -f qwen-2-vllm.yaml
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
around lines 161 - 165, The kubectl example shows a conflicting namespace: the
YAML already sets "namespace: mlops-demo-ai-test" but the command uses "-n
<your-namespace>"; pick one approach to avoid confusion—prefer keeping the
namespace inside the manifest: remove the "-n <your-namespace>" flag from the
example kubectl command `kubectl apply -f qwen-2-vllm.yaml -n <your-namespace>`
so the command becomes `kubectl apply -f qwen-2-vllm.yaml`, and add a short note
that the namespace is defined in the YAML via "namespace: mlops-demo-ai-test".
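To illustrate the preferred option (namespace pinned in the manifest), here is a stand-in manifest written and inspected locally; it uses a ConfigMap rather than the real InferenceService, and against a cluster you would simply run a bare `kubectl apply -f`:

```shell
# Write a minimal manifest with the namespace set inside metadata,
# mirroring "namespace: mlops-demo-ai-test" in the PR's YAML.
cat > /tmp/ns-demo.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: ns-demo
  namespace: mlops-demo-ai-test
EOF
# Confirm the namespace is pinned; no -n flag is needed at apply time.
grep "namespace:" /tmp/ns-demo.yaml
```

Note that kubectl rejects an apply when the manifest's `metadata.namespace` conflicts with a differing `-n` flag, which is another reason to pick one source of truth.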


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx (1)

184-186: Minor: Use hyphen in compound adjective.

"OpenAI compatible" should be "OpenAI-compatible" when used as a compound adjective before "API".

✏️ Suggested fix
```diff
-## Test the InferenceService using OpenAI Compatible API
+## Test the InferenceService using OpenAI-Compatible API

-Once the InferenceService is ready, you can test it using the OpenAI compatible API.
+Once the InferenceService is ready, you can test it using the OpenAI-compatible API.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
around lines 184 - 186, Update the heading and any occurrences of the phrase
"OpenAI compatible" (e.g., the heading "## Test the InferenceService using
OpenAI Compatible API" in create_inference_service_cli.mdx) to use the
hyphenated compound adjective "OpenAI-compatible" so it reads "## Test the
InferenceService using OpenAI-compatible API" (and replace other instances
accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`:
- Line 11: Fix the subject-verb agreement in the opening sentence by changing
"provide" to "provides" so it reads: "This document describes how to create
kserve inference services using command line, and provides sample YAMLs." Locate
and update that sentence in create_inference_service_cli.mdx (the first
paragraph/opening line).
- Line 77: Fix the typo in the echo message: replace the misspelled word
"aquired" with "acquired" in the string used when no GPUs are found (the echo
line: "No GPUs found. Please check if the container have aquired any GPU
device") so the message reads "No GPUs found. Please check if the container has
acquired any GPU device".
- Line 141: The YAML comment numbering is inconsistent: two comments are labeled
"4." — the one "Affinity settings for GPU" and the later "# 4. Required: set
resources used by this service"; change the later comment to "# 5. Required: set
resources used by this service" and also increment the subsequent comment at
"line 153" (the next numbered YAML comment) accordingly so all comment numbers
are sequential; update the text snippets "Required: set resources used by this
service" and the following comment to reflect the new numbering.

---

Nitpick comments:
In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`:
- Around line 184-186: Update the heading and any occurrences of the phrase
"OpenAI compatible" (e.g., the heading "## Test the InferenceService using
OpenAI Compatible API" in create_inference_service_cli.mdx) to use the
hyphenated compound adjective "OpenAI-compatible" so it reads "## Test the
InferenceService using OpenAI-compatible API" (and replace other instances
accordingly).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 801ca4b and 41f3a23.

📒 Files selected for processing (1)
  • docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx


```mdx
# Create Inference Service using CLI

This document describes how to create [kserve](https://kserve.github.io/website/) inference services using command line, and provide sample YAMLs.
```

⚠️ Potential issue | 🟡 Minor

Minor grammar issue: subject-verb agreement.

"provide" should be "provides" to agree with the singular subject "This document".

✏️ Suggested fix
```diff
-This document describes how to create [kserve](https://kserve.github.io/website/) inference services using command line, and provide sample YAMLs.
+This document describes how to create [kserve](https://kserve.github.io/website/) inference services using the command line, and provides sample YAMLs.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
at line 11, Fix the subject-verb agreement in the opening sentence by changing
"provide" to "provides" so it reads: "This document describes how to create
kserve inference services using command line, and provides sample YAMLs." Locate
and update that sentence in create_inference_service_cli.mdx (the first
paragraph/opening line).

```shell
GPU_COUNT=$(python3 -c "import torch; print(torch.cuda.device_count())")
echo "Starting serving model name: ${MODEL_NAME}, num gpus: ${GPU_COUNT}"
if [ ${GPU_COUNT} -lt 1 ]; then
    echo "No GPUs found. Please check if the container have aquired any GPU device"
```

⚠️ Potential issue | 🟡 Minor

Typo: "aquired" should be "acquired".

✏️ Suggested fix
```diff
-            echo "No GPUs found. Please check if the container have aquired any GPU device"
+            echo "No GPUs found. Please check if the container has acquired any GPU device"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
at line 77, Fix the typo in the echo message: replace the misspelled word
"aquired" with "acquired" in the string used when no GPUs are found (the echo
line: "No GPUs found. Please check if the container have aquired any GPU
device") so the message reads "No GPUs found. Please check if the container has
acquired any GPU device".
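For reference, a corrected, self-contained version of the GPU check might look like the sketch below; the fallback to `0` when `python3` or torch is unavailable is an addition for robustness, not part of the original script:

```shell
# Count visible GPUs via torch; fall back to 0 if python3/torch is missing.
GPU_COUNT=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo 0)
echo "detected gpus: ${GPU_COUNT}"
if [ "${GPU_COUNT}" -lt 1 ]; then
    echo "No GPUs found. Please check if the container has acquired any GPU device"
fi
```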

```yaml
name: transformers
name: ""
protocolVersion: v2
# 4. Required: set resources used by this service
```

⚠️ Potential issue | 🟡 Minor

Inconsistent comment numbering in the YAML.

There are two comments labeled "4." (line 50 for "Affinity settings for GPU" and line 141 for "set resources"). This should be renumbered for clarity.

✏️ Suggested fix
```diff
-      # 4. Required: set resources used by this service
+      # 5. Required: set resources used by this service
       resources:
```

And update line 153:

```diff
-      # 5. Required: select "ClusterServingRuntime" and the storageUri
+      # 6. Required: select "ClusterServingRuntime" and the storageUri
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
at line 141, The YAML comment numbering is inconsistent: two comments are
labeled "4." — the one "Affinity settings for GPU" and the later "# 4. Required:
set resources used by this service"; change the later comment to "# 5. Required:
set resources used by this service" and also increment the subsequent comment at
"line 153" (the next numbered YAML comment) accordingly so all comment numbers
are sequential; update the text snippets "Required: set resources used by this
service" and the following comment to reflect the new numbering.

@cloudflare-workers-and-pages

Deploying alauda-ai with Cloudflare Pages

Latest commit: 41f3a23
Status: ✅  Deploy successful!
Preview URL: https://36c9392e.alauda-ai.pages.dev
Branch Preview URL: https://fix-isvc-docs-and-add-cli-cr.alauda-ai.pages.dev

