
Fix isvc docs and add cli create isvc #108

Open
typhoonzero wants to merge 2 commits into master from fix_isvc_docs_and_add_cli_create_isvc

Conversation

@typhoonzero
Contributor

@typhoonzero typhoonzero commented Feb 26, 2026

Summary by CodeRabbit

  • Documentation
    • Reworded scheduling guidance to focus on "Scheduling Inference Services based on the CUDA version" and clarified conditional parsing of CUDA-related labels for NVIDIA accelerators.
    • Added a new CLI guide for creating inference services with GPU affinity, including step‑by‑step instructions, example configurations, startup validation steps, and API testing examples.

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Walkthrough

Documentation edits: title and narrative changes removing "accurately" from CUDA scheduling guidance, added CUDA label parsing and nodeAffinity examples; plus a new CLI how-to that shows creating a KServe InferenceService (vLLM) with GPU checks, model-path resolution, startup script, YAML example, and test instructions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **CUDA Scheduling Documentation**<br>`docs/en/model_inference/inference_service/how_to/accurately_schedule.mdx` | Renamed title/subheaders to remove "accurately"; updated intro wording; added guidance to parse `cpaas.io/accelerator-type` and `cpaas.io/cuda-version`; added nodeAffinity YAML example showing CUDA runtime major/minor usage and a note about the k8s operator Gt limitation and the "rt version minus one" workaround. |
| **CLI InferenceService Guide (new)**<br>`docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx` | New how-to documenting prerequisites and steps to create a KServe InferenceService via CLI; includes a qwen-2-vllm.yaml example with GPU affinity, resource/runtime/storageUri/securityContext, a startup shell script for GPU validation and GGUF vs. directory model-path resolution, vLLM launch flags (e.g., chunked prefill, eager enforcement), kubectl apply/check, and OpenAI-compatible curl test examples. |
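The scheduling changes above mention nodeAffinity over the `cpaas.io/accelerator-type` and `cpaas.io/cuda-version` labels. A minimal illustrative sketch follows; only the label keys come from the walkthrough, while the operator choices and values here (`nvidia-gpu`, the Gt threshold of `11`) are assumptions, not taken from the doc itself:

```yaml
# Illustrative only: label keys are from the PR, values are assumptions.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cpaas.io/accelerator-type
              operator: In
              values: ["nvidia-gpu"]
            - key: cpaas.io/cuda-version
              operator: Gt
              # Gt/Lt only compare integer-valued labels, which is likely
              # why the doc describes a "version minus one" workaround.
              values: ["11"]
```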

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes
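The walkthrough mentions OpenAI-compatible curl tests in the new guide. As a rough sketch of what such a request looks like (`HOST` and `MODEL` are placeholders, not values from this PR), the request can be assembled like this:

```shell
# Placeholder values: substitute your InferenceService URL and model name.
HOST="qwen-2-vllm.example.com"
MODEL="qwen-2"
# OpenAI-style chat-completions payload.
PAYLOAD='{"model": "'"${MODEL}"'", "messages": [{"role": "user", "content": "Hello"}]}'
# Print the curl invocation rather than running it against a live service.
REQUEST="curl -s http://${HOST}/v1/chat/completions -H 'Content-Type: application/json' -d '${PAYLOAD}'"
echo "${REQUEST}"
```

The `/v1/chat/completions` path is the standard OpenAI-compatible endpoint that vLLM serves.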

Suggested reviewers

  • zhaomingkun1030

Poem

🐇 I hopped through docs with nimble feet,
Tucked "accurately" out where it couldn't compete,
Added CUDA labels, affinity delight,
A CLI guide to start vLLM at night,
Hooray for clearer paths and GPU light! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title mentions fixing ISVC docs and adding CLI create ISVC, which aligns with the actual changes: documentation updates to the scheduling guide and a new CLI creation guide. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix_isvc_docs_and_add_cli_create_isvc

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx (1)

184-186: Minor grammar: "OpenAI-compatible" should be hyphenated.

When "OpenAI compatible" is used as a compound adjective modifying "API", it should be hyphenated.

📝 Suggested fix
```diff
-## Test the InferenceService using OpenAI Compatible API
+## Test the InferenceService using OpenAI-Compatible API

-Once the InferenceService is ready, you can test it using the OpenAI compatible API.
+Once the InferenceService is ready, you can test it using the OpenAI-compatible API.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
around lines 184 - 186, The heading "## Test the InferenceService using OpenAI
Compatible API" uses "OpenAI compatible" as a compound adjective and should be
hyphenated; update that heading (and any other occurrences in this file) to read
"OpenAI-compatible API" so the compound modifier is grammatically correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`:
- Around line 161-165: The kubectl example shows a conflicting namespace: the
YAML already sets "namespace: mlops-demo-ai-test" but the command uses "-n
<your-namespace>"; pick one approach to avoid confusion—prefer keeping the
namespace inside the manifest: remove the "-n <your-namespace>" flag from the
example kubectl command `kubectl apply -f qwen-2-vllm.yaml -n <your-namespace>`
so the command becomes `kubectl apply -f qwen-2-vllm.yaml`, and add a short note
that the namespace is defined in the YAML via "namespace: mlops-demo-ai-test".
- Around line 125-126: Remove the stray "- bash" token that appears after the
script content (it becomes an unintended $@ in the script referenced on line
124); edit the CLI example so the container command remains `bash -c "<script>"`
without the extra `- bash`, or place any intended shell args correctly under the
env/args block so no stray `- bash` is passed to the command.

---

Nitpick comments:
In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`:
- Around line 184-186: The heading "## Test the InferenceService using OpenAI
Compatible API" uses "OpenAI compatible" as a compound adjective and should be
hyphenated; update that heading (and any other occurrences in this file) to read
"OpenAI-compatible API" so the compound modifier is grammatically correct.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 498a273 and 801ca4b.

📒 Files selected for processing (2)
  • docs/en/model_inference/inference_service/how_to/accurately_schedule.mdx
  • docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx

Comment on lines 125 to 126
```yaml
        - bash
        env:
```

⚠️ Potential issue | 🟠 Major

Stray - bash argument appears misplaced.

Line 125 contains - bash after the script content ends. This would be passed as an additional argument to the container command, which is likely unintended. The command structure shows bash -c <script>, and this extra - bash would become $@ in the script (referenced on line 124).

🐛 Proposed fix
```diff
           ${PARAM_ENFORCE_EAGER} ${PARAM_ENABLE_CHUNKED_PREFILL} $@
-       - bash
        env:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
around lines 125 - 126, Remove the stray "- bash" token that appears after the
script content (it becomes an unintended $@ in the script referenced on line
124); edit the CLI example so the container command remains `bash -c "<script>"`
without the extra `- bash`, or place any intended shell args correctly under the
env/args block so no stray `- bash` is passed to the command.

Comment on lines 161 to 165
2. Apply the YAML file to your cluster using `kubectl`:

```bash
kubectl apply -f qwen-2-vllm.yaml -n <your-namespace>
```

⚠️ Potential issue | 🟡 Minor

Namespace specification is redundant and potentially confusing.

The YAML file already specifies namespace: mlops-demo-ai-test on line 47, but the kubectl command instructs users to provide -n <your-namespace>. This could confuse users about which namespace takes precedence.

Consider either:

  1. Removing the namespace from the YAML and keeping -n <your-namespace> in the command, or
  2. Keeping the namespace in the YAML and removing -n <your-namespace> from the command
📝 Option 1: Remove namespace flag from kubectl command
```diff
-   kubectl apply -f qwen-2-vllm.yaml -n <your-namespace>
+   kubectl apply -f qwen-2-vllm.yaml
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
around lines 161 - 165, The kubectl example shows a conflicting namespace: the
YAML already sets "namespace: mlops-demo-ai-test" but the command uses "-n
<your-namespace>"; pick one approach to avoid confusion—prefer keeping the
namespace inside the manifest: remove the "-n <your-namespace>" flag from the
example kubectl command `kubectl apply -f qwen-2-vllm.yaml -n <your-namespace>`
so the command becomes `kubectl apply -f qwen-2-vllm.yaml`, and add a short note
that the namespace is defined in the YAML via "namespace: mlops-demo-ai-test".
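To illustrate the preferred option (namespace pinned in the manifest), here is a stand-in manifest written and inspected locally; it uses a ConfigMap rather than the real InferenceService, and against a cluster you would simply run a bare `kubectl apply -f`:

```shell
# Write a minimal manifest with the namespace set inside metadata,
# mirroring "namespace: mlops-demo-ai-test" in the PR's YAML.
cat > /tmp/ns-demo.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: ns-demo
  namespace: mlops-demo-ai-test
EOF
# Confirm the namespace is pinned; no -n flag is needed at apply time.
grep "namespace:" /tmp/ns-demo.yaml
```

Note that kubectl rejects an apply when the manifest's `metadata.namespace` conflicts with a differing `-n` flag, which is another reason to pick one source of truth.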


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx (1)

184-186: Minor: Use hyphen in compound adjective.

"OpenAI compatible" should be "OpenAI-compatible" when used as a compound adjective before "API".

✏️ Suggested fix
```diff
-## Test the InferenceService using OpenAI Compatible API
+## Test the InferenceService using OpenAI-Compatible API

-Once the InferenceService is ready, you can test it using the OpenAI compatible API.
+Once the InferenceService is ready, you can test it using the OpenAI-compatible API.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
around lines 184 - 186, Update the heading and any occurrences of the phrase
"OpenAI compatible" (e.g., the heading "## Test the InferenceService using
OpenAI Compatible API" in create_inference_service_cli.mdx) to use the
hyphenated compound adjective "OpenAI-compatible" so it reads "## Test the
InferenceService using OpenAI-compatible API" (and replace other instances
accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`:
- Line 11: Fix the subject-verb agreement in the opening sentence by changing
"provide" to "provides" so it reads: "This document describes how to create
kserve inference services using command line, and provides sample YAMLs." Locate
and update that sentence in create_inference_service_cli.mdx (the first
paragraph/opening line).
- Line 77: Fix the typo in the echo message: replace the misspelled word
"aquired" with "acquired" in the string used when no GPUs are found (the echo
line: "No GPUs found. Please check if the container have aquired any GPU
device") so the message reads "No GPUs found. Please check if the container has
acquired any GPU device".
- Line 141: The YAML comment numbering is inconsistent: two comments are labeled
"4." — the one "Affinity settings for GPU" and the later "# 4. Required: set
resources used by this service"; change the later comment to "# 5. Required: set
resources used by this service" and also increment the subsequent comment at
"line 153" (the next numbered YAML comment) accordingly so all comment numbers
are sequential; update the text snippets "Required: set resources used by this
service" and the following comment to reflect the new numbering.

---

Nitpick comments:
In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`:
- Around line 184-186: Update the heading and any occurrences of the phrase
"OpenAI compatible" (e.g., the heading "## Test the InferenceService using
OpenAI Compatible API" in create_inference_service_cli.mdx) to use the
hyphenated compound adjective "OpenAI-compatible" so it reads "## Test the
InferenceService using OpenAI-compatible API" (and replace other instances
accordingly).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 801ca4b and 41f3a23.

📒 Files selected for processing (1)
  • docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx


```mdx
# Create Inference Service using CLI

This document describes how to create [kserve](https://kserve.github.io/website/) inference services using command line, and provide sample YAMLs.
```

⚠️ Potential issue | 🟡 Minor

Minor grammar issue: subject-verb agreement.

"provide" should be "provides" to agree with the singular subject "This document".

✏️ Suggested fix
```diff
-This document describes how to create [kserve](https://kserve.github.io/website/) inference services using command line, and provide sample YAMLs.
+This document describes how to create [kserve](https://kserve.github.io/website/) inference services using the command line, and provides sample YAMLs.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
at line 11, Fix the subject-verb agreement in the opening sentence by changing
"provide" to "provides" so it reads: "This document describes how to create
kserve inference services using command line, and provides sample YAMLs." Locate
and update that sentence in create_inference_service_cli.mdx (the first
paragraph/opening line).

```shell
GPU_COUNT=$(python3 -c "import torch; print(torch.cuda.device_count())")
echo "Starting serving model name: ${MODEL_NAME}, num gpus: ${GPU_COUNT}"
if [ ${GPU_COUNT} -lt 1 ]; then
    echo "No GPUs found. Please check if the container have aquired any GPU device"
```

⚠️ Potential issue | 🟡 Minor

Typo: "aquired" should be "acquired".

✏️ Suggested fix
```diff
-            echo "No GPUs found. Please check if the container have aquired any GPU device"
+            echo "No GPUs found. Please check if the container has acquired any GPU device"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
at line 77, Fix the typo in the echo message: replace the misspelled word
"aquired" with "acquired" in the string used when no GPUs are found (the echo
line: "No GPUs found. Please check if the container have aquired any GPU
device") so the message reads "No GPUs found. Please check if the container has
acquired any GPU device".
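For reference, a corrected, self-contained version of the GPU check might look like the sketch below; the fallback to `0` when `python3` or torch is unavailable is an addition for robustness, not part of the original script:

```shell
# Count visible GPUs via torch; fall back to 0 if python3/torch is missing.
GPU_COUNT=$(python3 -c "import torch; print(torch.cuda.device_count())" 2>/dev/null || echo 0)
echo "detected gpus: ${GPU_COUNT}"
if [ "${GPU_COUNT}" -lt 1 ]; then
    echo "No GPUs found. Please check if the container has acquired any GPU device"
fi
```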

```yaml
name: transformers
name: ""
protocolVersion: v2
# 4. Required: set resources used by this service
```

⚠️ Potential issue | 🟡 Minor

Inconsistent comment numbering in the YAML.

There are two comments labeled "4." (line 50 for "Affinity settings for GPU" and line 141 for "set resources"). This should be renumbered for clarity.

✏️ Suggested fix
```diff
-      # 4. Required: set resources used by this service
+      # 5. Required: set resources used by this service
       resources:
```

And update line 153:

```diff
-      # 5. Required: select "ClusterServingRuntime" and the storageUri
+      # 6. Required: select "ClusterServingRuntime" and the storageUri
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/en/model_inference/inference_service/how_to/create_inference_service_cli.mdx`
at line 141, The YAML comment numbering is inconsistent: two comments are
labeled "4." — the one "Affinity settings for GPU" and the later "# 4. Required:
set resources used by this service"; change the later comment to "# 5. Required:
set resources used by this service" and also increment the subsequent comment at
"line 153" (the next numbered YAML comment) accordingly so all comment numbers
are sequential; update the text snippets "Required: set resources used by this
service" and the following comment to reflect the new numbering.

@cloudflare-workers-and-pages

Deploying alauda-ai with Cloudflare Pages

Latest commit: 41f3a23
Status: ✅  Deploy successful!
Preview URL: https://36c9392e.alauda-ai.pages.dev
Branch Preview URL: https://fix-isvc-docs-and-add-cli-cr.alauda-ai.pages.dev

