docs: add coding-agents + MLOps how-tos for on-prem LLMs by typhoonzero · Pull Request #245 · alauda/aml-docs

typhoonzero · 2026-05-29T03:38:05Z

Summary

Adds two stacked how-tos under model_inference/inference_service/how_to, building on each other:

1. `coding_agents_with_inference_service.mdx` — Use Coding Agents with On-Premise Inference Services

Connect terminal coding agents to a self-hosted, OpenAI-compatible InferenceService so source code and prompts never leave the cluster:

Builds on and links to the existing Create Inference Service using CLI and Configure External Access how-tos rather than repeating them.
Adds a curl smoke test for validating the endpoint (base URL / model name / API key).
Shows how to enable tool (function) calling on the vLLM runtime — the prerequisite that lets agents actually edit files and run commands.
Per-agent connection config for opencode, Codex CLI, and Claude Code (the latter via an Anthropic→OpenAI translation proxy such as LiteLLM or claude-code-router).
Best practices: a GPU-memory→model-size table, performance tuning for agent traffic (prefix caching, GPU memory utilization, chunked prefill, CUDA graphs, quantization, autoscaling/cold-start trade-offs), and getting-started guidance for vibe coding and MLOps workflows.

2. `mlops_with_coding_agents.mdx` — Run MLOps with Coding Agents and On-Premise LLMs

Once the agent is wired up, the same agent drives day-to-day MLOps:

Manage InferenceService / LLMInferenceService — agent loop (draft → kubectl apply --dry-run=server → apply → poll → smoke-test) with concrete starter prompts.
Configure Envoy AI Gateway — auth and rate limits via AIGatewayRoute, AIServiceBackend, BackendSecurityPolicy, SecurityPolicy, and BackendTrafficPolicy. Cross-links the existing envoy_ai_gateway intro/install docs.
Agent-driven performance tuning loop — five steps: SLOs → reproducible benchmark → one change per iteration → measure → stop on SLO or hardware ceiling. Cross-links into doc 1's tuning section instead of duplicating the flag list.
Fine-tuning plans and reports — a tool-selection table (Notebook / Training Hub / Kubeflow Trainer v2 / LLM Compressor) plus two ready-to-commit markdown templates: a pre-run plan and a post-run report designed for the agent to fill in from MLflow runs, training logs, and eval outputs.
Adds a "daily MLOps loop" walk-through and guardrails (read-only first, server-side dry-run by default, one change per iteration, no fabricated metrics, no hosted-provider fallback).

Test plan

doom lint passes on both files (0 errors, 0 warnings)
Pre-commit yarn lint passes
All internal cross-links to existing docs (inference_service how-tos and troubleshooting, envoy_ai_gateway, kubeflow, workbench, llm-compressor, infrastructure_management/hardware_profile) resolve
Reviewer: verify rendered pages and the two markdown templates render cleanly inside fenced code blocks

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Added comprehensive guide for using terminal-based coding agents with self-hosted inference service instances, including setup instructions, agent configuration, and performance tuning tips.
- Added documentation covering MLOps workflows powered by coding agents on on-premise infrastructure, with guidance on resource management, performance optimization, and operational best practices.

coderabbitai · 2026-05-29T03:38:16Z

Walkthrough

Adds two new how‑to docs: one teaching terminal coding agents (opencode, Codex CLI, Claude Code via proxy) against on‑prem InferenceService with setup, configs, vLLM tuning, and troubleshooting; the other describing MLOps workflows (gateway CRDs, benchmarking loops, fine‑tuning plans, and daily agent operations/guardrails).

Changes

Coding Agents On-Premise Integration Guide

Layer / File(s)	Summary
Page metadata and prerequisites `docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx`	Frontmatter, localization, intro framing, warning about evolving agent config formats, and prerequisites (OpenAI-compatible endpoint, network access, tool-calling parser, agent CLI).
Integration architecture and endpoint setup `docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx`	End-to-end integration flow diagram, per-agent API compatibility mapping, Step 1 curl smoke-test, and Step 2 vLLM tool-calling enablement with parser/chat-template guidance.
Agent-specific integrations `docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx`	Configuration details for opencode (env var wiring, model key), Codex CLI (`config.toml`, base URL, `wire_api = "chat"`), and Claude Code (Anthropic→OpenAI proxy options with env var routing for models).
Model selection and vLLM performance tuning `docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx`	Best-practice model selection (GPU memory, parser/tool-calling, context budgeting, quantization) and vLLM tuning recommendations (prefix caching, KV cache sizing, chunked prefill, CUDA graphs, batching, dtype, tensor parallelism, speculative decoding, autoscaling, timeouts) plus vibe-coding tips and MLOps starter tasks.
Troubleshooting and references `docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx`	Troubleshooting checklist mapping symptoms to setup areas and curated references to deployment, vLLM tool-calling, and agent/proxy docs.

MLOps with Coding Agents

Layer / File(s)	Summary
Page frontmatter and top-level guidance `docs/en/model_inference/inference_service/how_to/mlops_with_coding_agents.mdx`	Frontmatter, scoped operational safety warning, and framing of four supported MLOps workflows for agent-driven operations.
Gateway CRDs and performance tuning loop `docs/en/model_inference/inference_service/how_to/mlops_with_coding_agents.mdx`	Agent-managed Envoy AI Gateway CRD workflow (auth, rate-limiting) with smoke-test steps and a reproducible benchmark-and-iterate performance tuning loop with SLO-style objectives.
Fine-tuning planning and reporting templates `docs/en/model_inference/inference_service/how_to/mlops_with_coding_agents.mdx`	Reusable fine-tuning job plan and report templates, guidance on capturing training/eval artifacts, example prompts, and fields to mark as TODO when data is missing.
Daily MLOps loop and guardrails `docs/en/model_inference/inference_service/how_to/mlops_with_coding_agents.mdx`	Daily end-to-end agent loop, guardrails (read-only first, --dry-run=server, prevent metric fabrication, keep ops on-prem), and references to prerequisite docs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 In a burrow of docs I nibble and write,
Agents and gateways snug under soft light.
curl, parsers, and proxies all lined in a row—
Deploy, test, tune, and watch the models glow.
hops off to deploy with a carrot-sized CLI ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main addition: two how-to documentation pages for using coding agents with on-premises LLMs.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch docs/coding-agents-onprem-llm

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx`:
- Line 69: Fix the broken heading anchors by removing the backslash-escaped
braces and using a supported ID or plain auto-slug: replace "## Step 2: Enable
tool calling on the runtime \{`#enable-tool-calling-on-the-runtime`}" (and the
similar heading at the other location) with either a plain heading "## Step 2:
Enable tool calling on the runtime" (rely on auto-generated slug) or an
unescaped ID form "## Step 2: Enable tool calling on the runtime
{`#enable-tool-calling-on-the-runtime`}" so in-page links like
(`#enable-tool-calling-on-the-runtime`) and (`#claude-code`) resolve correctly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9456e8cf-88c7-4032-862a-69f099664a18

📥 Commits

Reviewing files that changed from the base of the PR and between fff230c and 21fc546.

📒 Files selected for processing (1)

docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx

coderabbitai · 2026-05-29T03:41:03Z

+
+A normal JSON completion confirms the endpoint is reachable and the model name is correct. Note the three values you will reuse for every agent: **base URL** (ending in `/v1`), **model name** (the `--served-model-name`), and **API key**.
+
+## Step 2: Enable tool calling on the runtime \{#enable-tool-calling-on-the-runtime}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix heading anchor syntax to avoid broken in-page links.

The escaped \{#...} likely won’t create the intended IDs, so links like (#enable-tool-calling-on-the-runtime) and (#claude-code) can break. Use plain heading text (auto-slug) or unescaped supported ID syntax for your MDX flavor.

Also applies to: 133-133

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx` at line 69, Fix the broken heading anchors by removing the backslash-escaped braces and using a supported ID or plain auto-slug: replace "## Step 2: Enable tool calling on the runtime \{`#enable-tool-calling-on-the-runtime`}" (and the similar heading at the other location) with either a plain heading "## Step 2: Enable tool calling on the runtime" (rely on auto-generated slug) or an unescaped ID form "## Step 2: Enable tool calling on the runtime {`#enable-tool-calling-on-the-runtime`}" so in-page links like (`#enable-tool-calling-on-the-runtime`) and (`#claude-code`) resolve correctly.

cloudflare-workers-and-pages · 2026-05-29T03:47:06Z

Deploying alauda-ai with Cloudflare Pages

Latest commit:	`20ee4d9`
Status:	✅ Deploy successful!
Preview URL:	https://c1ef65e4.alauda-ai.pages.dev
Branch Preview URL:	https://docs-coding-agents-onprem-ll.alauda-ai.pages.dev

View logs

Explains how to point opencode, Codex CLI, and Claude Code at a self-hosted OpenAI-compatible InferenceService, building on the existing deploy and external-access how-tos. Covers enabling tool calling on the vLLM runtime, plus best practices for performance tuning, matching a model to available GPU memory, and getting started with vibe coding and MLOps workflows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-on to the on-prem coding agent guide. Covers four agent-driven MLOps workflows: managing InferenceService and LLMInferenceService resources, configuring authentication and rate limiting on Envoy AI Gateway, an iterative agent-driven performance tuning loop, and reusable templates for fine-tuning plans and post-run reports. Links to the existing fine-tuning paths (Workbench Notebook, Training Hub, Kubeflow Trainer v2, LLM Compressor) and to the Envoy AI Gateway install doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Restructures the Claude Code subsection into two options: - Option A — point Claude Code directly at an on-prem endpoint that speaks the Anthropic Messages API (native runner or gateway), using ANTHROPIC_BASE_URL + ANTHROPIC_MODEL plus the CLAUDE_CODE_* flags that keep the session on-premise (disable non-essential traffic, 1M context, attribution header, telemetry; cap MAX_OUTPUT_TOKENS). - Option B — keep the existing LiteLLM / claude-code-router path for OpenAI-compatible-only endpoints. The direct-env approach avoids a separate proxy when the endpoint already accepts Claude Code traffic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx (1)

161-161: 💤 Low value

Consider rewording for clarity.

The phrase "very large outputs" could be more precise. Consider "excessively large outputs" or "outputs larger than the on-prem model supports" for better clarity.

Suggested rewording

-- The `CLAUDE_CODE_DISABLE_*` and `CLAUDE_CODE_*=0` flags are what actually keep an "on-prem" setup on-prem: without them, Claude Code can still emit non-essential requests to Anthropic-hosted endpoints and ask the model for features (1M context, very large outputs) the on-prem model cannot honor.
+- The `CLAUDE_CODE_DISABLE_*` and `CLAUDE_CODE_*=0` flags are what actually keep an "on-prem" setup on-prem: without them, Claude Code can still emit non-essential requests to Anthropic-hosted endpoints and ask the model for features (1M context, excessively large outputs) the on-prem model cannot honor.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx`
at line 161, Update the sentence that mentions "very large outputs" to a clearer
phrase: replace that fragment in the string referencing the flags
CLAUDE_CODE_DISABLE_* and CLAUDE_CODE_*=0 with either "excessively large
outputs" or "outputs larger than the on-prem model supports" so the intent is
explicit that outputs may exceed on‑prem model capacity.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx`:
- Around line 148-152: Update the inline comments to match documented Claude
Code env var semantics: change the comment for
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 to indicate it disables
telemetry/auto-updater/feedback/error reporting; change
CLAUDE_CODE_ATTRIBUTION_HEADER=0 to say it removes the system-prompt attribution
block; change CLAUDE_CODE_DISABLE_1M_CONTEXT=1 to state it disables the
1M‑context model variants in the picker; and change
CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 to note it sets the max output-tokens cap.

In
`@docs/en/model_inference/inference_service/how_to/mlops_with_coding_agents.mdx`:
- Line 266: The markdown link text "vLLM benchmarking" currently points to
https://docs.vllm.ai/en/latest/serving/usage_stats.html which is the usage-stats
page; update the link so the label and URL match by either changing the URL to
the official benchmarking doc (for example the benchmarking CLI page under vLLM
docs, e.g., .../benchmarking/cli/) or change the link label to "vLLM usage
stats" to reflect the existing URL; locate the markdown anchor "[vLLM
benchmarking](https://docs.vllm.ai/en/latest/serving/usage_stats.html)" in the
file and replace the URL or label accordingly.

---

Nitpick comments:
In
`@docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx`:
- Line 161: Update the sentence that mentions "very large outputs" to a clearer
phrase: replace that fragment in the string referencing the flags
CLAUDE_CODE_DISABLE_* and CLAUDE_CODE_*=0 with either "excessively large
outputs" or "outputs larger than the on-prem model supports" so the intent is
explicit that outputs may exceed on‑prem model capacity.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 493f1a30-6237-4b14-b5b1-41e17c39e61f

📥 Commits

Reviewing files that changed from the base of the PR and between 21fc546 and 20ee4d9.

📒 Files selected for processing (2)

docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx
docs/en/model_inference/inference_service/how_to/mlops_with_coding_agents.mdx

coderabbitai · 2026-05-29T06:36:40Z

+export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1      # suppress optional traffic to Anthropic-hosted services
+export CLAUDE_CODE_ATTRIBUTION_HEADER=0                # drop the Anthropic attribution header
+export CLAUDE_CODE_ENABLE_TELEMETRY=0                  # disable telemetry
+export CLAUDE_CODE_DISABLE_1M_CONTEXT=1                # disable the 1M-context feature; most on-prem models can't serve it
+export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000             # cap to what the on-prem model and runtime support


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What environment variables does Claude Code support for disabling telemetry and controlling traffic to Anthropic services?

💡 Result:

Claude Code supports several environment variables to manage telemetry and control network traffic to Anthropic services [1][2]. Telemetry and Non-Essential Traffic Controls To opt out of telemetry and reduce non-essential network traffic, the following environment variables can be set to 1 [1][3]: DISABLE_TELEMETRY: Opts out of telemetry. This disables Statsig-based telemetry event logging and feature-flag fetching [1][4][5]. DO_NOT_TRACK: Equivalent to DISABLE_TELEMETRY; Claude Code recognizes this cross-tool convention [1]. CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC: A comprehensive setting that is equivalent to simultaneously setting DISABLE_TELEMETRY, DISABLE_AUTOUPDATER, DISABLE_FEEDBACK_COMMAND, and DISABLE_ERROR_REPORTING [1]. DISABLE_GROWTHBOOK: Specifically disables the fetching of remote feature flags, causing the application to use built-in code defaults instead [1][4]. CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY: Disables session quality surveys [1]. Managing Telemetry and Observability While the above variables disable Anthropic’s built-in telemetry, Claude Code also supports user-managed OpenTelemetry (OTel) for custom observability [1][5]. CLAUDE_CODE_ENABLE_TELEMETRY: Set to 1 to enable OpenTelemetry data collection [1][6]. This is required before configuring custom OTel exporters (e.g., using standard OTEL_EXPORTER_OTLP_* variables) [1][6]. Controlling Traffic to Anthropic Services Beyond disabling telemetry, you can control how and where Claude Code sends traffic to Anthropic services [7][2]: ANTHROPIC_BASE_URL: Overrides the default Anthropic API base URL, allowing you to route traffic through an alternative gateway or proxy [8][2]. ANTHROPIC_API_KEY: Used for authentication; setting this to an empty value is sometimes used to force authentication fallbacks (e.g., via alternative tokens or proxies) [8][2]. ANTHROPIC_UNIX_SOCKET: Configures the client to connect to the Anthropic API via a Unix domain socket instead of a standard network connection [2]. Configuration Methods These environment variables can be applied at the process level (e.g., in your shell profile) or via the env field in your.claude/settings.json file [9][2][10]. Note that environment variables set in settings.json apply to every session and subprocess spawned by Claude Code [9][10]. Top Results: [1], [4], [3], [7], [2]

Citations:

1: https://code.claude.com/docs/en/env-vars

2: https://sanbuphy-claude-code-source-code.mintlify.app/configuration/environment-variables

3: https://gist.github.com/jedisct1/9627644cda1c3929affe9b1ce8eaf714

4: [DOCS] Document feature-flag and bootstrap cache behavior when opt-out variables are enabled anthropics/claude-code#62382

5: [DOCS] Telemetry Configuration Ambiguity: Confusion between Anthropic internal metrics (Statsig) and User-managed OpenTelemetry anthropics/claude-code#19117

6: https://www.mintlify.com/VineeTagarwaL-code/claude-code/configuration/environment-variables

7: https://code.claude.com/docs/en/corporate-proxy

8: https://www.blogarama.com/internet-blogs/1309494-tech-basket-blog/76355821-use-claude-code-openrouter-free-models-may-2026

9: https://code.claude.com/docs/en/settings

10: https://code.claude.com/docs/en/configuration

Align the snippet’s inline comments with documented Claude Code env var behavior.

CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 isn’t just “suppress optional traffic”; it’s documented as equivalent to disabling telemetry/auto-updater/feedback/error reporting.

CLAUDE_CODE_ATTRIBUTION_HEADER=0 removes the system-prompt attribution block (not a generic “Anthropic attribution header”).

CLAUDE_CODE_DISABLE_1M_CONTEXT=1 disables the 1M-context model variants shown in the picker.

CLAUDE_CODE_MAX_OUTPUT_TOKENS is the max output-tokens cap (not “on-prem model/runtime support”).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/en/model_inference/inference_service/how_to/coding_agents_with_inference_service.mdx` around lines 148 - 152, Update the inline comments to match documented Claude Code env var semantics: change the comment for CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 to indicate it disables telemetry/auto-updater/feedback/error reporting; change CLAUDE_CODE_ATTRIBUTION_HEADER=0 to say it removes the system-prompt attribution block; change CLAUDE_CODE_DISABLE_1M_CONTEXT=1 to state it disables the 1M‑context model variants in the picker; and change CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000 to note it sets the max output-tokens cap.

coderabbitai · 2026-05-29T06:36:40Z

+- [Envoy AI Gateway upstream documentation](https://aigateway.envoyproxy.io/)
+- [Envoy Gateway upstream documentation](https://gateway.envoyproxy.io/)
+- [KServe LLMInferenceService](https://kserve.github.io/website/)
+- [vLLM benchmarking](https://docs.vllm.ai/en/latest/serving/usage_stats.html)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What is the current official vLLM documentation page for benchmarking (e.g., vllm bench serve), and does https://docs.vllm.ai/en/latest/serving/usage_stats.html` document benchmarking?`

💡 Result:

The official vLLM documentation for benchmarking is located at https://docs.vllm.ai/en/latest/benchmarking/cli/ [1]. This page serves as the primary guide for running benchmark tests and includes information on the vllm bench CLI tools [1][2]. Specifically, detailed CLI reference documentation for the vllm bench serve command can be found at https://docs.vllm.ai/en/stable/cli/bench/serve/ [3]. Regarding your second question, the page https://docs.vllm.ai/en/latest/serving/usage_stats.html does not document benchmarking [4][5]. Instead, that page (or its equivalent in other versions) documents vLLM's anonymous usage data collection, which includes information on what data is collected, how to preview it, and how to opt out of the collection [4][5]. Benchmarking is handled under separate sections dedicated to performance testing and evaluation [1][2].

Citations:

1: https://docs.vllm.ai/en/latest/benchmarking/cli/

2: https://docs.vllm.ai/en/v0.20.1/benchmarking/

3: https://docs.vllm.ai/en/stable/cli/bench/serve/

4: https://docs.vllm.ai/en/v0.18.2/usage/usage_stats/

5: https://docs.vllm.ai/en/v0.13.0/usage/usage_stats/

Fix vLLM link label/URL mismatch.

“vLLM benchmarking” currently links to https://docs.vllm.ai/en/latest/serving/usage_stats.html, which is about anonymous “usage stats” collection (not benchmarking). Update the link URL to the official benchmarking docs (e.g., .../benchmarking/cli/) or rename the label to match the usage-stats page.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/en/model_inference/inference_service/how_to/mlops_with_coding_agents.mdx` at line 266, The markdown link text "vLLM benchmarking" currently points to https://docs.vllm.ai/en/latest/serving/usage_stats.html which is the usage-stats page; update the link so the label and URL match by either changing the URL to the official benchmarking doc (for example the benchmarking CLI page under vLLM docs, e.g., .../benchmarking/cli/) or change the link label to "vLLM usage stats" to reflect the existing URL; locate the markdown anchor "[vLLM benchmarking](https://docs.vllm.ai/en/latest/serving/usage_stats.html)" in the file and replace the URL or label accordingly.

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

typhoonzero force-pushed the docs/coding-agents-onprem-llm branch from 21fc546 to 336c3e3 Compare May 29, 2026 03:58

typhoonzero mentioned this pull request May 29, 2026

docs: add MLOps-with-coding-agents how-to #246

Merged

4 tasks

typhoonzero changed the title ~~docs: add guide for using coding agents with on-prem inference services~~ docs: add coding-agents + MLOps how-tos for on-prem LLMs May 29, 2026

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: add coding-agents + MLOps how-tos for on-prem LLMs#245

docs: add coding-agents + MLOps how-tos for on-prem LLMs#245
typhoonzero wants to merge 3 commits into
masterfrom
docs/coding-agents-onprem-llm

typhoonzero commented May 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 29, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 29, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented May 29, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 29, 2026

Uh oh!

coderabbitai Bot May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		A normal JSON completion confirms the endpoint is reachable and the model name is correct. Note the three values you will reuse for every agent: base URL (ending in `/v1`), model name (the `--served-model-name`), and API key.

		## Step 2: Enable tool calling on the runtime \{#enable-tool-calling-on-the-runtime}

Conversation

typhoonzero commented May 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. coding_agents_with_inference_service.mdx — Use Coding Agents with On-Premise Inference Services

2. mlops_with_coding_agents.mdx — Run MLOps with Coding Agents and On-Premise LLMs

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

cloudflare-workers-and-pages Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying alauda-ai with Cloudflare Pages

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

typhoonzero commented May 29, 2026 •

edited by coderabbitai Bot

Loading

1. `coding_agents_with_inference_service.mdx` — Use Coding Agents with On-Premise Inference Services

2. `mlops_with_coding_agents.mdx` — Run MLOps with Coding Agents and On-Premise LLMs

coderabbitai Bot commented May 29, 2026 •

edited

Loading

cloudflare-workers-and-pages Bot commented May 29, 2026 •

edited

Loading