docs(envoy-ai-gateway): add how-to guides for MaaS governance by sinbadonline · Pull Request #238 · alauda/aml-docs

sinbadonline · 2026-05-27T03:27:06Z

Add four how-to guides under docs/en/envoy_ai_gateway/how_to/, plus an index, and surface them from intro.mdx. Register the costmanagement external site in sites.yaml (referenced by the metering guide).

- identity_authentication.mdx: SecurityPolicy with OIDC/JWT or API key, mapping claims to identity headers (x-user-id / x-user-group).
- token_rate_limiting.mdx: response-cost token quota via llmRequestCosts + a Global BackendTrafficPolicy backed by Redis.
- usage_metering.mdx: OpenTelemetry gen_ai_client_token_usage_token scraped from the ExtProc sidecar (port 1064) with a PodMonitor, plus per-department labelling via metricsRequestHeaderAttributes.
- external_provider_routing.mdx: BackendSecurityPolicy credential injection + AIServiceBackend, with priority-based provider failover.

All four guides were verified end-to-end on an ACP cluster (EAIG v0.4.2, Envoy Gateway v1.5.4).
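
To make the per-department labelling in usage_metering.mdx concrete, here is a minimal Python sketch that assembles the kind of PromQL aggregation the guide relies on. The helper name `token_usage_query` and its defaults are hypothetical. The `department` label and the `{__name__="..."}` selector style are taken from this PR's review thread and commit notes, which report that Prometheus stores the metric under OpenTelemetry's dotted name; treat the exact metric name as something to confirm on your own cluster.

```python
def token_usage_query(metric_name: str,
                      group_label: str = "department",
                      window: str = "1h") -> str:
    """Build a PromQL query that sums token growth per label value.

    A {__name__="..."} selector is used so that metric names containing
    dots can be matched without relying on bare-identifier syntax.
    """
    # Escape backslashes and double quotes so the name stays a valid
    # PromQL string literal.
    escaped = metric_name.replace("\\", "\\\\").replace('"', '\\"')
    selector = '{__name__="' + escaped + '"}'
    return f"sum by ({group_label}) (increase({selector}[{window}]))"


if __name__ == "__main__":
    # Dotted name as reported in this PR's commit notes (an assumption
    # for any other cluster).
    print(token_usage_query("gen_ai.client.token.usage_token_sum"))
```

Passing a different `group_label` (for example a user-level label) or `window` yields the same aggregation at another granularity.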

Summary by CodeRabbit

Documentation
- Added Envoy AI Gateway how-to guides: identity authentication (OIDC/JWT and API-key flows), token-based rate limiting with Redis-backed quotas, usage metering with Prometheus metrics and dashboards, external LLM provider routing with model-based routing and priority failover, a How-To index, and deployment guidance with verification steps.
Chores
- Added cost management site configuration.

Add four how-to guides under docs/en/envoy_ai_gateway/how_to/, plus an index, and surface them from intro.mdx. Register the costmanagement external site in sites.yaml (referenced by the metering guide).

- identity_authentication.mdx: SecurityPolicy with OIDC/JWT or API key, mapping claims to identity headers (x-user-id / x-user-group).
- token_rate_limiting.mdx: response-cost token quota via llmRequestCosts + a Global BackendTrafficPolicy backed by Redis.
- usage_metering.mdx: OpenTelemetry gen_ai_client_token_usage_token scraped from the ExtProc sidecar (port 1064) with a PodMonitor, plus per-department labelling via metricsRequestHeaderAttributes.
- external_provider_routing.mdx: BackendSecurityPolicy credential injection + AIServiceBackend, with priority-based provider failover.

All four guides were verified end-to-end on an ACP cluster (EAIG v0.4.2, Envoy Gateway v1.5.4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-27T03:27:19Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a84cbae5-a8a3-44cf-928d-da4d119ab136

📥 Commits

Reviewing files that changed from the base of the PR and between 25583d1 and ff7303d.

📒 Files selected for processing (1)

docs/en/envoy_ai_gateway/how_to/usage_metering.mdx

Walkthrough

This PR adds comprehensive how-to documentation for Envoy AI Gateway's multi-tenant model serving capabilities, including guides for edge authentication, token rate limiting, usage metering, and external provider routing, plus navigation updates and site configuration.

Changes

Multi-tenant AI Gateway How-To Guides

| Layer / File(s) | Summary |
| --- | --- |
| How-To landing, intro, and site config: `docs/en/envoy_ai_gateway/how_to/index.mdx`, `docs/en/envoy_ai_gateway/intro.mdx`, `sites.yaml` | Adds a how-to landing page, inserts a new "Guides" section in the main introduction, and registers the `costmanagement` site entry. |
| Identity Authentication Guide: `docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx` | Documents OIDC/JWT and API key edge authentication, Dex claim-to-header mappings, optional rollout guidance, verification steps, and links to related docs. |
| Token Rate Limiting Guide: `docs/en/envoy_ai_gateway/how_to/token_rate_limiting.mdx` | Explains Redis-backed global rate limit bootstrap, capturing token usage via `llmRequestCosts`, enforcing per-identity token budgets with `BackendTrafficPolicy`, verification, and next-step links. |
| Usage Metering Guide: `docs/en/envoy_ai_gateway/how_to/usage_metering.mdx` | Describes exporting token usage metrics enriched by identity labels, scraping ExtProc metrics via `PodMonitor`, building dashboards, chargeback integration, and verification steps. |
| External Provider Routing Guide: `docs/en/envoy_ai_gateway/how_to/external_provider_routing.mdx` | Shows how to inject upstream provider credentials (`BackendSecurityPolicy`), register `AIServiceBackend` entries, and route OpenAI-compatible requests to external LLMs with model-based priority failover and curl verification. |
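
The priority failover mentioned in the last row can be pictured with a deliberately simplified Python sketch. It assumes an all-or-nothing health flag per backend, and the backend names are hypothetical. Envoy itself shifts traffic between priority levels gradually, based on the healthy fraction of hosts at each level, so this illustrates the idea only and is not the gateway's actual algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Backend:
    name: str        # hypothetical backend name
    priority: int    # 0 is preferred, 1 is the fallback
    healthy: bool    # simplified: a single on/off health flag


def pick_backend(backends: Sequence[Backend]) -> Optional[Backend]:
    """Return the healthy backend with the lowest priority number."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        return None  # nothing can serve the request
    return min(healthy, key=lambda b: b.priority)


if __name__ == "__main__":
    pool = [
        Backend("primary-provider", priority=0, healthy=False),
        Backend("fallback-provider", priority=1, healthy=True),
    ]
    # With the priority-0 backend down, the priority-1 backend is chosen.
    print(pick_backend(pool).name)
```

Flipping `healthy` on the priority-0 entry is one way to reason about what a failover simulation should show before trying it against a live gateway.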

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding how-to guides for Envoy AI Gateway Model-as-a-Service (MaaS) governance, which is the primary focus of the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx (1)
17-17: 💤 Low value

Consider using "who" for people references.

The phrase "machine consumer that cannot" would read more naturally as "machine consumer who cannot" since it refers to a consumer entity (which can be a person or service acting on behalf of people).
✏️ Suggested wording
-- A machine consumer that cannot run an interactive login presents a static API key that maps to a known tenant.
+- A machine consumer who cannot run an interactive login presents a static API key that maps to a known tenant.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx` at line 17,
Replace "A machine consumer that cannot run an interactive login presents a
static API key that maps to a known tenant." with wording that uses "who" for
the consumer reference; e.g., change the phrase "machine consumer that cannot"
to "machine consumer who cannot" so the sentence reads naturally while
preserving the rest of the text.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/envoy_ai_gateway/how_to/usage_metering.mdx`:
- Around line 73-75: Change the fenced PromQL code block language identifier
from "text" to "bash" so the query snippet starting with "sum by (department)
(increase(gen_ai_client_token_usage_token_sum[1h]))" renders correctly in
rspress; locate the triple-backtick block containing that PromQL query and
replace the opening ```text with ```bash.

---

Nitpick comments:
In `@docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx`:
- Line 17: Replace "A machine consumer that cannot run an interactive login
presents a static API key that maps to a known tenant." with wording that uses
"who" for the consumer reference; e.g., change the phrase "machine consumer that
cannot" to "machine consumer who cannot" so the sentence reads naturally while
preserving the rest of the text.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7492fb47-5475-4f3e-a493-d7d02054ebc2

📥 Commits

Reviewing files that changed from the base of the PR and between df87e24 and 4c1686b.

📒 Files selected for processing (7)

docs/en/envoy_ai_gateway/how_to/external_provider_routing.mdx
docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx
docs/en/envoy_ai_gateway/how_to/index.mdx
docs/en/envoy_ai_gateway/how_to/token_rate_limiting.mdx
docs/en/envoy_ai_gateway/how_to/usage_metering.mdx
docs/en/envoy_ai_gateway/intro.mdx
sites.yaml

cloudflare-workers-and-pages · 2026-05-27T03:34:29Z

Deploying alauda-ai with Cloudflare Pages

Latest commit:	`b360ad3`
Status:	✅ Deploy successful!
Preview URL:	https://ccd11221.alauda-ai.pages.dev
Branch Preview URL:	https://feat-envoy-ai-gateway-how-to.alauda-ai.pages.dev


Add guidance to create the Gateway (and AIGatewayRoute) in a dedicated namespace such as maas-system, not in the control-plane namespace envoy-gateway-system.

A full callout in intro.mdx explains the rationale (separation from the control plane, plus avoiding a version-specific issue where a control-plane-namespace gateway does not get the AI Gateway ext_proc filter / SecurityPolicy applied to its listener, silently breaking routing, quotas, and auth). Each how-to guide gets a concise Prerequisites note linking back to it.

Verified on an ACP cluster: an identical gateway created in a non-system namespace gets the router ext_proc filter injected natively (model routing returns 200) and SecurityPolicy/apiKeyAuth enforces correctly, both without any workaround.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ripts and field-level explanations

Every kubectl/curl/YAML/PromQL snippet in the four how-to guides was exercised against an ALAUDA_CLOUD verify cluster (EAIG v0.4.2, EG v1.5.0) and updated to match observed behavior.

identity_authentication.mdx
- Switch the API-key example to EG-native apiKeyAuth.forwardClientIDHeader + sanitize (confirmed in the live SecurityPolicy CRD schema).
- Add a known-issue warning + EnvoyPatchPolicy workaround for the EG v1.5.x translation gap (HCM api_key_auth filter stays disabled:true; per-route config is not wrapped in FilterConfig{disabled:false}, so the policy reports Accepted=True but does not enforce). Workaround verified end-to-end: alice/ci-runner keys 200 with upstream x-user-id injected, wrong/missing keys 401, X-API-Key stripped.
- Note that the OIDC issuer is not bound to Dex; any OIDC provider works (Keycloak, Auth0, Okta, GitHub OIDC, ...) so consumers without an ACP platform account can still authenticate.
- Fix SecurityPolicy verification jsonpath: ancestor-scoped at .status.ancestors[*].conditions[?(@.type=="Accepted")].status.

token_rate_limiting.mdx
- Add Prereq preflight: CRD check, in-cluster Redis PING, kubectl get gateway -A, envoy-ratelimit rollout-status.
- Add Redis TLS / password placeholders and the rate-limit deployment health check.
- Explain CEL/InputToken/OutputToken/TotalToken semantics with a worked weighted-cost example.
- Replace prose-only verification with a runnable alice/bob 8-curl burst and per-user isolation expectation (matches the live 200x2 -> 429x6 pattern observed against the 40-tok/min budget).
- Fix redis-cli scan pattern: real EG rate-limit keys are not prefixed with "ratelimit:"; switch the example to a user-id substring match.

external_provider_routing.mdx
- Add Prereq preflight: CRD checks, in-cluster egress probe to api.openai.com.
- Insert the missing Secret-create step and call out that the data-map key MUST be exactly 'apiKey' for type: APIKey (confirmed by controller log: "secret <name> does not contain key apiKey"; symptom is request timeout, not 401).
- Add the missing Backend (FQDN) example and the per-type credential matrix (APIKey, AWSCredentials, AzureAPIKey, AzureCredentials, GCPCredentials, AnthropicAPIKey).
- Clarify that Backend STATUS=Accepted appears only after an HTTPRoute references it; until then the column is empty.
- Explain priority 0/1 semantics (Envoy locality-weighted load balancing) and provide a failover-simulation tip.

usage_metering.mdx
- Add Prereq preflight: PodMonitor CRD, port-forward 1064, and a metric presence check before any monitoring wiring.
- Surface that the ai-gateway-extproc sidecar is declared as a native initContainer (restartPolicy: Always), which is why the aigw-metrics named port is on the pod even though .spec.containers does not list it.
- Add a discovery snippet for the cluster's Prometheus podMonitorSelector label and put it on the PodMonitor.metadata.labels (without this label the PodMonitor is invisible to the platform Prometheus).
- Replace the fabricated `helm upgrade ai-eg-helm ...` and `kubectl set env ... AI_GATEWAY_METRICS_REQUEST_HEADER_ATTRIBUTES=...` recipes with the verified path: a `kubectl patch deploy` that adds the `-metricsRequestHeaderAttributes=<header>:<label>` CLI flag (single-dash flag on ai-gateway-controller; no env var exists).
- Rewrite all PromQL examples to use {__name__="..."} selectors because Prometheus stores the metric with OpenTelemetry's dotted name (`gen_ai.client.token.usage_token_sum`), not the underscored form emitted at the extproc /metrics endpoint.
- Add scrape-target health check via the Prometheus /api/v1/targets API and a metric-name discovery snippet via /api/v1/label/__name__/values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
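
The 200x2 -> 429x6 burst pattern reported for token_rate_limiting.mdx can be reproduced with a small Python model of a response-cost budget. This is a sketch under stated assumptions, not the rate-limit service's implementation: it assumes a request is admitted whenever the caller has not yet used up the budget, that the response's token cost is charged only after the call completes, and that every response costs a hypothetical 25 tokens. The class name `TokenBudget` is illustrative, and the per-minute window reset is not modeled.

```python
from collections import defaultdict


class TokenBudget:
    """Per-user token budget, charged after each successful response."""

    def __init__(self, budget_tokens: int) -> None:
        self.budget_tokens = budget_tokens
        self.used = defaultdict(int)  # user id -> tokens consumed so far

    def handle(self, user_id: str, response_tokens: int) -> int:
        """Return the HTTP status a request from user_id would receive."""
        if self.used[user_id] >= self.budget_tokens:
            return 429  # budget exhausted: reject before reaching the model
        # Admitted: the cost is only known once the response is produced.
        self.used[user_id] += response_tokens
        return 200


if __name__ == "__main__":
    limiter = TokenBudget(budget_tokens=40)  # 40-token budget from the commit note
    alice = [limiter.handle("alice", 25) for _ in range(8)]  # 25 is assumed
    print(alice)                       # two 200s, then six 429s
    print(limiter.handle("bob", 25))   # bob has his own budget: 200
```

Because the charge lands after the response, the second request is still admitted with only 15 tokens left, which is why two calls succeed before the rejections start. Any assumed per-response cost from 20 to 39 tokens gives the same pattern against a 40-token budget.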

…w for token usage

Replace the single-paragraph "Chargeback with Cost Management" pointer with a full Steps section that has been verified live on Cost Management v4.1.0: configure the two ConfigMaps (collection in cpaas-system, display in kube-public), reload agent + server, create a pricing model in the UI, and validate that cost.bills accumulates per-namespace AI Gateway token charges.

The verified gotchas are inlined as warnings/troubleshooting so readers do not retread them:
- `kind: Project` is the only kind value the agent honors for namespace-scoped custom metrics on v4.1.0; `kind: AIGateway` or similar is silently dropped, leaving cost.usage empty with no diagnostic.
- `mappers.cluster: ""` (empty string) is required so the agent fills in its own cluster identity. Writing `cluster: cluster` makes the agent look for a `cluster` Prometheus label that the gen_ai metric does not carry, and every row is dropped without a log entry.
- The display ConfigMap MUST be a separate resource with the `cpaas.io/slark.display.config=true` label. Adding a custom entry to the platform-installed `slark-server-common-config` makes slark-server panic at startup (stricter validation on the platform CM).
- Neither cost-agent nor cost-server watches its ConfigMap for changes; both must be force-recreated after every collection or display config edit. Document the exact `kubectl delete pod --grace-period=0 --force` commands.
- The UI Cost Model form has a "linked clusters" field that, if left empty, saves successfully but matches no usage groups and produces no bills. Call this out as the #1 reason "configured but nothing in the UI" reports happen.
- cost.milestones marks each (cluster, date) group Done after first compute, so adding or repricing an item only affects newly-arriving windows. Document the milestone delete recipe for backfill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed May 27, 2026

zxl and others added 3 commits May 27, 2026 05:55

sinbadonline force-pushed the feat/envoy-ai-gateway-how-to-docs branch from 25583d1 to ff7303d Compare May 28, 2026 08:43

sinbadonline wants to merge 4 commits into master from feat/envoy-ai-gateway-how-to-docs
