docs(envoy-ai-gateway): add how-to guides for MaaS governance #238
sinbadonline wants to merge 4 commits into
Add four how-to guides under docs/en/envoy_ai_gateway/how_to/, plus an index, and surface them from intro.mdx. Register the costmanagement external site in sites.yaml (referenced by the metering guide).

- identity_authentication.mdx: SecurityPolicy with OIDC/JWT or API key, mapping claims to identity headers (x-user-id / x-user-group).
- token_rate_limiting.mdx: response-cost token quota via llmRequestCosts + a Global BackendTrafficPolicy backed by Redis.
- usage_metering.mdx: OpenTelemetry gen_ai_client_token_usage_token scraped from the ExtProc sidecar (port 1064) with a PodMonitor, plus per-department labelling via metricsRequestHeaderAttributes.
- external_provider_routing.mdx: BackendSecurityPolicy credential injection + AIServiceBackend, with priority-based provider failover.

All four guides were verified end-to-end on an ACP cluster (EAIG v0.4.2, Envoy Gateway v1.5.4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walkthrough
This PR adds comprehensive how-to documentation for Envoy AI Gateway's multi-tenant model serving capabilities, including guides for edge authentication, token rate limiting, usage metering, and external provider routing, plus navigation updates and site configuration.
Changes
Multi-tenant AI Gateway How-To Guides
Estimated code review effort: 2 (Simple) | ~12 minutes
Pre-merge checks: 5 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx (1)
17-17: 💤 Low value. Consider using "who" for people references.
The phrase "machine consumer that cannot" would read more naturally as "machine consumer who cannot" since it refers to a consumer entity (which can be a person or service acting on behalf of people).
✏️ Suggested wording
-- A machine consumer that cannot run an interactive login presents a static API key that maps to a known tenant.
+- A machine consumer who cannot run an interactive login presents a static API key that maps to a known tenant.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In `@docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx` at line 17, replace "A machine consumer that cannot run an interactive login presents a static API key that maps to a known tenant." with wording that uses "who" for the consumer reference; e.g., change the phrase "machine consumer that cannot" to "machine consumer who cannot" so the sentence reads naturally while preserving the rest of the text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/en/envoy_ai_gateway/how_to/usage_metering.mdx`:
- Around line 73-75: Change the fenced PromQL code block language identifier
from "text" to "bash" so the query snippet starting with "sum by (department)
(increase(gen_ai_client_token_usage_token_sum[1h]))" renders correctly in
rspress; locate the triple-backtick block containing that PromQL query and
replace the opening ```text with ```bash.
---
Nitpick comments:
In `@docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx`:
- Line 17: Replace "A machine consumer that cannot run an interactive login
presents a static API key that maps to a known tenant." with wording that uses
"who" for the consumer reference; e.g., change the phrase "machine consumer that
cannot" to "machine consumer who cannot" so the sentence reads naturally while
preserving the rest of the text.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7492fb47-5475-4f3e-a493-d7d02054ebc2
📒 Files selected for processing (7)
- docs/en/envoy_ai_gateway/how_to/external_provider_routing.mdx
- docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx
- docs/en/envoy_ai_gateway/how_to/index.mdx
- docs/en/envoy_ai_gateway/how_to/token_rate_limiting.mdx
- docs/en/envoy_ai_gateway/how_to/usage_metering.mdx
- docs/en/envoy_ai_gateway/intro.mdx
- sites.yaml
Deploying alauda-ai with
| Latest commit: | b360ad3 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://ccd11221.alauda-ai.pages.dev |
| Branch Preview URL: | https://feat-envoy-ai-gateway-how-to.alauda-ai.pages.dev |
Add guidance to create the Gateway (and AIGatewayRoute) in a dedicated namespace such as maas-system, not in the control-plane namespace envoy-gateway-system.

A full callout in intro.mdx explains the rationale: separation from the control plane, plus avoiding a version-specific issue where a control-plane-namespace gateway does not get the AI Gateway ext_proc filter / SecurityPolicy applied to its listener, silently breaking routing, quotas, and auth. Each how-to guide gets a concise Prerequisites note linking back to it.

Verified on an ACP cluster: an identical gateway created in a non-system namespace gets the router ext_proc filter injected natively (model routing returns 200) and SecurityPolicy/apiKeyAuth enforces correctly, both without any workaround.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ripts and field-level explanations
Every kubectl/curl/YAML/PromQL snippet in the four how-to guides was
exercised against an ALAUDA_CLOUD verify cluster (EAIG v0.4.2, EG v1.5.0)
and updated to match observed behavior.
identity_authentication.mdx
- Switch the API-key example to EG-native apiKeyAuth.forwardClientIDHeader
+ sanitize (confirmed in the live SecurityPolicy CRD schema).
- Add a known-issue warning + EnvoyPatchPolicy workaround for the EG
v1.5.x translation gap (HCM api_key_auth filter stays disabled:true;
per-route config is not wrapped in FilterConfig{disabled:false}, so the
policy reports Accepted=True but does not enforce). Workaround verified
end-to-end: alice/ci-runner keys 200 with upstream x-user-id injected,
wrong/missing keys 401, X-API-Key stripped.
- Note that the OIDC issuer is not bound to Dex; any OIDC provider works
(Keycloak, Auth0, Okta, GitHub OIDC, ...) so consumers without an ACP
platform account can still authenticate.
- Fix SecurityPolicy verification jsonpath: ancestor-scoped at
.status.ancestors[*].conditions[?(@.type=="Accepted")].status.
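The ancestor-scoped jsonpath above can be read as a nested walk: the conditions sit under each entry of `.status.ancestors`, not directly under `.status`. A minimal Python sketch of the same lookup follows, assuming the policy has already been fetched as a parsed JSON object; the sample object and the gateway name `example-gateway` are hypothetical. Per the known-issue note above, `Accepted=True` on EG v1.5.x does not by itself prove that the policy is enforced.

```python
from typing import Any


def accepted_statuses(policy: dict[str, Any]) -> list[str]:
    """Mirror .status.ancestors[*].conditions[?(@.type=="Accepted")].status."""
    statuses = []
    for ancestor in policy.get("status", {}).get("ancestors", []):
        for condition in ancestor.get("conditions", []):
            if condition.get("type") == "Accepted":
                statuses.append(condition.get("status"))
    return statuses


# Hypothetical SecurityPolicy, trimmed to the fields the jsonpath reads.
sample_policy = {
    "kind": "SecurityPolicy",
    "status": {
        "ancestors": [
            {
                "ancestorRef": {"kind": "Gateway", "name": "example-gateway"},
                "conditions": [{"type": "Accepted", "status": "True"}],
            }
        ]
    },
}

print(accepted_statuses(sample_policy))
```

An empty result means no ancestor has reported an `Accepted` condition yet, which is different from a reported `False`.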
token_rate_limiting.mdx
- Add Prereq preflight: CRD check, in-cluster Redis PING, kubectl get
gateway -A, envoy-ratelimit rollout-status.
- Add Redis TLS / password placeholders and the rate-limit deployment
health check.
- Explain CEL/InputToken/OutputToken/TotalToken semantics with a worked
weighted-cost example.
- Replace prose-only verification with a runnable alice/bob 8-curl burst
and per-user isolation expectation (matches the live 200x2 -> 429x6
pattern observed against the 40-tok/min budget).
- Fix redis-cli scan pattern: real EG rate-limit keys are not prefixed
with "ratelimit:"; switch the example to a user-id substring match.
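The 200x2 -> 429x6 pattern and the per-user isolation described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the gateway's implementation: it models one fixed window, admits a request whenever the user's budget is not yet exhausted, and deducts the token cost only after the response. The weights and the 20-token cost per request (12 input + 8 output) are hypothetical values chosen to reproduce the observed pattern against the 40-token budget.

```python
def weighted_cost(input_tokens: int, output_tokens: int,
                  input_weight: int = 1, output_weight: int = 1) -> int:
    """Hypothetical weighted cost: each token type counts with its own weight."""
    return input_weight * input_tokens + output_weight * output_tokens


class ResponseCostWindow:
    """One fixed rate-limit window with a separate token budget per user."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.used: dict[str, int] = {}

    def handle(self, user: str, cost: int) -> int:
        # Admission only checks whether the budget is already spent;
        # the cost is known, and deducted, after the response.
        if self.used.get(user, 0) >= self.budget:
            return 429
        self.used[user] = self.used.get(user, 0) + cost
        return 200


def burst(window: ResponseCostWindow, user: str, cost: int, n: int = 8) -> list[int]:
    """Send n back-to-back requests for one user and collect status codes."""
    return [window.handle(user, cost) for _ in range(n)]


window = ResponseCostWindow(budget=40)
cost = weighted_cost(input_tokens=12, output_tokens=8)  # assumed request size
print("alice:", burst(window, "alice", cost))
print("bob:  ", burst(window, "bob", cost))
```

Because each user has an independent counter, alice exhausting her budget leaves bob's first requests unaffected, which is the isolation the curl burst is meant to demonstrate.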
external_provider_routing.mdx
- Add Prereq preflight: CRD checks, in-cluster egress probe to
api.openai.com.
- Insert the missing Secret-create step and call out that the data-map
key MUST be exactly 'apiKey' for type: APIKey (confirmed by controller
log: "secret <name> does not contain key apiKey"; symptom is request
timeout, not 401).
- Add the missing Backend (FQDN) example and the per-type credential
matrix (APIKey, AWSCredentials, AzureAPIKey, AzureCredentials,
GCPCredentials, AnthropicAPIKey).
- Clarify that Backend STATUS=Accepted appears only after an HTTPRoute
references it; until then the column is empty.
- Explain priority 0/1 semantics (Envoy locality-weighted load balancing)
and provide a failover-simulation tip.
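The Secret requirement above is easy to get wrong, so a quick check before applying the manifest may help. The following Python sketch builds a Secret as JSON (which `kubectl apply -f` accepts) and verifies that the data map uses exactly `apiKey`. The Secret name `openai-api-key` and the placeholder key value are hypothetical; `maas-system` is the dedicated namespace suggested elsewhere in this PR.

```python
import base64
import json


def api_key_secret(name: str, namespace: str, api_key: str) -> dict:
    """Build a Secret whose data map uses the exact key 'apiKey'."""
    encoded = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"apiKey": encoded},
    }


def has_required_key(secret: dict) -> bool:
    """True if the Secret carries 'apiKey' in data or stringData."""
    return ("apiKey" in secret.get("data", {})
            or "apiKey" in secret.get("stringData", {}))


# Hypothetical name and placeholder credential, for illustration only.
manifest = api_key_secret("openai-api-key", "maas-system", "sk-placeholder")
assert has_required_key(manifest)
print(json.dumps(manifest, indent=2))
```

A Secret that uses a different key name, such as `api-key`, fails this check, matching the controller log message quoted above.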
usage_metering.mdx
- Add Prereq preflight: PodMonitor CRD, port-forward 1064, and a metric
presence check before any monitoring wiring.
- Surface that the ai-gateway-extproc sidecar is declared as a native
initContainer (restartPolicy: Always), which is why the aigw-metrics
named port is on the pod even though .spec.containers does not list it.
- Add a discovery snippet for the cluster's Prometheus podMonitorSelector
label and put it on the PodMonitor.metadata.labels (without this label
the PodMonitor is invisible to the platform Prometheus).
- Replace the fabricated `helm upgrade ai-eg-helm ...` and
`kubectl set env ... AI_GATEWAY_METRICS_REQUEST_HEADER_ATTRIBUTES=...`
recipes with the verified path: a `kubectl patch deploy` that adds the
`-metricsRequestHeaderAttributes=<header>:<label>` CLI flag (single-dash
flag on ai-gateway-controller; no env var exists).
- Rewrite all PromQL examples to use {__name__="..."} selectors because
Prometheus stores the metric with OpenTelemetry's dotted name
(`gen_ai.client.token.usage_token_sum`), not the underscored form
emitted at the extproc /metrics endpoint.
- Add scrape-target health check via the Prometheus /api/v1/targets API
and a metric-name discovery snippet via /api/v1/label/__name__/values.
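To make the dotted-name point concrete, the Python sketch below builds a `{__name__="..."}` selector for the metric name given above and filters a parsed `/api/v1/targets` response for targets that are not up. It only constructs strings and inspects dictionaries; fetching from Prometheus is left out, and the sample response, including its scrape URL, is hypothetical.

```python
def by_name(metric: str, window: str = "") -> str:
    """Select a metric through __name__, which works for dotted names."""
    selector = '{__name__="' + metric + '"}'
    return selector + (f"[{window}]" if window else "")


def tokens_by_label(label: str, window: str = "1h") -> str:
    """Token usage over a window, grouped by one label."""
    metric = "gen_ai.client.token.usage_token_sum"
    return f"sum by ({label}) (increase({by_name(metric, window)}))"


def unhealthy_targets(targets_response: dict) -> list[str]:
    """Scrape URLs of active targets whose health is not 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "") for t in active if t.get("health") != "up"]


# Hypothetical, trimmed /api/v1/targets response.
sample_targets = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://10.0.0.12:1064/metrics", "health": "up"},
            {"scrapeUrl": "http://10.0.0.13:1064/metrics", "health": "down"},
        ]
    },
}

print(tokens_by_label("department"))
print(unhealthy_targets(sample_targets))
```

Building the selector in one helper keeps the dotted name in a single place, so the per-department query and any dashboards derived from it stay consistent.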
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…w for token usage

Replace the single-paragraph "Chargeback with Cost Management" pointer with a full Steps section that has been verified live on Cost Management v4.1.0: configure the two ConfigMaps (collection in cpaas-system, display in kube-public), reload agent + server, create a pricing model in the UI, and validate that cost.bills accumulates per-namespace AI Gateway token charges.

The verified gotchas are inlined as warnings/troubleshooting so readers do not retread them:

- `kind: Project` is the only kind value the agent honors for namespace-scoped custom metrics on v4.1.0; `kind: AIGateway` or similar is silently dropped, leaving cost.usage empty with no diagnostic.
- `mappers.cluster: ""` (empty string) is required so the agent fills in its own cluster identity. Writing `cluster: cluster` makes the agent look for a `cluster` Prometheus label that the gen_ai metric does not carry, and every row is dropped without a log entry.
- The display ConfigMap MUST be a separate resource with the `cpaas.io/slark.display.config=true` label. Adding a custom entry to the platform-installed `slark-server-common-config` makes slark-server panic at startup (stricter validation on the platform CM).
- Neither cost-agent nor cost-server watches its ConfigMap for changes; both must be force-recreated after every collection or display config edit. Document the exact `kubectl delete pod --grace-period=0 --force` commands.
- The UI Cost Model form has a "linked clusters" field that, if left empty, saves successfully but matches no usage groups and produces no bills. Call this out as the #1 reason "configured but nothing in the UI" reports happen.
- cost.milestones marks each (cluster, date) group Done after first compute, so adding or repricing an item only affects newly-arriving windows. Document the milestone delete recipe for backfill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
25583d1 to ff7303d