
docs(envoy-ai-gateway): add how-to guides for MaaS governance #238

Open
sinbadonline wants to merge 4 commits into master from feat/envoy-ai-gateway-how-to-docs


sinbadonline (Contributor) commented May 27, 2026

Add four how-to guides under docs/en/envoy_ai_gateway/how_to/, plus an index, and surface them from intro.mdx. Register the costmanagement external site in sites.yaml (referenced by the metering guide).

  • identity_authentication.mdx — SecurityPolicy with OIDC/JWT or API key, mapping claims to identity headers (x-user-id / x-user-group).
  • token_rate_limiting.mdx — response-cost token quota via llmRequestCosts + a Global BackendTrafficPolicy backed by Redis.
  • usage_metering.mdx — OpenTelemetry gen_ai_client_token_usage_token scraped from the ExtProc sidecar (port 1064) with a PodMonitor, plus per-department labelling via metricsRequestHeaderAttributes.
  • external_provider_routing.mdx — BackendSecurityPolicy credential injection + AIServiceBackend, with priority-based provider failover.

All four guides were verified end-to-end on an ACP cluster (EAIG v0.4.2, Envoy Gateway v1.5.4).

Summary by CodeRabbit

  • Documentation
    • Added Envoy AI Gateway how-to guides: identity authentication (OIDC/JWT and API-key flows), token-based rate limiting with Redis-backed quotas, usage metering with Prometheus metrics and dashboards, external LLM provider routing with model-based routing and priority failover, a How-To index, and deployment guidance with verification steps.
  • Chores
    • Added cost management site configuration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 27, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a84cbae5-a8a3-44cf-928d-da4d119ab136

📥 Commits

Reviewing files that changed from the base of the PR and between 25583d1 and ff7303d.

📒 Files selected for processing (1)
  • docs/en/envoy_ai_gateway/how_to/usage_metering.mdx

Walkthrough

This PR adds comprehensive how-to documentation for Envoy AI Gateway's multi-tenant model serving capabilities, including guides for edge authentication, token rate limiting, usage metering, and external provider routing, plus navigation updates and site configuration.

Changes

Multi-tenant AI Gateway How-To Guides

| Layer / File(s) | Summary |
|---|---|
| How-To landing, intro, and site config: `docs/en/envoy_ai_gateway/how_to/index.mdx`, `docs/en/envoy_ai_gateway/intro.mdx`, `sites.yaml` | Adds a how-to landing page, inserts a new "Guides" section in the main introduction, and registers the costmanagement site entry. |
| Identity Authentication Guide: `docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx` | Documents OIDC/JWT and API key edge authentication, Dex claim-to-header mappings, optional rollout guidance, verification steps, and links to related docs. |
| Token Rate Limiting Guide: `docs/en/envoy_ai_gateway/how_to/token_rate_limiting.mdx` | Explains Redis-backed global rate limit bootstrap, capturing token usage via llmRequestCosts, enforcing per-identity token budgets with BackendTrafficPolicy, verification, and next-step links. |
| Usage Metering Guide: `docs/en/envoy_ai_gateway/how_to/usage_metering.mdx` | Describes exporting token usage metrics enriched by identity labels, scraping ExtProc metrics via PodMonitor, building dashboards, chargeback integration, and verification steps. |
| External Provider Routing Guide: `docs/en/envoy_ai_gateway/how_to/external_provider_routing.mdx` | Shows how to inject upstream provider credentials (BackendSecurityPolicy), register AIServiceBackend entries, and route OpenAI-compatible requests to external LLMs with model-based priority failover and curl verification. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Four guides now bloom in the garden so bright,
Auth, quotas, and metrics—all working just right!
With token counts flowing and providers aligned,
Multi-tenant AI routing, neatly defined!
Hop forth, dear reviewer—I'll nibble carrots while you review! 🥕✨

🚥 Pre-merge checks | ✅ 5 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding how-to guides for Envoy AI Gateway Model-as-a-Service (MaaS) governance, which is the primary focus of the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx (1)

17-17: 💤 Low value

Consider using "who" for people references.

The phrase "machine consumer that cannot" would read more naturally as "machine consumer who cannot" since it refers to a consumer entity (which can be a person or service acting on behalf of people).

✏️ Suggested wording
-- A machine consumer that cannot run an interactive login presents a static API key that maps to a known tenant.
+- A machine consumer who cannot run an interactive login presents a static API key that maps to a known tenant.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx` at line 17,
Replace "A machine consumer that cannot run an interactive login presents a
static API key that maps to a known tenant." with wording that uses "who" for
the consumer reference; e.g., change the phrase "machine consumer that cannot"
to "machine consumer who cannot" so the sentence reads naturally while
preserving the rest of the text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/en/envoy_ai_gateway/how_to/usage_metering.mdx`:
- Around line 73-75: Change the fenced PromQL code block language identifier
from "text" to "bash" so the query snippet starting with "sum by (department)
(increase(gen_ai_client_token_usage_token_sum[1h]))" renders correctly in
rspress; locate the triple-backtick block containing that PromQL query and
replace the opening ```text with ```bash.

---

Nitpick comments:
In `@docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx`:
- Line 17: Replace "A machine consumer that cannot run an interactive login
presents a static API key that maps to a known tenant." with wording that uses
"who" for the consumer reference; e.g., change the phrase "machine consumer that
cannot" to "machine consumer who cannot" so the sentence reads naturally while
preserving the rest of the text.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7492fb47-5475-4f3e-a493-d7d02054ebc2

📥 Commits

Reviewing files that changed from the base of the PR and between df87e24 and 4c1686b.

📒 Files selected for processing (7)
  • docs/en/envoy_ai_gateway/how_to/external_provider_routing.mdx
  • docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx
  • docs/en/envoy_ai_gateway/how_to/index.mdx
  • docs/en/envoy_ai_gateway/how_to/token_rate_limiting.mdx
  • docs/en/envoy_ai_gateway/how_to/usage_metering.mdx
  • docs/en/envoy_ai_gateway/intro.mdx
  • sites.yaml

Comment thread docs/en/envoy_ai_gateway/how_to/usage_metering.mdx
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 27, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: b360ad3
Status: ✅  Deploy successful!
Preview URL: https://ccd11221.alauda-ai.pages.dev
Branch Preview URL: https://feat-envoy-ai-gateway-how-to.alauda-ai.pages.dev


zxl and others added 3 commits May 27, 2026 05:55
Add guidance to create the Gateway (and AIGatewayRoute) in a dedicated
namespace such as maas-system, not in the control-plane namespace
envoy-gateway-system. A full callout in intro.mdx explains the rationale
(separation from the control plane, plus avoiding a version-specific
issue where a control-plane-namespace gateway does not get the AI Gateway
ext_proc filter / SecurityPolicy applied to its listener, silently
breaking routing, quotas, and auth). Each how-to guide gets a concise
Prerequisites note linking back to it.

Verified on an ACP cluster: an identical gateway created in a non-system
namespace gets the router ext_proc filter injected natively (model
routing returns 200) and SecurityPolicy/apiKeyAuth enforces correctly,
both without any workaround.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ripts and field-level explanations

Every kubectl/curl/YAML/PromQL snippet in the four how-to guides was
exercised against an ALAUDA_CLOUD verify cluster (EAIG v0.4.2, EG v1.5.0)
and updated to match observed behavior.

identity_authentication.mdx
- Switch the API-key example to EG-native apiKeyAuth.forwardClientIDHeader
  + sanitize (confirmed in the live SecurityPolicy CRD schema).
- Add a known-issue warning + EnvoyPatchPolicy workaround for the EG
  v1.5.x translation gap (HCM api_key_auth filter stays disabled:true;
  per-route config is not wrapped in FilterConfig{disabled:false}, so the
  policy reports Accepted=True but does not enforce). Workaround verified
  end-to-end: alice/ci-runner keys 200 with upstream x-user-id injected,
  wrong/missing keys 401, X-API-Key stripped.
- Note that the OIDC issuer is not bound to Dex; any OIDC provider works
  (Keycloak, Auth0, Okta, GitHub OIDC, ...) so consumers without an ACP
  platform account can still authenticate.
- Fix SecurityPolicy verification jsonpath: ancestor-scoped at
  .status.ancestors[*].conditions[?(@.type=="Accepted")].status.
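
The API-key behaviour verified above (valid keys return 200 with x-user-id injected upstream, wrong or missing keys return 401, X-API-Key is stripped) can be sketched as a small Python model. This is an illustration of the observed request/response contract only, not how Envoy Gateway implements apiKeyAuth; the key values and the choice to drop a client-supplied x-user-id are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key table (API key -> client ID); the key strings are placeholders.
API_KEYS: Dict[str, str] = {
    "alice-secret": "alice",
    "ci-runner-secret": "ci-runner",
}


def authenticate(headers: Dict[str, str]) -> Tuple[int, Optional[Dict[str, str]]]:
    """Return (status, upstream_headers) for one incoming request.

    A known key yields 200 plus the headers forwarded upstream;
    a wrong or missing key yields 401 and no upstream request.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    incoming = {name.lower(): value for name, value in headers.items()}

    client_id = API_KEYS.get(incoming.get("x-api-key", ""))
    if client_id is None:
        return 401, None

    # Strip the credential, and (assumption) any identity header the client
    # tried to set itself, before injecting the trusted identity.
    upstream = {
        name: value
        for name, value in incoming.items()
        if name not in ("x-api-key", "x-user-id")
    }
    upstream["x-user-id"] = client_id
    return 200, upstream
```

Calling `authenticate({"X-API-Key": "alice-secret"})` in this model returns a 200 with only the injected identity header, which mirrors the three checks listed in the commit message.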

token_rate_limiting.mdx
- Add Prereq preflight: CRD check, in-cluster Redis PING, kubectl get
  gateway -A, envoy-ratelimit rollout-status.
- Add Redis TLS / password placeholders and the rate-limit deployment
  health check.
- Explain CEL/InputToken/OutputToken/TotalToken semantics with a worked
  weighted-cost example.
- Replace prose-only verification with a runnable alice/bob 8-curl burst
  and per-user isolation expectation (matches the live 200x2 -> 429x6
  pattern observed against the 40-tok/min budget).
- Fix redis-cli scan pattern: real EG rate-limit keys are not prefixed
  with "ratelimit:"; switch the example to a user-id substring match.
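
As a rough companion to the weighted-cost and burst items above, the following Python sketch models a response-cost budget: a request is admitted while the budget is not yet exhausted, and its token cost is charged only after the response. The weights (1 for input, 3 for output) and the cost of 20 tokens per response are hypothetical values chosen so that a 40-token budget reproduces the 200x2 -> 429x6 pattern; the model ignores the one-minute window reset and is not the actual rate-limit service logic.

```python
from dataclasses import dataclass
from typing import Dict, List


def weighted_cost(input_tokens: int, output_tokens: int,
                  input_weight: int = 1, output_weight: int = 3) -> int:
    """Combine input and output tokens into one cost (weights are hypothetical)."""
    return input_tokens * input_weight + output_tokens * output_weight


@dataclass
class TokenBudget:
    limit: int      # tokens allowed per window
    used: int = 0   # tokens charged so far in the current window

    def admit(self) -> bool:
        # Admission happens before the response cost is known.
        return self.used < self.limit

    def record(self, cost: int) -> None:
        # The cost is charged after the response, so it may overshoot the limit.
        self.used += cost


def burst(budget: TokenBudget, requests: int, cost_per_response: int) -> List[int]:
    """Send `requests` back-to-back calls and return the HTTP status of each."""
    statuses = []
    for _ in range(requests):
        if budget.admit():
            statuses.append(200)
            budget.record(cost_per_response)
        else:
            statuses.append(429)
    return statuses


# One budget per identity, so one user's burst does not affect another user.
budgets: Dict[str, TokenBudget] = {"alice": TokenBudget(40), "bob": TokenBudget(40)}
alice_statuses = burst(budgets["alice"], requests=8, cost_per_response=20)
bob_first = burst(budgets["bob"], requests=1, cost_per_response=20)
```

In this model alice sees two 200s followed by six 429s, while bob's first request still succeeds because his counter is separate.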

external_provider_routing.mdx
- Add Prereq preflight: CRD checks, in-cluster egress probe to
  api.openai.com.
- Insert the missing Secret-create step and call out that the data-map
  key MUST be exactly 'apiKey' for type: APIKey (confirmed by controller
  log: "secret <name> does not contain key apiKey"; symptom is request
  timeout, not 401).
- Add the missing Backend (FQDN) example and the per-type credential
  matrix (APIKey, AWSCredentials, AzureAPIKey, AzureCredentials,
  GCPCredentials, AnthropicAPIKey).
- Clarify that Backend STATUS=Accepted appears only after an HTTPRoute
  references it; until then the column is empty.
- Explain priority 0/1 semantics (Envoy locality-weighted load balancing)
  and provide a failover-simulation tip.
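
The priority 0/1 semantics mentioned above can be illustrated with a deliberately simplified Python model: traffic goes to the lowest-numbered priority that still has a healthy backend. The backend names are hypothetical, and the all-or-nothing switch is an assumption for clarity; Envoy itself shifts traffic between priority levels gradually, based on the share of healthy hosts in each level.

```python
from typing import Dict, List, NamedTuple


class Backend(NamedTuple):
    name: str
    priority: int   # 0 is preferred, 1 is the fallback
    healthy: bool


def eligible_backends(backends: List[Backend]) -> List[str]:
    """Return the healthy backends of the lowest-numbered usable priority."""
    healthy_by_priority: Dict[int, List[str]] = {}
    for backend in backends:
        if backend.healthy:
            healthy_by_priority.setdefault(backend.priority, []).append(backend.name)

    if not healthy_by_priority:
        return []  # nothing healthy at any priority level
    return healthy_by_priority[min(healthy_by_priority)]


# Hypothetical backend names, used only for this illustration.
normal = [Backend("primary-provider", 0, True), Backend("fallback-provider", 1, True)]
failed_over = [Backend("primary-provider", 0, False), Backend("fallback-provider", 1, True)]
```

Marking the priority-0 backend unhealthy in this model plays the same role as the failover-simulation tip: the fallback only receives traffic once the primary is out.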

usage_metering.mdx
- Add Prereq preflight: PodMonitor CRD, port-forward 1064, and a metric
  presence check before any monitoring wiring.
- Surface that the ai-gateway-extproc sidecar is declared as a native
  initContainer (restartPolicy: Always), which is why the aigw-metrics
  named port is on the pod even though .spec.containers does not list it.
- Add a discovery snippet for the cluster's Prometheus podMonitorSelector
  label and put it on the PodMonitor.metadata.labels (without this label
  the PodMonitor is invisible to the platform Prometheus).
- Replace the fabricated `helm upgrade ai-eg-helm ...` and
  `kubectl set env ... AI_GATEWAY_METRICS_REQUEST_HEADER_ATTRIBUTES=...`
  recipes with the verified path: a `kubectl patch deploy` that adds the
  `-metricsRequestHeaderAttributes=<header>:<label>` CLI flag (single-dash
  flag on ai-gateway-controller; no env var exists).
- Rewrite all PromQL examples to use {__name__="..."} selectors because
  Prometheus stores the metric with OpenTelemetry's dotted name
  (`gen_ai.client.token.usage_token_sum`), not the underscored form
  emitted at the extproc /metrics endpoint.
- Add scrape-target health check via the Prometheus /api/v1/targets API
  and a metric-name discovery snippet via /api/v1/label/__name__/values.
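
To make the `{__name__="..."}` selector change concrete, here is a small Python sketch that builds such a query string and the matching Prometheus HTTP API URL. The dotted metric name and the `department` label come from this PR; the helper names and the base URL are hypothetical, and label values are not escaped, so this is only suitable for simple values.

```python
from urllib.parse import urlencode

# Dotted OpenTelemetry name as stored by Prometheus (per the commit message).
DOTTED_METRIC = "gen_ai.client.token.usage_token_sum"


def name_selector(metric: str, **labels: str) -> str:
    """Build a selector that matches the metric by __name__ instead of a bare name."""
    matchers = [f'__name__="{metric}"']
    matchers += [f'{key}="{value}"' for key, value in sorted(labels.items())]
    return "{" + ",".join(matchers) + "}"


def tokens_by_label(metric: str, label: str, window: str = "1h") -> str:
    """PromQL for the token increase over `window`, grouped by `label`."""
    return f"sum by ({label}) (increase({name_selector(metric)}[{window}]))"


def query_url(base: str, promql: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


per_department = tokens_by_label(DOTTED_METRIC, "department")
```

A bare identifier cannot contain dots in the classic PromQL grammar, which is why the name is passed as a `__name__` matcher here rather than written directly before the braces.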

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…w for token usage

Replace the single-paragraph "Chargeback with Cost Management" pointer
with a full Steps section that has been verified live on Cost Management
v4.1.0: configure the two ConfigMaps (collection in cpaas-system, display
in kube-public), reload agent + server, create a pricing model in the UI,
and validate that cost.bills accumulates per-namespace AI Gateway token
charges.

The verified gotchas are inlined as warnings/troubleshooting so readers
do not retread them:

- `kind: Project` is the only kind value the agent honors for namespace-
  scoped custom metrics on v4.1.0; `kind: AIGateway` or similar is
  silently dropped, leaving cost.usage empty with no diagnostic.
- `mappers.cluster: ""` (empty string) is required so the agent fills in
  its own cluster identity. Writing `cluster: cluster` makes the agent
  look for a `cluster` Prometheus label that the gen_ai metric does not
  carry, and every row is dropped without a log entry.
- The display ConfigMap MUST be a separate resource with the
  `cpaas.io/slark.display.config=true` label. Adding a custom entry to
  the platform-installed `slark-server-common-config` makes slark-server
  panic at startup (stricter validation on the platform CM).
- Neither cost-agent nor cost-server watches its ConfigMap for changes;
  both must be force-recreated after every collection or display config
  edit. Document the exact `kubectl delete pod --grace-period=0 --force`
  commands.
- The UI Cost Model form has a "linked clusters" field that, if left
  empty, saves successfully but matches no usage groups and produces no
  bills. Call this out as the #1 reason "configured but nothing in the
  UI" reports happen.
- cost.milestones marks each (cluster, date) group Done after first
  compute, so adding or repricing an item only affects newly-arriving
  windows. Document the milestone delete recipe for backfill.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sinbadonline sinbadonline force-pushed the feat/envoy-ai-gateway-how-to-docs branch from 25583d1 to ff7303d Compare May 28, 2026 08:43