190 changes: 190 additions & 0 deletions docs/en/envoy_ai_gateway/how_to/external_provider_routing.mdx
@@ -0,0 +1,190 @@
---
weight: 40
category: howto
---
# Routing to LLM Providers

## Introduction

Envoy AI Gateway can front external LLM providers behind one OpenAI-compatible endpoint. It injects each provider's upstream credentials with a `BackendSecurityPolicy`, routes by model name, and fails over between providers. Consumers call a single internal address and never hold provider keys, so the gateway becomes the controlled egress point for public-cloud LLM traffic, with the same identity, quota, and metering applied to external models as to self-hosted ones.

## Use Cases

- Expose a hosted model, such as one from OpenAI, AWS Bedrock, Azure OpenAI, GCP Vertex AI, or Anthropic, without distributing the provider key.
- Route different model names to different providers behind one endpoint.
- Fail over from a primary provider to a backup when the primary is unavailable.

## Prerequisites

1. Envoy AI Gateway is installed, with a `Gateway` and an `AIGatewayRoute`. Confirm the relevant CRDs are present:

```bash
kubectl get crd \
  aigatewayroutes.aigateway.envoyproxy.io \
  aiservicebackends.aigateway.envoyproxy.io \
  backendsecuritypolicies.aigateway.envoyproxy.io \
  backends.gateway.envoyproxy.io
```

2. The upstream provider credential (created in the next section) is stored in a `Secret` in the route namespace.
3. The provider endpoint is reachable from the cluster egress. Verify before going further:

```bash
kubectl run egress-probe --rm -i --restart=Never \
  --image=curlimages/curl -- \
  curl -s -o /dev/null -w '%{http_code}\n' https://api.openai.com/v1/models
# expect: 401 (anything other than a connection error means egress works)
```
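The probe result can also be interpreted in a script. The sketch below is one possible way to do that, not part of the gateway tooling; the function name `egress_ok` is hypothetical. It relies on curl printing `000` for `%{http_code}` when no HTTP response was received.

```bash
# Hypothetical helper: succeed when the probe received any HTTP response.
# curl prints 000 for %{http_code} when the connection itself failed.
egress_ok() {
  case "$1" in
    ""|000) return 1 ;;  # no response: DNS, TCP, or TLS failure
    *) return 0 ;;       # any HTTP status (401 included) proves egress works
  esac
}

# Usage, with the status code captured from the probe above:
#   egress_ok "$code" && echo "egress works" || echo "egress blocked"
```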

:::note
Create the `Gateway` and `AIGatewayRoute` in a dedicated namespace (for example `maas-system`), not in the Envoy Gateway control-plane namespace `envoy-gateway-system`. A gateway placed in the control-plane namespace may not have the AI Gateway request-processing filter and `SecurityPolicy` applied to its listener, which silently breaks routing and policy enforcement. See [Envoy AI Gateway](../intro).
:::

## Steps

<Steps>
### Store the upstream credential

Create the `Secret` that holds the provider API key. For `type: APIKey` the **data-map key must be exactly `apiKey`**: the `BackendSecurityPolicy` looks up that field by name, so a `Secret` created with `--from-literal=key=...` cannot be used, even though the policy still reports `Accepted`:

```bash
kubectl -n <your-namespace> create secret generic openai-key \
  --from-literal=apiKey="$OPENAI_API_KEY" # data-map key must be 'apiKey'
```

Inject the provider credential with a `BackendSecurityPolicy` that targets the backend. The `type` field selects the provider authentication scheme.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-auth
  namespace: <your-namespace>
spec:
  type: APIKey
  apiKey:
    secretRef:
      name: openai-key # Secret holding the provider API key (data key 'apiKey')
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: openai-backend

The `type` field accepts `APIKey`, `AWSCredentials`, `AzureAPIKey`, `AzureCredentials`, `GCPCredentials`, and `AnthropicAPIKey`. Each type expects a matching credential block and a matching set of `Secret` data keys:

| `type` | credential block | required `Secret` data keys |
|---|---|---|
| `APIKey` | `apiKey.secretRef` | `apiKey` |
| `AWSCredentials` | `awsCredentials.credentialsFile.secretRef` | `credentials` (AWS shared-credentials INI) |
| `AzureAPIKey` | `azureApiKey.secretRef` | `apiKey` |
| `AzureCredentials` | `azureCredentials.clientSecretRef` | `client-secret` (plus `clientID`/`tenantID` inline) |
| `GCPCredentials` | `gcpCredentials.workloadIdentityFederationConfig` | service-account JSON via the configured source |
| `AnthropicAPIKey` | `anthropicApiKey.secretRef` | `apiKey` |

When the upstream auth scheme is wrong, the upstream typically returns `401`/`403`. When the `Secret` is keyed wrongly (for example `key:` instead of `apiKey:`) the failure mode is harder to read: the `BackendSecurityPolicy` still reports `Accepted=True`, but the controller logs `failed to get backend auth from backend security policy. Skipping this backend. ... error: secret <name> does not contain key apiKey` and removes the backend from the route, so requests to that backend **time out** rather than returning a clean 401. Tail the controller logs when introducing a new credential:

```bash
kubectl -n envoy-gateway-system logs deploy/ai-gateway-controller -c ai-gateway-helm \
  | grep -E 'backend security policy|does not contain key'
```
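Because the policy status does not reveal a wrongly keyed `Secret`, it can help to check the `Secret` directly before reading controller logs. The following is a minimal sketch under stated assumptions: the function name `secret_has_key` is hypothetical, and the key name is assumed to contain no dots so that it can be used in a kubectl jsonpath expression unescaped.

```bash
# Hypothetical helper: succeed only if the Secret has a non-empty data entry
# under the given key (default: apiKey). Exits 2 if kubectl itself fails.
secret_has_key() (
  ns=$1; name=$2; key=${3:-apiKey}
  value=$(kubectl -n "$ns" get secret "$name" -o "jsonpath={.data.$key}") || exit 2
  [ -n "$value" ]
)

# Usage:
#   secret_has_key <your-namespace> openai-key || echo "Secret lacks data key 'apiKey'"
```

The check only confirms that the entry exists; it does not validate the key against the provider.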

### Define the provider backend

Two resources work together: a `Backend` (Envoy Gateway) tells the data plane the **network endpoint** to reach, and an `AIServiceBackend` (Envoy AI Gateway) tells the AI filter the **provider schema** to translate to.

First, declare the upstream endpoint as a `Backend`:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: openai-endpoint
  namespace: <your-namespace>
spec:
  endpoints:
    - fqdn:
        hostname: api.openai.com
        port: 443

Then register the provider as an `AIServiceBackend` referencing that `Backend`:

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai-backend
  namespace: <your-namespace>
spec:
  schema:
    name: OpenAI
  backendRef:
    name: openai-endpoint # Backend pointing at the provider host
    kind: Backend
    group: gateway.envoyproxy.io

- `schema.name`: the upstream protocol the gateway must speak. Common values: `OpenAI`, `AWSBedrock`, `AzureOpenAI`, `GCPVertexAI`, `Anthropic`. The gateway transcodes the incoming OpenAI-compatible request to this schema before forwarding.
- `backendRef`: must point at a `Backend` (group `gateway.envoyproxy.io`), not a `Service` — the AI filter relies on the `Backend` for FQDN + TLS handling to public endpoints.

Confirm both resources reconciled:

```bash
kubectl get backend,aiservicebackend -n <your-namespace>
# AIServiceBackend shows ACCEPTED=True immediately;
# the Backend STATUS column stays empty until the next step's AIGatewayRoute
# actually references this AIServiceBackend — Envoy Gateway only reconciles a
# Backend once at least one HTTPRoute targets it.
```

### Route by model with fallback

Reference multiple backends in one `AIGatewayRoute` rule and set `priority` so the gateway fails over from the primary to the backup:

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: <aigatewayroute-name>
  namespace: <your-namespace>
spec:
  # ... parentRefs ...
  rules:
    - matches:
        - headers:
            - name: x-ai-eg-model
              type: Exact
              value: gpt-4o
      backendRefs:
        - name: openai-backend
          priority: 0 # primary
        - name: azure-backend
          priority: 1 # used when the primary is unavailable

- `matches.headers[x-ai-eg-model]`: the AI filter parses the `model` field from the request body and writes it to this header for routing. So `"model":"gpt-4o"` in the request is what reaches this match — no manual header is required from the caller.
- `priority`: the value is applied to the backend's endpoints as an Envoy **priority level**. Priority-0 endpoints take all traffic while they are healthy; the priority-1 group receives traffic only as priority-0 endpoints are marked unhealthy, for example by outlier detection. Failover is automatic but takes seconds, not milliseconds; do not rely on it for tail-latency budgets.
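To make the failover behavior concrete, the arithmetic below follows Envoy's documented priority-level formula: priority 0 receives `min(100, 1.4 × healthy%)` of the traffic and the remainder spills to priority 1. This is an illustrative sketch only. The function name `priority_split` is hypothetical, and it assumes Envoy's default overprovisioning factor of 1.4 and a fully healthy priority-1 group.

```bash
# Hypothetical helper: print "<p0-share> <p1-share>" (percent of traffic) for a
# given percentage of healthy priority-0 endpoints.
priority_split() {
  p0=$(( $1 * 140 / 100 ))            # overprovisioning factor 1.4
  if [ "$p0" -gt 100 ]; then p0=100; fi
  echo "$p0 $(( 100 - p0 ))"
}

priority_split 100  # prints "100 0": the primary takes everything
priority_split 50   # prints "70 30": half of the primary endpoints are unhealthy
priority_split 0    # prints "0 100": full failover to the backup
```

If a backend resolves to a single endpoint, it is either fully healthy or fully unhealthy, so the shift is effectively all-or-nothing.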
</Steps>

## Verification

Send an OpenAI-compatible request to the gateway and confirm three things: the request reaches the provider, the client's `Authorization` header is replaced by the key injected by the `BackendSecurityPolicy`, and a valid response with token usage comes back:

```bash
# A deliberately-bogus client Authorization to prove the gateway strips it
# and substitutes the upstream key from the BackendSecurityPolicy.
curl -sv http://<gateway-address>/v1/chat/completions \
  -H 'Authorization: Bearer client-token-that-should-be-replaced' \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"ping"}]}'
```

A successful response (`200 OK` with a `usage` object) means the upstream credential was injected and the route resolved. Inspect the request that actually hit the upstream by enabling Envoy's access log on the `EnvoyProxy` resource, or temporarily route to a debug echo backend; the upstream-bound `Authorization` header should carry the value from the `openai-key` `Secret`, not the bogus client value above.
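For scripted checks, the presence of the `usage` object can be tested without a JSON parser. This is a rough sketch, assuming the response body is piped in on stdin; the function name `has_usage` is hypothetical, and the match is a plain pattern check rather than real JSON parsing.

```bash
# Hypothetical helper: read a chat-completions response body on stdin and
# succeed if it contains a "usage" key.
has_usage() {
  grep -q '"usage"[[:space:]]*:'
}

# Usage:
#   curl -s http://<gateway-address>/v1/chat/completions ... | has_usage \
#     && echo "usage present"
```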

To exercise failover, simulate a primary outage by pointing the primary `Backend` at an unreachable host (`hostname: invalid.example.invalid`) for a few seconds and watch traffic shift to the backup; the response body's `model` field will reflect the new provider.

## Learn More

- [Authenticating Consumers](./identity_authentication)
- [Configuring Token Quotas](./token_rate_limiting)
185 changes: 185 additions & 0 deletions docs/en/envoy_ai_gateway/how_to/identity_authentication.mdx
@@ -0,0 +1,185 @@
---
weight: 10
category: howto
---
# Authenticating Consumers

## Introduction

Envoy AI Gateway authenticates every inference request at the edge and propagates the caller's identity to downstream policies. Authentication is configured with the Envoy Gateway `SecurityPolicy` resource, which attaches to the `HTTPRoute` generated by an `AIGatewayRoute`. After the caller is identified, selected claims are copied into request headers that token quotas and usage metering consume as the per-tenant key.

This turns a per-consumer credential, such as an SSO (Single Sign-On) token or an API key, into an identity that quota and metering can act on. It is the foundation of multi-tenant model serving on a shared gateway.

## Use Cases

- A developer obtains a JWT (JSON Web Token) from the platform identity provider and calls the gateway with it, so the gateway enforces per-user token quotas.
- A CI job presents a service-account token so that automated traffic is attributed to a team rather than an individual.
- A machine consumer that cannot run an interactive login presents a static API key that maps to a known tenant.

## Prerequisites

1. Envoy AI Gateway is installed. See [Install Envoy AI Gateway](../install).
2. An `AIGatewayRoute` already routes requests to one or more backends.
3. For the OIDC/JWT path: an OIDC issuer with a reachable JWKS endpoint. The platform's built-in identity provider, Dex, is the default; any other OIDC issuer (Keycloak, Auth0, Okta, GitHub OIDC, an enterprise Entra ID tenant) also works as long as the gateway can reach its `/.well-known/openid-configuration` and JWKS URL.
4. For the API-key path: cluster permission to create `Secret` objects in the gateway's namespace.

:::note
Create the `Gateway` and `AIGatewayRoute` in a dedicated namespace (for example `maas-system`), not in the Envoy Gateway control-plane namespace `envoy-gateway-system`. A gateway placed in the control-plane namespace may not have the AI Gateway request-processing filter and `SecurityPolicy` applied to its listener, which silently breaks routing and policy enforcement. See [Envoy AI Gateway](../intro).
:::

## Steps

<Steps>
### Authenticate with OIDC or JWT

Validate tokens issued by an OIDC issuer. The platform's built-in Dex is the default issuer; it can also broker external identity sources, such as LDAP or another OIDC provider, so their users obtain platform tokens. Those connectors are configured in platform IdP (Identity Provider) management. For platform IdP configuration, see <ExternalSiteLink name="acp" href="security/users_and_roles/idp/intro.html" children="Identity Providers" />.

Any OIDC issuer with a reachable JWKS endpoint can be used. Replace the `issuer` and `remoteJWKS.uri` below with the issuer of your choice when consumers are not platform users — for example, an enterprise Keycloak realm or a SaaS IdP — so the gateway accepts their tokens without requiring a platform account.

Point the gateway at the OIDC issuer and map its claims to identity headers:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: maas-oidc-auth
  namespace: <your-namespace>
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: <aigatewayroute-name> # HTTPRoute generated by your AIGatewayRoute
  jwt:
    providers:
      - name: platform-idp
        issuer: https://<platform-address>/dex
        remoteJWKS:
          uri: https://<platform-address>/dex/keys
        claimToHeaders:
          - claim: sub # caller identity, used as the per-user quota and metering key
            header: x-user-id
          - claim: groups # platform groups, used for per-department aggregation and tiers
            header: x-user-group
          - claim: email
            header: x-user-email

- `<platform-address>`: the platform access address. Dex publishes its issuer at `/dex` and its JWKS at `/dex/keys`.
- `<aigatewayroute-name>`: the name of the `HTTPRoute` generated by your `AIGatewayRoute`.
- `claimToHeaders`: the bridge between identity and policy. The emitted headers (`x-user-id`, `x-user-group`) become the selector keys for token quotas and the label values for usage metering.
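Before mapping claims, it can be useful to see which claims a token actually carries. The sketch below decodes the payload segment of a JWT locally. It is a debugging aid under stated assumptions, not part of the gateway: the function name `jwt_payload` is hypothetical, `base64 -d` is assumed to be available (GNU coreutils), and the signature is not verified.

```bash
# Hypothetical helper: print the decoded JSON payload of a JWT.
# Does NOT verify the signature; for inspection only.
jwt_payload() {
  # Take the second dot-separated segment and map base64url to base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Usage: jwt_payload "$TOKEN"
```

The output should contain the `sub`, `groups`, and `email` claims if the mapping above is to produce all three headers.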

:::note
The `groups` claim reflects the caller's platform groups, populated by the configured IdP connector. To key policies on an attribute the platform does not emit by default, such as a subscription tier, add the claim in the upstream connector and map it with an extra `claimToHeaders` entry.
:::

:::tip
To roll out without blocking traffic, set `jwt.optional: true` first and observe. Remove it once all consumers present valid tokens.
:::

### Authenticate with an API key

:::warning
**Known issue on Envoy Gateway ≤ v1.5.x.** `SecurityPolicy.apiKeyAuth` is translated correctly (the `api_key_auth` filter is added to the listener with credentials, and an `ApiKeyAuthPerRoute` config is attached to the route), but Envoy Gateway does not enable the filter per route. The result: the policy reports `Accepted=True`, requests with a wrong or missing key still return `200`, and no `x-user-id` is injected. Until a fixed EG release is in place, apply the `EnvoyPatchPolicy` shown at the end of this section to enable the filter at the listener level. Verify on your cluster by sending one request with no key after applying the policy; if you get `200` instead of `401`, the patch is required.
:::

For machine consumers that cannot perform an OIDC flow, validate a static API key instead. There is no issuance service: the cluster administrator generates a random string per consumer, stores it in a `Secret`, and shares it out of band. The gateway's data plane validates each request by looking the presented value up in that `Secret`.

Generate one key per consumer and store them in a single Opaque `Secret`. Each data-map key is the **client identifier** that downstream policies see; each value is the **API key** the consumer presents:

```bash
kubectl -n <your-namespace> create secret generic maas-api-keys \
  --from-literal=alice="$(openssl rand -hex 32)" \
  --from-literal=ci-runner="$(openssl rand -hex 32)"
```

Bind the `Secret` to the route with a `SecurityPolicy`:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: maas-apikey-auth
  namespace: <your-namespace>
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: <aigatewayroute-name> # HTTPRoute generated by your AIGatewayRoute
  apiKeyAuth:
    credentialRefs:
      - name: maas-api-keys # Secret whose data keys are the client identifiers
    extractFrom:
      - headers:
          - X-API-Key # dedicated header avoids the "Bearer " prefix problem of Authorization
    forwardClientIDHeader: x-user-id # matched client identifier is injected as this header for downstream policies
    sanitize: true # strip the raw API key from the request before it reaches the model backend

- `credentialRefs`: one or more Opaque `Secret`s holding the credentials. Each data-map key is the client identifier, each value is the literal API key. Adding a consumer is a `kubectl patch` of one entry; revoking is a single key deletion.
- `extractFrom`: where Envoy reads the presented key from. The filter does a literal-string compare, so prefer a dedicated header such as `X-API-Key`. Reusing `Authorization` requires storing the value with its `Bearer ` prefix, which mixes badly with the OIDC path on the same gateway.
- `forwardClientIDHeader`: the header that carries the matched client identifier to the upstream and to later filters. Use the same name as the OIDC `claimToHeaders` target (`x-user-id`) so token quotas and usage metering see one consistent key across both auth paths.
- `sanitize`: prevents the raw API key from leaking to the model backend or being logged downstream.
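Adding and revoking a consumer can be sketched as two JSON merge patches against the `maas-api-keys` `Secret`. The helper names below are hypothetical, and the sketch assumes client identifiers and keys contain no characters that need JSON escaping. In a merge patch, a `null` value removes the entry.

```bash
# Hypothetical helpers: build JSON merge-patch bodies for the API-key Secret.
add_key_patch() {
  printf '{"stringData":{"%s":"%s"}}' "$1" "$2"
}
revoke_key_patch() {
  printf '{"data":{"%s":null}}' "$1"   # null deletes the entry
}

# Usage:
#   kubectl -n <your-namespace> patch secret maas-api-keys --type merge \
#     -p "$(add_key_patch bob "$(openssl rand -hex 32)")"
#   kubectl -n <your-namespace> patch secret maas-api-keys --type merge \
#     -p "$(revoke_key_patch bob)"
```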

**Workaround for the EG ≤ v1.5.x enforcement gap** (see the warning above). Apply this `EnvoyPatchPolicy` once per Gateway to enable the `api_key_auth` filter at the listener level. The patch is verified end-to-end on EG v1.5.0: with it in place, wrong/missing keys return `401` and `forwardClientIDHeader` injection works as documented. Remove it once you upgrade to a fixed EG release.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: enable-apikey-auth
  namespace: <gateway-namespace> # must be the same namespace as the Gateway
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: <gateway-name>
  type: JSONPatch
  jsonPatches:
    - type: type.googleapis.com/envoy.config.listener.v3.Listener
      name: <gateway-namespace>/<gateway-name>/<listener-name> # e.g. maas-system/ai-gw/http
      operation:
        op: replace
        path: /default_filter_chain/filters/0/typed_config/http_filters/0/disabled
        value: false
```

Confirm the patch took effect:

```bash
kubectl get envoypatchpolicy enable-apikey-auth -n <gateway-namespace> \
  -o jsonpath='{.status.ancestors[*].conditions[?(@.type=="Programmed")].status}'
# expect: True
```
</Steps>

## Verification

Confirm the policy is accepted. `SecurityPolicy` status is ancestor-scoped, so the jsonpath looks one level deeper than for most resources:

```bash
kubectl get securitypolicy <policy-name> -n <your-namespace> \
  -o jsonpath='{.status.ancestors[*].conditions[?(@.type=="Accepted")].status}'
```

The command returns `True` when the policy has been accepted.
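In automation, the same query can be polled until the condition turns `True`. This is a minimal sketch, assuming the jsonpath above; the function name `wait_accepted`, the default of 30 attempts, and the 2-second interval are all illustrative choices.

```bash
# Hypothetical helper: poll until the SecurityPolicy reports Accepted=True on
# at least one ancestor, or fail after the given number of attempts.
wait_accepted() (
  ns=$1; name=$2; attempts=${3:-30}
  while [ "$attempts" -gt 0 ]; do
    status=$(kubectl get securitypolicy "$name" -n "$ns" \
      -o jsonpath='{.status.ancestors[*].conditions[?(@.type=="Accepted")].status}')
    case " $status " in *" True "*) exit 0 ;; esac
    attempts=$(( attempts - 1 ))
    sleep 2
  done
  exit 1
)

# Usage: wait_accepted <your-namespace> maas-oidc-auth || echo "policy not accepted"
```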

For the OIDC path, send a request with a valid token and confirm the upstream service receives the `x-user-id`, `x-user-group`, and `x-user-email` headers.

For the API-key path, send the matching `X-API-Key` and confirm the upstream sees `x-user-id` set to the matched client identifier:

```bash
curl -sS -H "X-API-Key: <alice-key>" \
  -H 'Content-Type: application/json' \
  https://<gateway-host>/v1/chat/completions \
  -d '{"model":"<model>","messages":[{"role":"user","content":"ping"}]}'
```

A wrong or missing key returns `401 Unauthorized` from the gateway before the request reaches any backend.
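The negative case is worth scripting, since a `200` on a request without a key is the symptom of the enforcement gap described earlier. The sketch below wraps curl in a status assertion; the function name `expect_status` is hypothetical, and every argument after the expected status is passed to curl unchanged.

```bash
# Hypothetical helper: succeed if the request returns the expected HTTP status.
expect_status() (
  want=$1; shift
  got=$(curl -s -o /dev/null -w '%{http_code}' "$@")
  [ "$got" = "$want" ]
)

# Usage (a request with no key should be rejected by the gateway):
#   expect_status 401 https://<gateway-host>/v1/chat/completions \
#     -H 'Content-Type: application/json' \
#     -d '{"model":"<model>","messages":[{"role":"user","content":"ping"}]}'
```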

## Learn More

- [Configuring Token Quotas](./token_rate_limiting)
- [Metering Token Usage](./usage_metering)

## Next Steps

After identity headers are propagated, see [Configuring Token Quotas](./token_rate_limiting) to enforce per-tenant token budgets.
7 changes: 7 additions & 0 deletions docs/en/envoy_ai_gateway/how_to/index.mdx
@@ -0,0 +1,7 @@
---
weight: 30
---

# How To

<Overview />