Merged
162 changes: 7 additions & 155 deletions src/content/docs/ai-gateway/configuration/request-handling.mdx
@@ -1,103 +1,36 @@
---
pcx_content_type: configuration
title: Request handling
description: Configure AI Gateway request timeouts, retries, and fallback strategies for reliable AI provider interactions.
description: Configure AI Gateway request timeouts and retries for reliable AI provider interactions.
sidebar:
  order: 4
products:
- ai-gateway
---

import { Render, Aside } from "~/components";
import { Render } from "~/components";

:::note

[Dynamic Routing](/ai-gateway/features/dynamic-routing/) also offers timeouts and retries per model, along with conditional routing, rate limiting, and budget limiting through a visual interface. This page documents request-handling configuration available through Universal Endpoint provider `config` settings as well as per-request `cf-aig-*` headers that work with any provider endpoint. You can also configure retries at the [gateway level](/ai-gateway/configuration/manage-gateway/#retry-requests).
[Dynamic Routing](/ai-gateway/features/dynamic-routing/) also offers timeouts and retries per model, along with conditional routing, rate limiting, and budget limiting through a visual interface. This page documents request-handling configuration available through per-request `cf-aig-*` headers that work with any provider endpoint. You can also configure retries at the [gateway level](/ai-gateway/configuration/manage-gateway/#retry-requests).

:::

Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable.

## Request timeouts

A request timeout allows you to trigger fallbacks or a retry if a provider takes too long to respond.
A request timeout allows you to return an error or trigger a retry if a provider takes too long to respond.

These timeouts help:

- Improve user experience, by preventing users from waiting too long for a response
- Proactively handle errors, by detecting unresponsive providers and triggering a fallback option
- Proactively handle errors, by detecting unresponsive providers

Request timeouts can be set directly on a request to any provider.

### Definitions

A timeout is set in milliseconds. Additionally, the timeout is based on when the first part of the response comes back. As long as the first part of the response returns within the specified timeframe - such as when streaming a response - your gateway will wait for the response.
A timeout is set in milliseconds. The timeout is based on when the first part of the response comes back. As long as the first part of the response returns within the specified timeframe — such as when streaming a response — your gateway will wait for the response.
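The first-response rule can be illustrated with a small TypeScript model. This is a simplified sketch, not the gateway's implementation: `simulateFirstChunkTimeout` is a hypothetical helper that takes the arrival times of response chunks (in milliseconds after the request was sent, in arrival order) and reports whether a given timeout would fire.

```typescript
// Hypothetical model of the AI Gateway timeout rule: only the first chunk
// must arrive before the deadline; later chunks may take as long as needed.
type TimeoutOutcome =
  | { status: "ok"; chunksReceived: number }
  | { status: "timeout" };

function simulateFirstChunkTimeout(
  chunkArrivalsMs: number[], // arrival time of each chunk, in arrival order
  timeoutMs: number,
): TimeoutOutcome {
  const [first] = chunkArrivalsMs;
  if (first === undefined || first > timeoutMs) {
    // Nothing arrived before the deadline, so the gateway gives up.
    return { status: "timeout" };
  }
  // Once the first chunk is in, the gateway keeps waiting for the rest.
  return { status: "ok", chunksReceived: chunkArrivalsMs.length };
}

// A stream whose first chunk arrives at 800 ms completes, even though the
// last chunk arrives at 5000 ms, well past the 1000 ms timeout.
console.log(simulateFirstChunkTimeout([800, 2400, 5000], 1000).status); // ok
console.log(simulateFirstChunkTimeout([1500, 1600], 1000).status); // timeout
```

Because only the first chunk is checked, a long streamed response is not cut off once it has started.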

### Configuration

#### Universal Endpoint

If set on a [Universal Endpoint](/ai-gateway/usage/universal/), a request timeout specifies the timeout duration for requests and triggers a fallback.

For a Universal Endpoint, configure the timeout value by setting a `requestTimeout` property within the provider-specific `config` object. Each provider can have a different `requestTimeout` value for granular customization.

```bash title="Provider-level config" {11-13} collapse={15-48}
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "provider": "workers-ai",
    "endpoint": "@cf/meta/llama-3.1-8b-instruct",
    "headers": {
      "Authorization": "Bearer {cloudflare_token}",
      "Content-Type": "application/json"
    },
    "config": {
      "requestTimeout": 1000
    },
    "query": {
      "messages": [
        {
          "role": "system",
          "content": "You are a friendly assistant"
        },
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }
  },
  {
    "provider": "workers-ai",
    "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
    "headers": {
      "Authorization": "Bearer {cloudflare_token}",
      "Content-Type": "application/json"
    },
    "query": {
      "messages": [
        {
          "role": "system",
          "content": "You are a friendly assistant"
        },
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    },
    "config": {
      "requestTimeout": 3000
    }
  }
]'
```

#### Direct provider

If set on a [provider](/ai-gateway/usage/providers/) request, the request timeout specifies how long to wait for a response and returns an error if that time is exceeded.

For a provider-specific endpoint, configure the timeout value by adding a `cf-aig-request-timeout` header.
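When calling a provider endpoint from TypeScript, the timeout is one more request header. In this sketch, `withRequestTimeout` is a hypothetical helper; its rejection of non-positive or fractional values is an illustrative choice, not documented gateway validation, and the `Authorization` value is a placeholder.

```typescript
// Sketch: add a per-request timeout to the headers of a provider request sent
// through AI Gateway. Only the `cf-aig-request-timeout` header name comes from
// this page; the helper name and validation rule are illustrative.
function withRequestTimeout(
  headers: Record<string, string>,
  timeoutMs: number,
): Record<string, string> {
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new RangeError(`timeout must be a positive integer in ms, got ${timeoutMs}`);
  }
  // Return a copy so the caller's header object is left unchanged.
  return { ...headers, "cf-aig-request-timeout": String(timeoutMs) };
}

const headers = withRequestTimeout(
  {
    Authorization: "Bearer {cloudflare_token}",
    "Content-Type": "application/json",
  },
  1000,
);
console.log(headers["cf-aig-request-timeout"]); // 1000
```

Pass the returned object as the `headers` option of `fetch` when posting to your provider-specific gateway URL.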

```bash title="Provider-specific endpoint example" {4}
@@ -112,14 +45,10 @@ curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@

## Request retries

AI Gateway also supports automatic retries for failed requests, with a maximum of five retry attempts.
AI Gateway supports automatic retries for failed requests, with a maximum of five retry attempts.

This feature improves your application's resiliency, ensuring you can recover from temporary issues without manual intervention.
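The attempt limit can be pictured as a loop that stops at the first success. In this sketch, `runWithRetries` is a hypothetical helper that treats any thrown error as a failed attempt, clamps the requested count to the five-attempt maximum described above, and leaves out the delay between attempts.

```typescript
// Hypothetical model of gateway-side retries: call `attempt` until it succeeds
// or the attempt budget runs out. Delays between attempts are omitted.
const MAX_ATTEMPTS = 5; // "maximum of 5 tries" per this page

function runWithRetries<T>(attempt: (n: number) => T, requestedAttempts: number): T {
  // Clamp the requested count into the supported range of 1..5.
  const attempts = Math.min(Math.max(Math.floor(requestedAttempts), 1), MAX_ATTEMPTS);
  let lastError: unknown;
  for (let n = 1; n <= attempts; n++) {
    try {
      return attempt(n);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// A provider that fails twice before answering succeeds on the third try.
let calls = 0;
const answer = runWithRetries(() => {
  calls++;
  if (calls < 3) throw new Error("temporary provider error");
  return "ok";
}, 4);
console.log(answer, calls); // ok 3
```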

Request retries can be set directly on a request to any provider.

### Definitions

With request retries, you can adjust a combination of three properties:

- Number of attempts (maximum of 5 tries)
@@ -130,83 +59,6 @@ On the final retry attempt, your gateway will wait until the request completes,

### Configuration

#### Universal endpoint

If set on a [Universal Endpoint](/ai-gateway/usage/universal/), a request retry will automatically retry failed requests up to five times before triggering any configured fallbacks.

For a Universal Endpoint, configure the retry settings with the following properties in the provider-specific `config`:

```typescript
config: {
  maxAttempts?: number;
  retryDelay?: number;
  backoff?: "constant" | "linear" | "exponential";
}
```
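The exact delay formula for each `backoff` value is not shown here. Under one common interpretation, which is an assumption and not documented gateway behavior, `constant` waits `retryDelay` before every retry, `linear` multiplies `retryDelay` by the retry number, and `exponential` doubles the wait each time. The hypothetical `retryDelays` helper computes the resulting wait schedule.

```typescript
type Backoff = "constant" | "linear" | "exponential";

// Delay (ms) before each retry, assuming the common meanings of the three
// strategies; the gateway's exact formula is not documented here.
function retryDelays(maxAttempts: number, retryDelay: number, backoff: Backoff): number[] {
  const delays: number[] = [];
  // The first attempt runs immediately, so there are maxAttempts - 1 waits.
  for (let retry = 1; retry < maxAttempts; retry++) {
    switch (backoff) {
      case "constant":
        delays.push(retryDelay);
        break;
      case "linear":
        delays.push(retryDelay * retry);
        break;
      case "exponential":
        delays.push(retryDelay * 2 ** (retry - 1));
        break;
    }
  }
  return delays;
}

console.log(retryDelays(4, 1000, "exponential").join(", ")); // 1000, 2000, 4000
console.log(retryDelays(2, 1000, "constant").join(", ")); // 1000
```

With `maxAttempts: 4`, `retryDelay: 1000`, and `backoff: "exponential"`, as in the second provider of the following example, this interpretation gives waits of 1000, 2000, and 4000 ms.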

As with the [request timeout](/ai-gateway/configuration/request-handling/#universal-endpoint), each provider can have different retry settings for granular customization.

```bash title="Provider-level config" {11-15} collapse={16-55}
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "provider": "workers-ai",
    "endpoint": "@cf/meta/llama-3.1-8b-instruct",
    "headers": {
      "Authorization": "Bearer {cloudflare_token}",
      "Content-Type": "application/json"
    },
    "config": {
      "maxAttempts": 2,
      "retryDelay": 1000,
      "backoff": "constant"
    },
    "query": {
      "messages": [
        {
          "role": "system",
          "content": "You are a friendly assistant"
        },
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }
  },
  {
    "provider": "workers-ai",
    "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
    "headers": {
      "Authorization": "Bearer {cloudflare_token}",
      "Content-Type": "application/json"
    },
    "query": {
      "messages": [
        {
          "role": "system",
          "content": "You are a friendly assistant"
        },
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    },
    "config": {
      "maxAttempts": 4,
      "retryDelay": 1000,
      "backoff": "exponential"
    }
  }
]'
```

#### Direct provider

If set on a [provider](/ai-gateway/usage/providers/) request, a request retry will automatically retry failed requests up to five times. On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes.
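Combined with a request timeout, the final-attempt rule means the last try has no deadline. The hypothetical `attemptDeadlines` helper lists the wait limit for each attempt. It assumes, for illustration only, that every earlier attempt is bounded by the request timeout; only the final-attempt behavior is stated on this page.

```typescript
// Hypothetical per-attempt wait limits (ms) when retries and a request timeout
// are combined. Assumption: earlier attempts use the timeout. Documented: the
// final attempt waits until the request completes, modeled as Infinity.
function attemptDeadlines(maxAttempts: number, timeoutMs: number): number[] {
  return Array.from({ length: maxAttempts }, (_, i) =>
    i === maxAttempts - 1 ? Number.POSITIVE_INFINITY : timeoutMs,
  );
}

console.log(attemptDeadlines(3, 1000).join(", ")); // 1000, 1000, Infinity
```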

For a provider-specific endpoint, configure the retry settings by adding different header values:

- `cf-aig-max-attempts` (number)
@@ -37,7 +37,7 @@ Please keep in mind that datasets currently use `AND` joins, so there can only b
| Provider | specific providers | the selected AI provider. |
| AI Models | specific models | the selected AI model. |
| Cost | less than, greater than | cost, specifying a threshold. |
| Request type | Universal, Workers AI Binding, WebSockets | the type of request. |
| Request type | Workers AI Binding, WebSockets | the type of request. |
| Tokens | Total tokens, Tokens In, Tokens Out | token count (less than or greater than). |
| Duration | less than, greater than | request duration. |
| Feedback | equals, does not equal (thumbs up, thumbs down, no feedback) | feedback type. |
14 changes: 6 additions & 8 deletions src/content/docs/ai-gateway/glossary.mdx
@@ -16,13 +16,11 @@ AI Gateway supports a variety of headers to help you configure, customize, and m

## Configuration hierarchy

Settings in AI Gateway can be configured at three levels: **Provider**, **Request**, and **Gateway**. Since the same settings can be configured in multiple locations, the following hierarchy determines which value is applied:
Settings in AI Gateway can be configured at two levels: **Request** and **Gateway**. Since the same settings can be configured in multiple locations, the following hierarchy determines which value is applied:

1. **Provider-level headers**:
Relevant only when using the [Universal Endpoint](/ai-gateway/usage/universal/), these headers take precedence over all other configurations.
2. **Request-level headers**:
Apply if no provider-level headers are set.
3. **Gateway-level settings**:
Act as the default if no headers are set at the provider or request levels.
1. **Request-level headers**:
Headers included in individual requests take precedence over gateway-level settings.
2. **Gateway-level settings**:
Act as the default if no headers are set at the request level.

This hierarchy ensures consistent behavior, prioritizing the most specific configurations. Use provider-level and request-level headers for more fine-tuned control, and gateway settings for general defaults.
This hierarchy ensures consistent behavior, prioritizing the most specific configurations. Use request-level headers for fine-tuned control, and gateway settings for general defaults.
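The precedence rule reduces to "use the request header if present, else the gateway default". In this sketch, `effectiveMaxAttempts` and the `GatewaySettings` shape are hypothetical; the only name taken from the documentation is the `cf-aig-max-attempts` header. Header keys are assumed to be lowercase, and falling back to the gateway value when the header cannot be parsed is an illustrative choice.

```typescript
// Sketch of the two-level hierarchy: a request-level `cf-aig-*` header wins,
// otherwise the gateway-level setting applies. `GatewaySettings` is a
// hypothetical shape, and header keys are assumed to be lowercase.
interface GatewaySettings {
  maxAttempts: number;
}

function effectiveMaxAttempts(
  requestHeaders: Record<string, string | undefined>,
  gateway: GatewaySettings,
): number {
  const fromHeader = requestHeaders["cf-aig-max-attempts"];
  if (fromHeader !== undefined) {
    const parsed = Number(fromHeader);
    // Request level takes precedence when the header holds a usable value.
    if (Number.isInteger(parsed) && parsed > 0) return parsed;
  }
  return gateway.maxAttempts; // gateway-level default
}

const gatewayDefaults: GatewaySettings = { maxAttempts: 2 };
console.log(effectiveMaxAttempts({ "cf-aig-max-attempts": "4" }, gatewayDefaults)); // 4
console.log(effectiveMaxAttempts({}, gatewayDefaults)); // 2
```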
@@ -9,7 +9,7 @@ tags:
description: >-
  Reference for the AI binding with AI Gateway. Call Workers AI and
  third-party models with env.AI.run(), access log IDs, and use gateway methods
  for feedback, logging, URLs, and universal requests.
  for feedback, logging, and URLs.
products:
- ai-gateway
---
@@ -198,21 +198,4 @@ const anthropic = createAnthropic({
});
```

### `run()`

Executes a [universal request](/ai-gateway/usage/universal/) to any supported provider. Accepts a single request object or an array.

```typescript
const resp = await gateway.run({
  provider: "workers-ai",
  endpoint: "@cf/meta/llama-3.1-8b-instruct",
  headers: {
    authorization: "Bearer my-api-token",
  },
  query: {
    prompt: "tell me a joke",
  },
});
```

**Returns:** `Promise<Response>`
@@ -120,7 +120,7 @@ See full list of available filters and their descriptions below:
| Provider | specific providers | the selected AI provider. |
| AI Models | specific models | the selected AI model. |
| Cost | less than, greater than | cost, specifying a threshold. |
| Request type | Universal, Workers AI Binding, WebSockets | the type of request. |
| Request type | Workers AI Binding, WebSockets | the type of request. |
| Tokens | Total tokens, Tokens In, Tokens Out | token count (less than or greater than). |
| Duration | less than, greater than | request duration. |
| Feedback | equals, does not equal (thumbs up, thumbs down, no feedback) | feedback type. |
@@ -23,7 +23,7 @@ All of the tutorials assume you have already completed the [Get started guide](/

## 1. Create an AI Gateway and OpenAI API key

On the AI Gateway page in the Cloudflare dashboard, create a new AI Gateway by clicking the plus button on the top right. You should be able to name the gateway as well as the endpoint. Click on the API Endpoints button to copy the endpoint. You can choose from provider-specific endpoints such as OpenAI, HuggingFace, and Replicate. Or you can use the universal endpoint that accepts a specific schema and supports model fallback and retries.
On the AI Gateway page in the Cloudflare dashboard, create a new AI Gateway by clicking the plus button on the top right. You should be able to name the gateway as well as the endpoint. Click on the API Endpoints button to copy the endpoint. You can choose from provider-specific endpoints such as OpenAI, HuggingFace, and Replicate.

For this tutorial, we will be using the OpenAI provider-specific endpoint, so select OpenAI in the dropdown and copy the new endpoint.
