cloudflare · ethulia · May 8, 2026 · May 7, 2026 · May 7, 2026 · May 8, 2026
@@ -1,103 +1,36 @@
 ---
 pcx_content_type: configuration
 title: Request handling
-description: Configure AI Gateway request timeouts, retries, and fallback strategies for reliable AI provider interactions.
+description: Configure AI Gateway request timeouts and retries for reliable AI provider interactions.
 sidebar:
   order: 4
 products:
   - ai-gateway
 ---
 
-import { Render, Aside } from "~/components";
+import { Render } from "~/components";
 
 :::note
 
-[Dynamic Routing](/ai-gateway/features/dynamic-routing/) also offers timeouts and retries per model, along with conditional routing, rate limiting, and budget limiting through a visual interface. This page documents request-handling configuration available through Universal Endpoint provider `config` settings as well as per-request `cf-aig-*` headers that work with any provider endpoint. You can also configure retries at the [gateway level](/ai-gateway/configuration/manage-gateway/#retry-requests).
+[Dynamic Routing](/ai-gateway/features/dynamic-routing/) also offers timeouts and retries per model, along with conditional routing, rate limiting, and budget limiting through a visual interface. This page documents request-handling configuration available through per-request `cf-aig-*` headers that work with any provider endpoint. You can also configure retries at the [gateway level](/ai-gateway/configuration/manage-gateway/#retry-requests).
 
 :::
 
 Your AI gateway supports different strategies for handling requests to providers, which allows you to manage AI interactions effectively and ensure your applications remain responsive and reliable.
 
 ## Request timeouts
 
-A request timeout allows you to trigger fallbacks or a retry if a provider takes too long to respond.
+A request timeout allows you to return an error or trigger a retry if a provider takes too long to respond.
 
 These timeouts help:
 
 - Improve user experience, by preventing users from waiting too long for a response
-- Proactively handle errors, by detecting unresponsive providers and triggering a fallback option
+- Proactively handle errors, by detecting unresponsive providers
 
-Request timeouts can be set on a Universal Endpoint or directly on a request to any provider.
-
-### Definitions
-
-A timeout is set in milliseconds. Additionally, the timeout is based on when the first part of the response comes back. As long as the first part of the response returns within the specified timeframe - such as when streaming a response - your gateway will wait for the response.
+A timeout is set in milliseconds and is measured against when the first part of the response comes back. As long as the first part of the response returns within the specified timeframe (for example, when streaming a response), your gateway will wait for the rest of the response.
 
 ### Configuration
 
-#### Universal Endpoint
-
-If set on a [Universal Endpoint](/ai-gateway/usage/universal/), a request timeout specifies the timeout duration for requests and triggers a fallback.
-
-For a Universal Endpoint, configure the timeout value by setting a `requestTimeout` property within the provider-specific `config` object. Each provider can have a different `requestTimeout` value for granular customization.
-
-```bash title="Provider-level config" {11-13} collapse={15-48}
-curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
-	--header 'Content-Type: application/json' \
-	--data '[
-    {
-        "provider": "workers-ai",
-        "endpoint": "@cf/meta/llama-3.1-8b-instruct",
-        "headers": {
-            "Authorization": "Bearer {cloudflare_token}",
-            "Content-Type": "application/json"
-        },
-        "config": {
-            "requestTimeout": 1000
-        },
-        "query": {
-            "messages": [
-                {
-                    "role": "system",
-                    "content": "You are a friendly assistant"
-                },
-                {
-                    "role": "user",
-                    "content": "What is Cloudflare?"
-                }
-            ]
-        }
-    },
-    {
-        "provider": "workers-ai",
-        "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
-        "headers": {
-            "Authorization": "Bearer {cloudflare_token}",
-            "Content-Type": "application/json"
-        },
-        "query": {
-            "messages": [
-                {
-                    "role": "system",
-                    "content": "You are a friendly assistant"
-                },
-                {
-                    "role": "user",
-                    "content": "What is Cloudflare?"
-                }
-            ]
-        },
-				"config": {
-            "requestTimeout": 3000
-        },
-    }
-]'
-```
-
-#### Direct provider
-
-If set on a [provider](/ai-gateway/usage/providers/) request, request timeout specifies the timeout duration for a request and - if exceeded - returns an error.
-
 For a provider-specific endpoint, configure the timeout value by adding a `cf-aig-request-timeout` header.
 
 ```bash title="Provider-specific endpoint example" {4}
@@ -112,14 +45,10 @@ curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@
 
 ## Request retries
 
-AI Gateway also supports automatic retries for failed requests, with a maximum of five retry attempts.
+AI Gateway supports automatic retries for failed requests, with a maximum of five retry attempts.
 
 This feature improves your application's resiliency, ensuring you can recover from temporary issues without manual intervention.
 
-Request timeouts can be set on a Universal Endpoint or directly on a request to any provider.
-
-### Definitions
-
 With request retries, you can adjust a combination of three properties:
 
 - Number of attempts (maximum of 5 tries)
@@ -130,83 +59,6 @@ On the final retry attempt, your gateway will wait until the request completes,
 
 ### Configuration
 
-#### Universal endpoint
-
-If set on a [Universal Endpoint](/ai-gateway/usage/universal/), a request retry will automatically retry failed requests up to five times before triggering any configured fallbacks.
-
-For a Universal Endpoint, configure the retry settings with the following properties in the provider-specific `config`:
-
-```json
-config:{
-	maxAttempts?: number;
-	retryDelay?: number;
-	backoff?: "constant" | "linear" | "exponential";
-}
-```
-
-As with the [request timeout](/ai-gateway/configuration/request-handling/#universal-endpoint), each provider can have a different retry settings for granular customization.
-
-```bash title="Provider-level config" {11-15} collapse={16-55}
-curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
-	--header 'Content-Type: application/json' \
-	--data '[
-    {
-        "provider": "workers-ai",
-        "endpoint": "@cf/meta/llama-3.1-8b-instruct",
-        "headers": {
-            "Authorization": "Bearer {cloudflare_token}",
-            "Content-Type": "application/json"
-        },
-        "config": {
-            "maxAttempts": 2,
-						"retryDelay": 1000,
-						"backoff": "constant"
-        },
-        "query": {
-            "messages": [
-                {
-                    "role": "system",
-                    "content": "You are a friendly assistant"
-                },
-                {
-                    "role": "user",
-                    "content": "What is Cloudflare?"
-                }
-            ]
-        }
-    },
-    {
-        "provider": "workers-ai",
-        "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
-        "headers": {
-            "Authorization": "Bearer {cloudflare_token}",
-            "Content-Type": "application/json"
-        },
-        "query": {
-            "messages": [
-                {
-                    "role": "system",
-                    "content": "You are a friendly assistant"
-                },
-                {
-                    "role": "user",
-                    "content": "What is Cloudflare?"
-                }
-            ]
-        },
-				"config": {
-            "maxAttempts": 4,
-						"retryDelay": 1000,
-						"backoff": "exponential"
-        },
-    }
-]'
-```
-
-#### Direct provider
-
-If set on a [provider](/ai-gateway/usage/universal/) request, a request retry will automatically retry failed requests up to five times. On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes.
-
 For a provider-specific endpoint, configure the retry settings by adding different header values:
 
 - `cf-aig-max-attempts` (number)

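With the Universal Endpoint `config` examples removed, timeouts and retries on this page are set only through per-request headers. A minimal sketch of building those headers on the client is shown below. The function and option names are hypothetical. Only `cf-aig-request-timeout` and `cf-aig-max-attempts` appear in this diff. The `cf-aig-retry-delay` and `cf-aig-backoff` names are assumed from the rest of the page and should be checked against it.

```typescript
// Backoff strategies, mirroring the values in the removed `config` schema.
type Backoff = "constant" | "linear" | "exponential";

// Hypothetical option names, used only for this illustration.
interface RequestHandlingOptions {
  requestTimeoutMs?: number; // timeout in milliseconds
  maxAttempts?: number; // the docs state a maximum of 5
  retryDelayMs?: number; // delay between attempts, assumed to be in milliseconds
  backoff?: Backoff;
}

// Translate the options into `cf-aig-*` request headers.
function buildRequestHandlingHeaders(
  options: RequestHandlingOptions,
): Record<string, string> {
  const headers: Record<string, string> = {};

  if (options.requestTimeoutMs !== undefined) {
    headers["cf-aig-request-timeout"] = String(options.requestTimeoutMs);
  }

  if (options.maxAttempts !== undefined) {
    const n = options.maxAttempts;
    // Reject values outside the documented range before sending the request.
    if (!Number.isInteger(n) || n < 1 || n > 5) {
      throw new RangeError("maxAttempts must be an integer between 1 and 5");
    }
    headers["cf-aig-max-attempts"] = String(n);
  }

  // Assumed header names: not visible in this diff.
  if (options.retryDelayMs !== undefined) {
    headers["cf-aig-retry-delay"] = String(options.retryDelayMs);
  }
  if (options.backoff !== undefined) {
    headers["cf-aig-backoff"] = options.backoff;
  }

  return headers;
}
```

The returned object can be spread into the `headers` of a `fetch` call to a provider-specific endpoint, alongside the usual `Authorization` and `Content-Type` headers.
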
@@ -37,7 +37,7 @@ Please keep in mind that datasets currently use `AND` joins, so there can only b
 | Provider        | specific providers                                           | the selected AI provider.                 |
 | AI Models       | specific models                                              | the selected AI model.                    |
 | Cost            | less than, greater than                                      | cost, specifying a threshold.             |
-| Request type    | Universal, Workers AI Binding, WebSockets                    | the type of request.                      |
+| Request type    | Workers AI Binding, WebSockets                               | the type of request.                      |
 | Tokens          | Total tokens, Tokens In, Tokens Out                          | token count (less than or greater than).  |
 | Duration        | less than, greater than                                      | request duration.                         |
 | Feedback        | equals, does not equal (thumbs up, thumbs down, no feedback) | feedback type.                            |

@@ -16,13 +16,11 @@ AI Gateway supports a variety of headers to help you configure, customize, and m
 
 ## Configuration hierarchy
 
-Settings in AI Gateway can be configured at three levels: **Provider**, **Request**, and **Gateway**. Since the same settings can be configured in multiple locations, the following hierarchy determines which value is applied:
+Settings in AI Gateway can be configured at two levels: **Request** and **Gateway**. Since the same settings can be configured in multiple locations, the following hierarchy determines which value is applied:
 
-1. **Provider-level headers**:
-   Relevant only when using the [Universal Endpoint](/ai-gateway/usage/universal/), these headers take precedence over all other configurations.
-2. **Request-level headers**:
-   Apply if no provider-level headers are set.
-3. **Gateway-level settings**:
-   Act as the default if no headers are set at the provider or request levels.
+1. **Request-level headers**:
+   Headers included in individual requests take precedence over gateway-level settings.
+2. **Gateway-level settings**:
+   Act as the default if no headers are set at the request level.
 
-This hierarchy ensures consistent behavior, prioritizing the most specific configurations. Use provider-level and request-level headers for more fine-tuned control, and gateway settings for general defaults.
+This hierarchy ensures consistent behavior, prioritizing the most specific configurations. Use request-level headers for fine-tuned control, and gateway settings for general defaults.
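
The two-level hierarchy described in this hunk can be modeled as a simple lookup: a request-level header wins, otherwise the gateway-level default applies. The sketch below is a client-side illustration only, with a hypothetical function name and data shapes. It is not the gateway's implementation.

```typescript
// Map of setting name to value. Gateway defaults are assumed to use
// lowercase keys in this illustration.
type Settings = Record<string, string>;

// Return the value that would apply for `name` under the two-level hierarchy.
function resolveSetting(
  name: string,
  requestHeaders: Settings,
  gatewayDefaults: Settings,
): string | undefined {
  const key = name.toLowerCase();

  // HTTP header names are case-insensitive, so compare in lowercase.
  const fromRequest = Object.entries(requestHeaders).find(
    ([headerName]) => headerName.toLowerCase() === key,
  );
  if (fromRequest !== undefined) {
    return fromRequest[1]; // request-level header takes precedence
  }

  // Fall back to the gateway-level default, if one is set.
  return gatewayDefaults[key];
}
```

For example, a request carrying `cf-aig-max-attempts: 3` resolves to `3` even when the gateway default is `1`, and a request without that header resolves to the gateway default.
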
@@ -9,7 +9,7 @@ tags:
 description: >-
   Reference for the AI binding with AI Gateway. Call Workers AI and
   third-party models with env.AI.run(), access log IDs, and use gateway methods
-  for feedback, logging, URLs, and universal requests.
+  for feedback, logging, and URLs.
 products:
   - ai-gateway
 ---
@@ -198,21 +198,4 @@ const anthropic = createAnthropic({
 });
 ```
 
-### `run()`
 
-Executes a [universal request](/ai-gateway/usage/universal/) to any supported provider. Accepts a single request object or an array.
-
-```typescript
-const resp = await gateway.run({
-	provider: "workers-ai",
-	endpoint: "@cf/meta/llama-3.1-8b-instruct",
-	headers: {
-		authorization: "Bearer my-api-token",
-	},
-	query: {
-		prompt: "tell me a joke",
-	},
-});
-```
-
-**Returns:** `Promise<Response>`
@@ -120,7 +120,7 @@ See full list of available filters and their descriptions below:
 | Provider        | specific providers                                           | the selected AI provider.                 |
 | AI Models       | specific models                                              | the selected AI model.                    |
 | Cost            | less than, greater than                                      | cost, specifying a threshold.             |
-| Request type    | Universal, Workers AI Binding, WebSockets                    | the type of request.                      |
+| Request type    | Workers AI Binding, WebSockets                               | the type of request.                      |
 | Tokens          | Total tokens, Tokens In, Tokens Out                          | token count (less than or greater than).  |
 | Duration        | less than, greater than                                      | request duration.                         |
 | Feedback        | equals, does not equal (thumbs up, thumbs down, no feedback) | feedback type.                            |

@@ -23,7 +23,7 @@ All of the tutorials assume you have already completed the [Get started guide](/
 
 ## 1. Create an AI Gateway and OpenAI API key
 
-On the AI Gateway page in the Cloudflare dashboard, create a new AI Gateway by clicking the plus button on the top right. You should be able to name the gateway as well as the endpoint. Click on the API Endpoints button to copy the endpoint. You can choose from provider-specific endpoints such as OpenAI, HuggingFace, and Replicate. Or you can use the universal endpoint that accepts a specific schema and supports model fallback and retries.
+On the AI Gateway page in the Cloudflare dashboard, create a new AI Gateway by clicking the plus button on the top right. You should be able to name the gateway as well as the endpoint. Click on the API Endpoints button to copy the endpoint. You can choose from provider-specific endpoints such as OpenAI, HuggingFace, and Replicate.
 
 For this tutorial, we will be using the OpenAI provider-specific endpoint, so select OpenAI in the dropdown and copy the new endpoint.