How to build a large-scale real-time and batch processing system for OpenAI workloads? #2073
Hi, I'm trying to figure out how to design a system that can manage real-time and batch processing for OpenAI applications. What architectures, tools, and best practices have you used to achieve high performance, fault tolerance, and reliability? Any help or advice would be appreciated :) Thank you
Replies: 1 comment
Here's a practical architecture that covers both real-time and batch workloads with OpenAI:

**Real-time path (low latency, user-facing)**

Use `AsyncOpenAI` with connection pooling for concurrent requests:

```python
from openai import AsyncOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential

# One shared client: its HTTP connection pool is reused across concurrent requests.
client = AsyncOpenAI(max_retries=3, timeout=30.0)

# Retry transient failures (rate limits, timeouts) with exponential backoff.
@retry(wait=wait_exponential(min=1, max=10), stop=stop_after_attempt(3))
async def complete(messages):
    return await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )
```

**Batch path (high throughput, background jobs)**

Use OpenAI's native Batch API for 50% cost savings:

```python
import json
import time

from openai import OpenAI

client = OpenAI()

# 1. Create a JSONL file with one request per line.
requests = [
    {"custom_id": f"req-{i}", "method": "POST", "url": "/v1/chat/completions",
     "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": prompt}]}}
    for i, prompt in enumerate(prompts)
]
with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# 2. Upload the file and create the batch.
input_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll until the batch reaches a terminal state.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)
```

**Fault tolerance essentials**
**Queue-based architecture for mixed workloads**

Route requests by urgency: anything user-facing goes through the real-time workers; everything else gets batched.
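A sketch of that routing layer, assuming an in-process `asyncio.Queue` for the real-time lane and a plain buffer that gets flushed into batch JSONL files (in production both would be durable queues, e.g. Redis or SQS; the `Job` shape is illustrative):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Job:
    prompt: str
    urgent: bool  # user-facing requests set this

realtime_queue: asyncio.Queue = asyncio.Queue(maxsize=100)
batch_buffer: list = []

async def route(job: Job):
    """Send user-facing jobs to real-time workers; buffer the rest for the Batch API."""
    if job.urgent:
        await realtime_queue.put(job)   # consumed by async workers calling complete()
    else:
        batch_buffer.append(job)        # flushed periodically into a batch JSONL file
```

The bounded queue also gives you backpressure for free: when real-time workers fall behind, producers block instead of overwhelming the API.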
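On the real-time side, the `complete()` helper composes naturally with `asyncio.gather`, which is where the shared client's connection pooling pays off. A sketch with a stand-in for the actual API call:

```python
import asyncio

async def complete(messages):
    # Stand-in for the AsyncOpenAI call shown earlier; swap in the real helper.
    await asyncio.sleep(0)
    return {"echo": messages}

async def serve_concurrently(conversations):
    """Fan out one API call per conversation; the shared client pools connections."""
    return await asyncio.gather(*(complete(msgs) for msgs in conversations))

# Usage:
# replies = asyncio.run(serve_concurrently([[{"role": "user", "content": "hi"}]]))
```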