How can I simulate real-time streaming transcription using OpenAI API? #2307
Replies: 2 comments 2 replies
You're on the right path: emulating streaming by chunking audio is the best workaround available at the moment, since OpenAI's whisper-1 API only supports batch processing, not streaming. Let me describe it for you.
Method: Chunked Streaming Simulation
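As a minimal sketch of the idea (assuming a pre-recorded WAV file; `split_wav_chunks` is a made-up helper name, and `chunk_seconds` is a tunable latency/accuracy trade-off), you split the audio into fixed-length chunks and send each one to the batch endpoint in order:

```python
import io
import wave

def split_wav_chunks(wav_bytes: bytes, chunk_seconds: int = 5) -> list:
    """Split a WAV file into fixed-length chunks, each a standalone WAV buffer.
    (Hypothetical helper for illustration; not an OpenAI API function.)"""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        params = wf.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        chunks = []
        while True:
            frames = wf.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as out:
                out.setnchannels(params.nchannels)
                out.setsampwidth(params.sampwidth)
                out.setframerate(params.framerate)
                out.writeframes(frames)
            buf.seek(0)
            buf.name = "chunk.wav"  # filename hint so the API can detect the format
            chunks.append(buf)
    return chunks

# Each chunk then goes to the regular batch endpoint, in order:
#   for chunk in split_wav_chunks(audio_bytes):
#       text = client.audio.transcriptions.create(model="whisper-1", file=chunk).text
```

The partial transcripts arrive one chunk at a time, which is what gives the streaming-like feel.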
If you're interested, I can help you set up a full real-time transcription pipeline.
Chunked audio is indeed the best approach for simulating streaming transcription with Whisper, but there are some important details to get right.

Working approach: overlapping chunks

```python
import asyncio
import io
import wave

import numpy as np
import sounddevice as sd
from openai import AsyncOpenAI

client = AsyncOpenAI()

SAMPLE_RATE = 16000
CHUNK_DURATION = 3  # seconds per chunk
OVERLAP = 0.5       # overlap between chunks to avoid cutting words

async def transcribe_chunk(audio_bytes: bytes) -> str:
    buf = io.BytesIO(audio_bytes)
    buf.name = "chunk.wav"  # Whisper needs a filename hint to detect the format
    transcript = await client.audio.transcriptions.create(
        model="whisper-1",
        file=buf,
        language="en",
    )
    return transcript.text

async def stream_transcription():
    chunk_samples = int(SAMPLE_RATE * CHUNK_DURATION)
    overlap_samples = int(SAMPLE_RATE * OVERLAP)
    buffer = np.array([], dtype=np.float32)

    def audio_callback(indata, frames, time, status):
        nonlocal buffer
        buffer = np.append(buffer, indata[:, 0])

    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=audio_callback):
        while True:
            await asyncio.sleep(CHUNK_DURATION - OVERLAP)
            if len(buffer) < chunk_samples:
                continue

            # Take a chunk, keeping the tail as overlap for the next one
            chunk = buffer[:chunk_samples]
            buffer = buffer[chunk_samples - overlap_samples:]

            # Convert float32 samples to 16-bit PCM WAV bytes
            wav_buf = io.BytesIO()
            with wave.open(wav_buf, "wb") as wf:
                wf.setnchannels(1)
                wf.setsampwidth(2)
                wf.setframerate(SAMPLE_RATE)
                wf.writeframes((chunk * 32767).astype(np.int16).tobytes())

            text = await transcribe_chunk(wav_buf.getvalue())
            if text.strip():
                print(text, end=" ", flush=True)

asyncio.run(stream_transcription())
```

Key improvements over naive chunking: the chunks overlap so words at boundaries aren't cut in half, the `BytesIO` buffer gets a filename hint so the API accepts it, and empty transcriptions are skipped. Adding a VAD (voice activity detection) step to skip silent chunks would cut API costs further.
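One caveat with overlapping chunks: the overlap window gets transcribed twice, so consecutive results can repeat a few words. A rough way to handle that (a sketch, not part of the code above; `merge_overlap` and `max_words` are names I'm making up here) is to trim the longest word-level overlap between the previous transcript's tail and the new transcript's head:

```python
def merge_overlap(prev: str, new: str, max_words: int = 8) -> str:
    """Return `new` with any word-level suffix of `prev` that it repeats removed.
    (Naive sketch: exact case-insensitive word match only.)"""
    prev_words = prev.lower().split()
    new_words = new.split()
    lowered = [w.lower() for w in new_words]
    # Try the longest candidate overlap first, then shrink.
    for n in range(min(max_words, len(prev_words), len(new_words)), 0, -1):
        if prev_words[-n:] == lowered[:n]:
            return " ".join(new_words[n:])
    return new
```

This is brittle when Whisper transcribes the overlap differently each time (punctuation, alternate spellings), so fuzzier matching may be needed in practice.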
Alternative: OpenAI Realtime API

If you need true real-time transcription (not simulated), OpenAI's Realtime API supports audio streaming natively via WebSockets. It's a different endpoint and pricing model, but it gives you actual incremental transcription:

```python
# The Realtime API uses a WebSocket, not the REST transcription endpoint
# See: https://platform.openai.com/docs/guides/realtime
```

There's no official roadmap for adding streaming to the Whisper REST API specifically, but the Realtime API is effectively OpenAI's answer to that use case.
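To give a flavor of the wire format (hedged sketch: the `input_audio_buffer.append` event type and base64-encoded PCM16 audio are my reading of the Realtime docs; `ws` and `frame` are assumed names for an open WebSocket and a raw audio buffer), each captured audio frame gets wrapped in a JSON event and sent over the socket:

```python
import base64
import json

def audio_append_event(pcm16_bytes: bytes) -> str:
    """Wrap one raw PCM16 audio frame in a Realtime API append event.
    (Sketch only; check the Realtime docs for the authoritative event schema.)"""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16_bytes).decode("ascii"),
    })

# With an open WebSocket `ws` to the Realtime endpoint you would send:
#   ws.send(audio_append_event(frame))
```

The server then pushes incremental transcription events back over the same socket, which is what makes it genuinely real-time rather than chunk-simulated.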
I'm working on a project where I want to convert speech to text in real-time using OpenAI's Whisper model. I see that Whisper's hosted API (whisper-1) currently only supports batch mode — sending a full audio file and receiving the full transcript.
I'm trying to achieve a streaming-like transcription experience, where I can start receiving partial transcriptions as audio is still being recorded or uploaded.
Is there a way to simulate streaming transcription using Whisper?
I'm using Python.
I considered chunking the audio into small parts and sending them sequentially.
Is that the best approach, or is there a better method?
Also, is there any public roadmap or timeline for when the official OpenAI Whisper API might support real-time streaming transcription?
Thanks in advance!