The official JavaScript/TypeScript SDK for the aiOla API, working seamlessly in both Node.js and browser environments.
- Node.js: 18.0.0 or higher
Learn more about the aiOla API and how to use the SDK in our documentation.
npm install @aiola/sdk
# or
yarn add @aiola/sdk

The aiOla SDK uses a two-step authentication process:
- Generate Access Token: Use your API key to create a temporary access token, and save it for later use
- Create Client: Use the saved access token to instantiate the client, and reuse that client for subsequent calls
import { AiolaClient } from '@aiola/sdk';
const { accessToken, sessionId } = await AiolaClient.grantToken({
apiKey: AIOLA_API_KEY
});

const client = new AiolaClient({
accessToken: accessToken
});

The SDK reports common scenarios, such as concurrency limits, as typed errors you can handle:
try {
const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
} catch (error) {
if (error.code === 'MAX_CONCURRENCY_REACHED') {
console.log('Concurrency limit reached. Please wait for existing sessions to expire.');
}
}

Close Session on Server:
// Terminates the session on the server
await AiolaClient.closeSession(accessToken, {
apiKey: AIOLA_API_KEY
});

Custom Endpoints:

const { accessToken } = await AiolaClient.grantToken({
apiKey: AIOLA_API_KEY,
authBaseUrl: 'https://mycompany.auth.aiola.ai',
});
const client = new AiolaClient({
accessToken,
baseUrl: 'https://mycompany.api.aiola.ai',
});

Transcribe a File:

import { AiolaClient } from '@aiola/sdk';
import fs from 'fs/promises';
async function transcribeFile() {
try {
// Step 1: Generate access token
const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
// Step 2: Create client
const client = new AiolaClient({ accessToken });
// Step 3: Transcribe file
const file = await fs.readFile('path/to/your/audio.wav');
const transcript = await client.stt.transcribeFile({
file: file,
language: 'en', // supported languages: en, de, fr, es, pt, zh, ja, it
keywords: {
"<word_to_catch>": "<word_transcribe>",
},
});
console.log(transcript);
} catch (error) {
console.error('Error transcribing file:', error);
}
}
transcribeFile();

Live Streaming:

import { AiolaClient } from '@aiola/sdk';
const AIOLA_API_KEY = process.env.AIOLA_API_KEY || 'YOUR_API_KEY';
// Stream audio in real-time for live transcription
async function liveStreaming() {
try {
// Step 1: Generate access token, save it
const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
// Step 2: Create client using the access token
const client = new AiolaClient({ accessToken });
// Step 3: Create stream connection
const connection = await client.stt.stream({
langCode: 'en', // supported languages: en, de, fr, es, pt, zh, ja, it
keywords: {
"venus": "venuss", // "<word_to_catch>": "<word_transcribe>",
},
});
connection.on('transcript', (data) => {
console.log('Transcript:', data.transcript);
});
connection.on('connect', async () => {
console.log('Connected to streaming service');
// Get an audio file (or use a microphone stream)
const response = await fetch("https://github.com/aiola-lab/aiola-js-sdk/raw/refs/heads/main/examples/stt/assets/sample-en.wav");
const audioData = await response.arrayBuffer();
// Send audio data
connection.send(Buffer.from(audioData));
});
connection.on('disconnect', () => {
console.log('Disconnected from streaming service');
});
connection.on('error', (error) => {
console.error('Streaming error:', error);
});
connection.connect();
} catch (error) {
console.error('Error setting up streaming:', error);
}
}
liveStreaming();

You can dynamically set schema values during a live streaming session to update recognition parameters in real time:
connection.on('connect', () => {
const contactSchema = {
"contact.name": [
"John Doe",
"Jane Smith",
"Bob Johnson",
],
"contact.email": [
"john@example.com",
"jane@example.com",
],
};
connection.setSchemaValues(contactSchema, (response) => {
if (response.status === "ok") {
console.log("Schema values set successfully");
} else {
console.error("Error setting schema values:", response);
}
});
});
connection.on('schema_values_updated', (data) => {
console.log("Schema values updated on server:", data);
});

Create an Audio File:

import fs from 'fs';
import { AiolaClient, VoiceId } from '@aiola/sdk';
async function createFile() {
try {
const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
const client = new AiolaClient({ accessToken });
const audio = await client.tts.synthesize({
text: 'Hello, how can I help you today?',
voice_id: VoiceId.EnglishUSFemale, // Type-safe enum
});
const fileStream = fs.createWriteStream('./output.wav');
for await (const chunk of audio) {
fileStream.write(chunk);
}
console.log('Audio file created successfully');
} catch (error) {
console.error('Error creating audio file:', error);
}
}
createFile();

Stream TTS Audio:

import { AiolaClient, VoiceId } from '@aiola/sdk';
async function streamTts() {
try {
const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
const client = new AiolaClient({ accessToken });
const stream = await client.tts.stream({
text: 'Hello, how can I help you today?',
voice_id: VoiceId.EnglishUSFemale, // Type-safe enum
});
const audioChunks: Buffer[] = [];
for await (const chunk of stream) {
audioChunks.push(chunk);
}
console.log('Audio chunks received:', audioChunks.length);
} catch (error) {
console.error('Error streaming TTS:', error);
}
}
streamTts();

Browser Microphone Streaming:

A ready-made web app demonstrates how to use the SDK directly in a browser to stream microphone audio to aiOla Speech-to-Text and receive live transcripts:
cd examples/stt/browser-mic-stream
npm install
npm run dev

- Uses the AudioWorklet API to capture microphone audio, convert it to 16-bit PCM (16 kHz, mono), and send it through the WebSocket connection returned by `client.stt.stream()`.
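The PCM conversion at the heart of that example can be sketched as follows. This is a minimal illustration, not the app's exact code; the `floatTo16BitPcm` helper is hypothetical and assumes the worklet already delivers Float32Array samples resampled to 16 kHz:

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM,
// the format the streaming endpoint requires.
function floatTo16BitPcm(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale to the signed 16-bit range; true = little-endian
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Usage inside the audio pipeline (connection from client.stt.stream()):
// connection.send(Buffer.from(floatTo16BitPcm(samples)));
```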
Main client for interacting with the aiOla API.

`AiolaClient.grantToken(options)`: Generates an access token from an API key.
Parameters:
- `apiKey` (string, required): Your aiOla API key
- `baseUrl` (string, optional): Custom API base URL
- `authBaseUrl` (string, optional): Custom authentication base URL
- `workflowId` (string, optional): Custom workflow ID
Returns: `Promise<GrantTokenResponse>`

- `accessToken` (string): JWT access token for API authentication
- `sessionId` (string): Unique session identifier
`AiolaClient.closeSession(accessToken, options)`: Closes a session on the server and frees up concurrency slots.
Parameters:
- `accessToken` (string, required): The access token for the session
- `options` (AuthOptions, required): Authentication configuration
Returns: `Promise<SessionCloseResponse>`
`new AiolaClient(options)`: Creates a new AiolaClient instance.
Parameters:
- `accessToken` (string, required): Valid access token from `grantToken()`
- `baseUrl` (string, optional): Custom API base URL
- `authBaseUrl` (string, optional): Custom authentication base URL
- `workflowId` (string, optional): Custom workflow ID
Gets the Speech-to-Text (STT) client.
Type: Stt
Gets the Text-to-Speech (TTS) client.
Type: Tts
Gets the authentication client (internal use).
Type: Auth
Gets the client configuration options.
Type: ClientConfig
Speech-to-Text client for audio transcription.
`client.stt.transcribeFile(options)`: Transcribes an audio file to text.
Parameters:
- `file` (FileSource, required): Audio file (File, Blob, Buffer, ReadStream, or file path)
- `language` (string, optional): Language code (e.g., 'en', 'es', 'fr')
- `keywords` (Record<string, string>, optional): Keywords map for boosting recognition
- `vadConfig` (VadConfig, optional): Voice Activity Detection configuration
Supported audio formats: WAV, MP3, M4A, OGG, FLAC
Returns: `Promise<TranscribeFileResponse>`

- `transcript` (string): Complete transcription with formatting
- `raw_transcript` (string): Raw transcription without post-processing
- `segments` (Segment[]): Time segments where speech was detected
- `metadata` (TranscriptionMetadata): File metadata and transcription info
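A quick sketch of reading these fields, assuming `client` and `file` from the earlier file-transcription example; only the field names documented above are used:

```typescript
const result = await client.stt.transcribeFile({ file, language: 'en' });

console.log(result.transcript);      // formatted transcription
console.log(result.raw_transcript);  // raw text without post-processing
console.log(result.segments.length); // number of detected speech segments
console.log(result.metadata);        // file metadata and transcription info
```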
`client.stt.stream(options)`: Creates a real-time streaming connection for audio transcription.
Parameters:
- `langCode` (string, optional): Language code (e.g., 'en', 'es', 'fr')
- `workflowId` (string, optional): Custom workflow ID
- `executionId` (string, optional): Execution ID for tracking (auto-generated if not provided)
- `timeZone` (string, optional): Timezone for timestamps (default: 'UTC')
- `keywords` (Record<string, string>, optional): Keywords map for boosting recognition
- `tasksConfig` (TasksConfig, optional): AI tasks configuration
- `vadConfig` (VadConfig, optional): Voice Activity Detection configuration
Returns: `Promise<StreamingClient>`
Audio format requirements: 16-bit PCM, 16kHz sample rate, mono channel
WebSocket client for real-time audio streaming.
`connect()`: Establishes the WebSocket connection. Returns `this` for chaining.
Closes the WebSocket connection. Returns this for chaining.
`send(audioData, metadata?)`: Sends audio data to the streaming service.
Parameters:
- `audioData` (Buffer, required): Audio data (16-bit PCM, 16kHz, mono)
- `metadata` (BinaryDataMetadata, optional): Optional metadata for the chunk
`on(event, listener)`: Registers an event listener. Returns `this` for chaining.
Available events:
- `connect`: Connection established
- `disconnect`: Connection closed
- `transcript`: Transcription results received
- `structured`: Structured data extraction results
- `translation`: Translation results (if enabled)
- `schema_values_updated`: Schema values update confirmation
- `error`: Error occurred
- `connect_error`: Connection error
Removes an event listener. Returns this for chaining.
Sets or updates keywords for recognition boosting.
Parameters:
- `keywords` (Record<string, string>, required): Map of spoken phrases to written forms
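For illustration, updating keywords on an active connection might look like the following. The method name `setKeywords` is inferred from this reference rather than confirmed by an example, and the keyword values are hypothetical:

```typescript
// Keys are spoken phrases to catch; values are the written forms to emit.
// Note: the method name setKeywords is inferred; check your SDK typings.
connection.setKeywords({
  "aiola": "aiOla",
  "doctor smith": "Dr. Smith",
});
```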
`setSchemaValues(schemaValues, callback?)`: Sets schema values for structured data extraction.
Parameters:
- `schemaValues` (SchemaValues, required): Map of field paths to expected values
- `callback` (function, optional): Acknowledgment callback
Indicates whether the client is currently connected.
Type: boolean
Gets the unique socket connection ID.
Type: string | undefined
Text-to-Speech client for converting text to spoken audio.
`client.tts.synthesize(options)`: Synthesizes text to speech with complete generation.
Parameters:
- `text` (string, required): Text to convert to speech
- `voice_id` (VoiceId | string, required): Voice identifier
Supported voice IDs (`VoiceId` enum):

| Enum Value | String Value | Description |
|---|---|---|
| `VoiceId.EnglishUSFemale` | `'en_us_female'` | English (US) Female |
| `VoiceId.EnglishUSMale` | `'en_us_male'` | English (US) Male |
| `VoiceId.SpanishFemale` | `'es_female'` | Spanish Female |
| `VoiceId.SpanishMale` | `'es_male'` | Spanish Male |
| `VoiceId.FrenchFemale` | `'fr_female'` | French Female |
| `VoiceId.FrenchMale` | `'fr_male'` | French Male |
| `VoiceId.GermanFemale` | `'de_female'` | German Female |
| `VoiceId.GermanMale` | `'de_male'` | German Male |
| `VoiceId.JapaneseFemale` | `'ja_female'` | Japanese Female |
| `VoiceId.JapaneseMale` | `'ja_male'` | Japanese Male |
| `VoiceId.PortugueseFemale` | `'pt_female'` | Portuguese Female |
| `VoiceId.PortugueseMale` | `'pt_male'` | Portuguese Male |
Returns: `Promise<ReadableStream<Uint8Array>>`
Example:
import { VoiceId } from '@aiola/sdk';
// Using enum (recommended - type-safe)
const audio = await client.tts.synthesize({
text: 'Hello, world!',
voice_id: VoiceId.EnglishUSFemale
});
// Or using string
const audio = await client.tts.synthesize({
text: 'Hello, world!',
voice_id: 'en_us_female'
});

`client.tts.stream(options)`: Synthesizes text to speech with streaming delivery (lower latency).
Parameters:
- `text` (string, required): Text to convert to speech
- `voice_id` (VoiceId | string, required): Voice identifier (see supported voices above)
Returns: `Promise<ReadableStream<Uint8Array>>`
Streaming event payloads:

`transcript`: Emitted when new transcription text is available.
Payload:
{
transcript: string // The transcribed text
}

`structured`: Emitted when structured data is extracted (requires FORM_FILLING task).
Payload:
{
results: Record<string, unknown> // Structured data extracted
}

`translation`: Emitted when translation is available (requires TRANSLATION task).
Payload:
{
translation: string // The translated text
}

`schema_values_updated`: Emitted when schema values are successfully updated.
Payload:
{
status: 'ok' | 'error',
message?: string
}

Connection lifecycle events:

- `connect`: Fired when WebSocket connection is established
- `disconnect`: Fired when WebSocket connection is closed
- `error`: Fired when an error occurs (receives Error object)
- `connect_error`: Fired when a connection error occurs (receives Error object)
All errors thrown by the SDK are instances of AiolaError.
{
message: string // Human-readable error description
reason?: string // Detailed explanation from server
status?: number // HTTP status code
code?: string // Machine-readable error code
details?: unknown // Additional diagnostic information
}

| Code | Description | How to Handle |
|---|---|---|
| `TOKEN_EXPIRED` | Access token has expired | Generate a new token using `grantToken()` |
| `INVALID_TOKEN` | Access token is malformed or invalid | Verify your API key and generate a new token |
| `MAX_CONCURRENCY_REACHED` | Too many concurrent sessions | Close existing sessions or wait for them to expire |
| `INVALID_AUDIO_FORMAT` | Audio format is not supported | Use WAV, MP3, M4A, OGG, or FLAC format |
| `RATE_LIMIT_EXCEEDED` | Too many requests in a short period | Implement exponential backoff and retry (see the sketch below) |
| `WORKFLOW_NOT_FOUND` | Specified workflow ID doesn't exist | Verify your workflow ID or use the default |
| `UNAUTHORIZED` | Invalid API key or insufficient permissions | Check your API key and account permissions |
| `VALIDATION_ERROR` | Request parameters are invalid | Check parameter types and required fields |
Full error-handling example:

import { AiolaClient, AiolaError } from '@aiola/sdk';

try {
  const result = await client.stt.transcribeFile({ file: audioFile });
} catch (error) {
  if (error instanceof AiolaError) {
    console.error('Error code:', error.code);
    console.error('Status:', error.status);
    console.error('Message:', error.message);
    // Handle specific errors
    switch (error.code) {
      case 'TOKEN_EXPIRED': {
        // Refresh token and retry
        const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
        const newClient = new AiolaClient({ accessToken });
        // Retry operation with new client
        break;
      }
      case 'MAX_CONCURRENCY_REACHED':
        // Wait and retry, or close other sessions
        console.log('Please close existing sessions or wait');
        break;
      case 'RATE_LIMIT_EXCEEDED':
        // Implement exponential backoff
        await new Promise(resolve => setTimeout(resolve, 5000));
        // Retry operation
        break;
      default:
        console.error('Unexpected error:', error.reason || error.message);
    }
  }
}

For file transcription, supported formats: WAV, MP3, M4A, OGG, FLAC
Recommended specifications:
- Sample rate: 16kHz or higher
- Channels: Mono or stereo
- Bit depth: 16-bit
For live streaming, required specifications:
- Format: 16-bit PCM (raw audio)
- Sample rate: 16kHz
- Channels: Mono (1 channel)
- Encoding: Little-endian
Chunk size recommendations (a slicing sketch follows this list):
- Minimum: 100ms of audio (~3,200 bytes)
- Recommended: 100-500ms chunks
- Maximum: 1000ms per chunk
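To make the arithmetic concrete: at 16 kHz mono 16-bit PCM, bytes per chunk = 16,000 × 2 × (ms / 1000). A minimal slicing sketch, where the `sendInChunks` helper and the 250 ms duration are illustrative:

```typescript
const SAMPLE_RATE = 16_000;  // Hz, required for streaming
const BYTES_PER_SAMPLE = 2;  // 16-bit PCM, mono

// Slice a PCM buffer into fixed-duration chunks and send them in order.
function sendInChunks(connection: { send(data: Buffer): void }, pcm: Buffer, chunkMs = 250): void {
  const chunkBytes = (SAMPLE_RATE * BYTES_PER_SAMPLE * chunkMs) / 1000; // 8,000 bytes at 250 ms
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    connection.send(pcm.subarray(offset, offset + chunkBytes));
  }
}
```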
Configure how speech and silence are detected in audio streams.
Configuration options:
{
threshold?: number // Detection threshold (0.0-1.0, default: ~0.5)
min_speech_ms?: number // Minimum speech duration (default: ~250ms)
min_silence_ms?: number // Minimum silence to split segments (default: ~500ms)
max_segment_ms?: number // Maximum segment duration (default: ~30000ms)
}

Example:
const stream = await client.stt.stream({
langCode: 'en',
vadConfig: {
threshold: 0.6, // More conservative detection
min_speech_ms: 300, // Ignore very short sounds
min_silence_ms: 700, // Wait longer before splitting
max_segment_ms: 15000 // Shorter maximum segments
}
});

Enable AI-powered analysis tasks during transcription.
Available tasks:
{
TRANSLATION?: {
src_lang_code: string, // Source language (e.g., 'en')
dst_lang_code: string // Target language (e.g., 'es')
},
ENTITY_DETECTION?: {},
ENTITY_DETECTION_FROM_LIST?: {
entity_list: string[] // List of entities to detect
},
KEY_PHRASES?: {},
PII_REDACTION?: {},
SENTIMENT_ANALYSIS?: {},
SUMMARIZATION?: {},
TOPIC_DETECTION?: {},
CONTENT_MODERATION?: {},
AUTO_CHAPTERS?: {},
FORM_FILLING?: {}
}

Example:
const stream = await client.stt.stream({
langCode: 'en',
tasksConfig: {
TRANSLATION: {
src_lang_code: 'en',
dst_lang_code: 'es'
},
SENTIMENT_ANALYSIS: {},
PII_REDACTION: {}
}
});

// Translation results arrive via the 'translation' event (see Events above)
stream.on('translation', (data) => console.log('Translation:', data.translation));

File input options for transcribeFile:

Node.js:
// File path
const result = await client.stt.transcribeFile({ file: './audio.wav' });
// Buffer
const buffer = await fs.readFile('./audio.wav');
const result = await client.stt.transcribeFile({ file: buffer });
// ReadStream
const stream = fs.createReadStream('./audio.wav');
const result = await client.stt.transcribeFile({ file: stream });

Browser:
// File from input
const fileInput = document.querySelector('input[type="file"]');
const result = await client.stt.transcribeFile({ file: fileInput.files[0] });
// Blob
const blob = new Blob([audioData], { type: 'audio/wav' });
const result = await client.stt.transcribeFile({ file: blob });

Sending audio on a live stream:

Node.js:
// Use Buffer
connection.send(Buffer.from(audioData));

Browser:
// Wrap ArrayBuffer or TypedArray data with the SDK's Buffer polyfill
const buffer = Buffer.from(audioData);
connection.send(buffer);

Problem: `connect_error` events or connection timeouts
Solutions:
- Verify your network connection
- Check if firewall is blocking WebSocket connections
- Ensure you're using a valid access token
- Try increasing the timeout in your application
Problem: Poor transcription accuracy
Solutions:
- Ensure audio is in the correct format (16-bit PCM, 16kHz, mono for streaming)
- Check audio quality - remove background noise if possible
- Use the `keywords` parameter to boost recognition of domain-specific terms
- Adjust VAD configuration for your use case
- Verify the correct language code is specified
Problem: TOKEN_EXPIRED errors
Solutions:
- Tokens have an expiration time - generate a new token when needed
- Implement automatic token refresh in your application (see the sketch after this list)
- The SDK validates tokens with a 5-minute buffer before expiration
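A minimal token-refresh wrapper might look like this. The `withFreshToken` helper is illustrative, not part of the SDK, and it assumes `AiolaError` is exported:

```typescript
import { AiolaClient, AiolaError } from '@aiola/sdk';

// Run an operation with a client; on TOKEN_EXPIRED, grant a fresh
// token, rebuild the client, and retry once.
async function withFreshToken<T>(
  apiKey: string,
  operation: (client: AiolaClient) => Promise<T>
): Promise<T> {
  const { accessToken } = await AiolaClient.grantToken({ apiKey });
  try {
    return await operation(new AiolaClient({ accessToken }));
  } catch (error) {
    if (error instanceof AiolaError && error.code === 'TOKEN_EXPIRED') {
      const { accessToken: fresh } = await AiolaClient.grantToken({ apiKey });
      return await operation(new AiolaClient({ accessToken: fresh }));
    }
    throw error;
  }
}

// Usage (hypothetical):
// const result = await withFreshToken(AIOLA_API_KEY, (c) =>
//   c.stt.transcribeFile({ file })
// );
```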
Problem: MAX_CONCURRENCY_REACHED errors
Solutions:
- Close unused sessions using `AiolaClient.closeSession()` (see the lifecycle sketch after this list)
- Wait for existing sessions to expire
- Upgrade your plan for higher concurrency limits
- Implement session pooling and reuse in your application
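To avoid leaking concurrency slots, tie each session to a try/finally block so it is always released. A minimal lifecycle sketch, where the `withSession` helper is illustrative:

```typescript
import { AiolaClient } from '@aiola/sdk';

// Grant a token, run the work, and always release the session afterwards.
async function withSession<T>(
  apiKey: string,
  work: (client: AiolaClient) => Promise<T>
): Promise<T> {
  const { accessToken } = await AiolaClient.grantToken({ apiKey });
  try {
    return await work(new AiolaClient({ accessToken }));
  } finally {
    await AiolaClient.closeSession(accessToken, { apiKey }); // frees the slot
  }
}
```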
Problem: Unexpected disconnections during streaming
Solutions:
- The SDK automatically attempts to reconnect (up to 3 attempts)
- Listen to `disconnect` and `connect` events to handle reconnections
- Implement your own reconnection logic if needed (see the sketch after this list)
- Check for network stability issues
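If the built-in retries are exhausted, custom reconnection can be layered on the documented `connect`/`disconnect` events and `connect()` method. A sketch, where the attempt limit and delays are illustrative:

```typescript
// connection comes from client.stt.stream()
let attempts = 0;

connection.on('disconnect', () => {
  if (attempts >= 5) return;          // give up after 5 tries
  const delay = 1000 * 2 ** attempts; // 1s, 2s, 4s, ...
  attempts += 1;
  setTimeout(() => connection.connect(), delay);
});

connection.on('connect', () => {
  attempts = 0; // reset once the stream is back
});
```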
Problem: Errors when uploading files for transcription
Solutions:
- Verify the file format is supported (WAV, MP3, M4A, OGG, FLAC)
- Check file size limits
- Ensure proper file permissions (Node.js)
- Verify the file is not corrupted
If you encounter issues not covered here:
- Check the documentation
- Review the examples directory
- Contact aiOla support with:
- Error messages and codes
- SDK version
- Node.js/browser version
- Minimal code to reproduce the issue
MIT License - see LICENSE file for details.