aiOla JavaScript SDK

The official JavaScript/TypeScript SDK for the aiOla API, work seamlessly in both Node.js and browser environments.

Requirements

Node.js: 18.0.0 or higher

Documentation

Learn more about the aiOla API and how to use the SDK in our documentation.

Installation

npm install @aiola/sdk
# or
yarn add @aiola/sdk

Usage

Authentication

The aiOla SDK uses a two-step authentication process:

Generate Access Token: Use your API key to create a temporary access token, save it for later use
Create Client: Use the access token to instantiate the client, make sure to save

Step 1: Generate Access Token

import { AiolaClient } from '@aiola/sdk';

const { accessToken, sessionId } = await AiolaClient.grantToken({
  apiKey: AIOLA_API_KEY
});

Step 2: Create Client

const client = new AiolaClient({
  accessToken: accessToken
});

Error Handling

The SDK automatically handles common scenarios like concurrency limits:

try {
  const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
} catch (error) {
  if (error.code === 'MAX_CONCURRENCY_REACHED') {
    console.log('Concurrency limit reached. Please wait for existing sessions to expire.');
  }
}

Session Management

Close Session on Server:

// Terminates the session on the server
await AiolaClient.closeSession(accessToken, {
  apiKey: AIOLA_API_KEY
});

Custom base URL (enterprises)

const { accessToken } = await AiolaClient.grantToken({
  apiKey: AIOLA_API_KEY,
  authBaseUrl: 'https://mycompany.auth.aiola.ai',
});

const client = new AiolaClient({
  accessToken,
  baseUrl: 'https://mycompany.api.aiola.ai',
});

Speech-to-Text – transcribe file

import { AiolaClient } from '@aiola/sdk';
import fs from 'fs/promises';

async function transcribeFile() {
  try {
    // Step 1: Generate access token
    const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
    
    // Step 2: Create client
    const client = new AiolaClient({ accessToken });
    
    // Step 3: Transcribe file
    const file = await fs.readFile('path/to/your/audio.wav');
    
    const transcript = await client.stt.transcribeFile({ 
      file: file,
      language: 'en', //supported lan: en,de,fr,es,pr,zh,ja,it
      keywords: {
        "<word_to_catch>": "<word_transcribe>",
      },
    });

    console.log(transcript);
  } catch (error) {
    console.error('Error transcribing file:', error);
  }
}

transcribeFile();

Speech-to-Text – live streaming

import { AiolaClient } from '@aiola/sdk';

const AIOLA_API_KEY = process.env.AIOLA_API_KEY || 'YOUR_API_KEY';

// Stream audio in real-time for live transcription
async function liveStreaming() {
  try {
    // Step 1: Generate access token, save it
    const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
    
    // Step 2: Create client using the access token
    const client = new AiolaClient({ accessToken });
    
    // Step 3: Create stream connection
    const connection = await client.stt.stream({
      langCode: 'en', //supported lan: en,de,fr,es,pr,zh,ja,it
      keywords: {
        "venus": "venuss", // "<word_to_catch>": "<word_transcribe>",
      },
    });

    connection.on('transcript', (data) => {
      console.log('Transcript:', data.transcript);
    });

    connection.on('connect', async () => {
      console.log('Connected to streaming service');

      // Get an audio file (or use a microphone stream)
      const response = await fetch("https://github.com/aiola-lab/aiola-js-sdk/raw/refs/heads/main/examples/stt/assets/sample-en.wav");
      const audioData = await response.arrayBuffer();

      // Send audio data
      connection.send(Buffer.from(audioData));
    });

    connection.on('disconnect', () => {
      console.log('Disconnected from streaming service');
    });

    connection.on('error', (error) => {
      console.error('Streaming error:', error);
    });

    connection.connect();
    
  } catch (error) {
    console.error('Error setting up streaming:', error);
  }
}

liveStreaming();

Setting Schema Values During Streaming

You can dynamically set schema values during a live streaming session to update recognition parameters in real-time:

connection.on('connect', () => {
  const contactSchema = {
    "contact.name": [
      "John Doe",
      "Jane Smith",
      "Bob Johnson",
    ],
    "contact.email": [
      "john@example.com",
      "jane@example.com",
    ],
  };

  connection.setSchemaValues(contactSchema, (response) => {
    if (response.status === "ok") {
      console.log("Schema values set successfully");
    } else {
      console.error("Error setting schema values:", response);
    }
  });
});

connection.on('schema_values_updated', (data) => {
  console.log("Schema values updated on server:", data);
});

Text-to-Speech

import fs from 'fs';
import { AiolaClient, VoiceId } from '@aiola/sdk';

async function createFile() {
  try {
    const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
    
    const client = new AiolaClient({ accessToken });
    
    const audio = await client.tts.synthesize({
      text: 'Hello, how can I help you today?',
      voice_id: VoiceId.EnglishUSFemale, // Type-safe enum
    });

    const fileStream = fs.createWriteStream('./output.wav');
    for await (const chunk of audio) {
        fileStream.write(chunk);
    }
    
    console.log('Audio file created successfully');
  } catch (error) {
    console.error('Error creating audio file:', error);
  }
}

createFile();

Text-to-Speech – streaming

import { AiolaClient, VoiceId } from '@aiola/sdk';

async function streamTts() {
  try {
    const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
    
    const client = new AiolaClient({ accessToken });
    
    const stream = await client.tts.stream({
      text: 'Hello, how can I help you today?',
      voice_id: VoiceId.EnglishUSFemale, // Type-safe enum
    });

    const audioChunks: Buffer[] = [];
    for await (const chunk of stream) {
      audioChunks.push(chunk);
    }
    
    console.log('Audio chunks received:', audioChunks.length);
  } catch (error) {
    console.error('Error streaming TTS:', error);
  }
}

streamTts();

Browser example

a ready-made web app that demonstrates how to use the SDK directly in a browser to stream microphone audio to aiOla Speech-to-Text and receive live transcripts.

cd examples/stt/browser-mic-stream
npm install
npm run dev

Uses the AudioWorklet API to capture microphone audio, convert it to 16-bit PCM (16 kHz, mono), and send it through the WebSocket returned.

API Reference

AiolaClient

Main client for interacting with the Aiola API.

Static Methods

`AiolaClient.grantToken(options)`

Generates an access token from an API key.

Parameters:

apiKey (string, required): Your Aiola API key
baseUrl (string, optional): Custom API base URL
authBaseUrl (string, optional): Custom authentication base URL
workflowId (string, optional): Custom workflow ID

Returns: Promise<GrantTokenResponse>

accessToken (string): JWT access token for API authentication
sessionId (string): Unique session identifier

`AiolaClient.closeSession(accessToken, options)`

Closes a session on the server and frees up concurrency slots.

Parameters:

accessToken (string, required): The access token for the session
options (AuthOptions, required): Authentication configuration

Returns: Promise<SessionCloseResponse>

Constructor

`new AiolaClient(options)`

Creates a new AiolaClient instance.

Parameters:

accessToken (string, required): Valid access token from grantToken()
baseUrl (string, optional): Custom API base URL
authBaseUrl (string, optional): Custom authentication base URL
workflowId (string, optional): Custom workflow ID

Properties

`client.stt`

Gets the Speech-to-Text (STT) client.

Type: Stt

`client.tts`

Gets the Text-to-Speech (TTS) client.

Type: Tts

`client.auth`

Gets the authentication client (internal use).

Type: Auth

`client.options`

Gets the client configuration options.

Type: ClientConfig

STT Client

Speech-to-Text client for audio transcription.

Methods

`client.stt.transcribeFile(options)`

Transcribes an audio file to text.

Parameters:

file (FileSource, required): Audio file (File, Blob, Buffer, ReadStream, or file path)
language (string, optional): Language code (e.g., 'en', 'es', 'fr')
keywords (Record<string, string>, optional): Keywords map for boosting recognition
vadConfig (VadConfig, optional): Voice Activity Detection configuration

Supported audio formats: WAV, MP3, M4A, OGG, FLAC

Returns: Promise<TranscribeFileResponse>

transcript (string): Complete transcription with formatting
raw_transcript (string): Raw transcription without post-processing
segments (Segment[]): Time segments where speech was detected
metadata (TranscriptionMetadata): File metadata and transcription info

`client.stt.stream(options)`

Creates a real-time streaming connection for audio transcription.

Parameters:

langCode (string, optional): Language code (e.g., 'en', 'es', 'fr')
workflowId (string, optional): Custom workflow ID
executionId (string, optional): Execution ID for tracking (auto-generated if not provided)
timeZone (string, optional): Timezone for timestamps (default: 'UTC')
keywords (Record<string, string>, optional): Keywords map for boosting recognition
tasksConfig (TasksConfig, optional): AI tasks configuration
vadConfig (VadConfig, optional): Voice Activity Detection configuration

Returns: Promise<StreamingClient>

Audio format requirements: 16-bit PCM, 16kHz sample rate, mono channel

StreamingClient

WebSocket client for real-time audio streaming.

Methods

`stream.connect()`

Establishes the WebSocket connection. Returns this for chaining.

`stream.disconnect()`

Closes the WebSocket connection. Returns this for chaining.

`stream.send(audioData, metadata?)`

Sends audio data to the streaming service.

Parameters:

audioData (Buffer, required): Audio data (16-bit PCM, 16kHz, mono)
metadata (BinaryDataMetadata, optional): Optional metadata for the chunk

`stream.on(event, listener)`

Registers an event listener. Returns this for chaining.

Available events:

connect: Connection established
disconnect: Connection closed
transcript: Transcription results received
structured: Structured data extraction results
translation: Translation results (if enabled)
schema_values_updated: Schema values update confirmation
error: Error occurred
connect_error: Connection error

`stream.off(event, listener?)`

Removes an event listener. Returns this for chaining.

`stream.setKeywords(keywords)`

Sets or updates keywords for recognition boosting.

Parameters:

keywords (Record<string, string>, required): Map of spoken phrases to written forms

`stream.setSchemaValues(schemaValues, callback?)`

Sets schema values for structured data extraction.

Parameters:

schemaValues (SchemaValues, required): Map of field paths to expected values
callback (function, optional): Acknowledgment callback

Properties

`stream.connected`

Indicates whether the client is currently connected.

Type: boolean

`stream.id`

Gets the unique socket connection ID.

Type: string | undefined

TTS Client

Text-to-Speech client for converting text to spoken audio.

Methods

`client.tts.synthesize(options)`

Synthesizes text to speech with complete generation.

Parameters:

text (string, required): Text to convert to speech
voice_id (VoiceId | string, required): Voice identifier

Supported voice IDs (VoiceId enum):

Enum Value	String Value	Description
`VoiceId.EnglishUSFemale`	`'en_us_female'`	English (US) Female
`VoiceId.EnglishUSMale`	`'en_us_male'`	English (US) Male
`VoiceId.SpanishFemale`	`'es_female'`	Spanish Female
`VoiceId.SpanishMale`	`'es_male'`	Spanish Male
`VoiceId.FrenchFemale`	`'fr_female'`	French Female
`VoiceId.FrenchMale`	`'fr_male'`	French Male
`VoiceId.GermanFemale`	`'de_female'`	German Female
`VoiceId.GermanMale`	`'de_male'`	German Male
`VoiceId.JapaneseFemale`	`'ja_female'`	Japanese Female
`VoiceId.JapaneseMale`	`'ja_male'`	Japanese Male
`VoiceId.PortugueseFemale`	`'pt_female'`	Portuguese Female
`VoiceId.PortugueseMale`	`'pt_male'`	Portuguese Male

Returns: Promise<ReadableStream<Uint8Array>>

Example:

import { VoiceId } from '@aiola/sdk';

// Using enum (recommended - type-safe)
const audio = await client.tts.synthesize({
  text: 'Hello, world!',
  voice_id: VoiceId.EnglishUSFemale
});

// Or using string
const audio = await client.tts.synthesize({
  text: 'Hello, world!',
  voice_id: 'en_us_female'
});

`client.tts.stream(options)`

Synthesizes text to speech with streaming delivery (lower latency).

Parameters:

text (string, required): Text to convert to speech
voice_id (VoiceId | string, required): Voice identifier (see supported voices above)

Returns: Promise<ReadableStream<Uint8Array>>

Streaming Events Reference

`transcript` Event

Emitted when new transcription text is available.

Payload:

{
  transcript: string  // The transcribed text
}

`structured` Event

Emitted when structured data is extracted (requires FORM_FILLING task).

Payload:

{
  results: Record<string, unknown>  // Structured data extracted
}

`translation` Event

Emitted when translation is available (requires TRANSLATION task).

Payload:

{
  translation: string  // The translated text
}

`schema_values_updated` Event

Emitted when schema values are successfully updated.

Payload:

{
  status: 'ok' | 'error',
  message?: string
}

Connection Events

connect: Fired when WebSocket connection is established
disconnect: Fired when WebSocket connection is closed
error: Fired when an error occurs (receives Error object)
connect_error: Fired when a connection error occurs (receives Error object)

Error Handling

All errors thrown by the SDK are instances of AiolaError.

Error Properties

{
  message: string      // Human-readable error description
  reason?: string      // Detailed explanation from server
  status?: number      // HTTP status code
  code?: string        // Machine-readable error code
  details?: unknown    // Additional diagnostic information
}

Common Error Codes

Code	Description	How to Handle
`TOKEN_EXPIRED`	Access token has expired	Generate a new token using `grantToken()`
`INVALID_TOKEN`	Access token is malformed or invalid	Verify your API key and generate a new token
`MAX_CONCURRENCY_REACHED`	Too many concurrent sessions	Close existing sessions or wait for them to expire
`INVALID_AUDIO_FORMAT`	Audio format is not supported	Use WAV, MP3, M4A, OGG, or FLAC format
`RATE_LIMIT_EXCEEDED`	Too many requests in a short period	Implement exponential backoff and retry
`WORKFLOW_NOT_FOUND`	Specified workflow ID doesn't exist	Verify your workflow ID or use the default
`UNAUTHORIZED`	Invalid API key or insufficient permissions	Check your API key and account permissions
`VALIDATION_ERROR`	Request parameters are invalid	Check parameter types and required fields

Error Handling Example

try {
  const result = await client.stt.transcribeFile({ file: audioFile });
} catch (error) {
  if (error instanceof AiolaError) {
    console.error('Error code:', error.code);
    console.error('Status:', error.status);
    console.error('Message:', error.message);
    
    // Handle specific errors
    switch (error.code) {
      case 'TOKEN_EXPIRED':
        // Refresh token and retry
        const { accessToken } = await AiolaClient.grantToken({ apiKey: AIOLA_API_KEY });
        const newClient = new AiolaClient({ accessToken });
        // Retry operation with new client
        break;
        
      case 'MAX_CONCURRENCY_REACHED':
        // Wait and retry, or close other sessions
        console.log('Please close existing sessions or wait');
        break;
        
      case 'RATE_LIMIT_EXCEEDED':
        // Implement exponential backoff
        await new Promise(resolve => setTimeout(resolve, 5000));
        // Retry operation
        break;
        
      default:
        console.error('Unexpected error:', error.reason || error.message);
    }
  }
}

Audio Format Requirements

File Transcription

Supported formats: WAV, MP3, M4A, OGG, FLAC

Recommended specifications:

Sample rate: 16kHz or higher
Channels: Mono or stereo
Bit depth: 16-bit

Streaming Audio

Required specifications:

Format: 16-bit PCM (raw audio)
Sample rate: 16kHz
Channels: Mono (1 channel)
Encoding: Little-endian

Chunk size recommendations:

Minimum: 100ms of audio (~3,200 bytes)
Recommended: 100-500ms chunks
Maximum: 1000ms per chunk

Voice Activity Detection (VAD)

Configure how speech and silence are detected in audio streams.

Configuration options:

{
  threshold?: number        // Detection threshold (0.0-1.0, default: ~0.5)
  min_speech_ms?: number    // Minimum speech duration (default: ~250ms)
  min_silence_ms?: number   // Minimum silence to split segments (default: ~500ms)
  max_segment_ms?: number   // Maximum segment duration (default: ~30000ms)
}

Example:

const stream = await client.stt.stream({
  langCode: 'en',
  vadConfig: {
    threshold: 0.6,          // More conservative detection
    min_speech_ms: 300,      // Ignore very short sounds
    min_silence_ms: 700,     // Wait longer before splitting
    max_segment_ms: 15000    // Shorter maximum segments
  }
});

AI Tasks Configuration

Enable AI-powered analysis tasks during transcription.

Available tasks:

{
  TRANSLATION?: {
    src_lang_code: string,   // Source language (e.g., 'en')
    dst_lang_code: string    // Target language (e.g., 'es')
  },
  ENTITY_DETECTION?: {},
  ENTITY_DETECTION_FROM_LIST?: {
    entity_list: string[]    // List of entities to detect
  },
  KEY_PHRASES?: {},
  PII_REDACTION?: {},
  SENTIMENT_ANALYSIS?: {},
  SUMMARIZATION?: {},
  TOPIC_DETECTION?: {},
  CONTENT_MODERATION?: {},
  AUTO_CHAPTERS?: {},
  FORM_FILLING?: {}
}

Example:

const stream = await client.stt.stream({
  langCode: 'en',
  tasksConfig: {
    TRANSLATION: {
      src_lang_code: 'en',
      dst_lang_code: 'es'
    },
    SENTIMENT_ANALYSIS: {},
    PII_REDACTION: {}
  }
});

Browser vs Node.js Differences

File Handling

Node.js:

// File path
const result = await client.stt.transcribeFile({ file: './audio.wav' });

// Buffer
const buffer = await fs.readFile('./audio.wav');
const result = await client.stt.transcribeFile({ file: buffer });

// ReadStream
const stream = fs.createReadStream('./audio.wav');
const result = await client.stt.transcribeFile({ file: stream });

Browser:

// File from input
const fileInput = document.querySelector('input[type="file"]');
const result = await client.stt.transcribeFile({ file: fileInput.files[0] });

// Blob
const blob = new Blob([audioData], { type: 'audio/wav' });
const result = await client.stt.transcribeFile({ file: blob });

Audio Streaming

Node.js:

// Use Buffer
connection.send(Buffer.from(audioData));

Browser:

// Use ArrayBuffer or TypedArray
const buffer = Buffer.from(audioData);  // SDK provides Buffer polyfill
connection.send(buffer);

Troubleshooting

Connection Issues

Problem: connect_error events or connection timeouts

Solutions:

Verify your network connection
Check if firewall is blocking WebSocket connections
Ensure you're using a valid access token
Try increasing the timeout in your application

Audio Quality Issues

Problem: Poor transcription accuracy

Solutions:

Ensure audio is in the correct format (16-bit PCM, 16kHz, mono for streaming)
Check audio quality - remove background noise if possible
Use the keywords parameter to boost recognition of domain-specific terms
Adjust VAD configuration for your use case
Verify the correct language code is specified

Token Expiration

Problem: TOKEN_EXPIRED errors

Solutions:

Tokens have an expiration time - generate a new token when needed
Implement automatic token refresh in your application
The SDK validates tokens with a 5-minute buffer before expiration

Concurrency Limits

Problem: MAX_CONCURRENCY_REACHED errors

Solutions:

Close unused sessions using AiolaClient.closeSession()
Wait for existing sessions to expire
Upgrade your plan for higher concurrency limits
Implement session pooling and reuse in your application

Streaming Disconnects

Problem: Unexpected disconnections during streaming

Solutions:

The SDK automatically attempts to reconnect (up to 3 attempts)
Listen to disconnect and connect events to handle reconnections
Implement your own reconnection logic if needed
Check for network stability issues

File Upload Errors

Problem: Errors when uploading files for transcription

Solutions:

Verify the file format is supported (WAV, MP3, M4A, OGG, FLAC)
Check file size limits
Ensure proper file permissions (Node.js)
Verify the file is not corrupted

Getting Help

If you encounter issues not covered here:

Check the documentation
Review the examples directory
Contact Aiola support with:
- Error messages and codes
- SDK version
- Node.js/browser version
- Minimal code to reproduce the issue

License

MIT License - see LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 130 Commits
.github/workflows		.github/workflows
examples		examples
src		src
tests		tests
.gitignore		.gitignore
.releaserc.json		.releaserc.json
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
RELEASE.md		RELEASE.md
eslint.config.mjs		eslint.config.mjs
jest.config.js		jest.config.js
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
tsup.config.ts		tsup.config.ts

Folders and files

Latest commit

History

Repository files navigation

aiOla JavaScript SDK

Requirements

Documentation

Installation

Usage

Authentication

Step 1: Generate Access Token

Step 2: Create Client

Error Handling

Session Management

Custom base URL (enterprises)

Speech-to-Text – transcribe file

Speech-to-Text – live streaming

Setting Schema Values During Streaming

Text-to-Speech

Text-to-Speech – streaming

Browser example

API Reference

AiolaClient

Static Methods

AiolaClient.grantToken(options)

AiolaClient.closeSession(accessToken, options)

Constructor

new AiolaClient(options)

Properties

client.stt

client.tts

client.auth

client.options

STT Client

Methods

client.stt.transcribeFile(options)

client.stt.stream(options)

StreamingClient

Methods

stream.connect()

stream.disconnect()

stream.send(audioData, metadata?)

stream.on(event, listener)

stream.off(event, listener?)

stream.setKeywords(keywords)

stream.setSchemaValues(schemaValues, callback?)

Properties

stream.connected

stream.id

TTS Client

Methods

client.tts.synthesize(options)

client.tts.stream(options)

Streaming Events Reference

transcript Event

structured Event

translation Event

schema_values_updated Event

Connection Events

Error Handling

Error Properties

Common Error Codes

Error Handling Example

Audio Format Requirements

File Transcription

Streaming Audio

Voice Activity Detection (VAD)

AI Tasks Configuration

Browser vs Node.js Differences

File Handling

Audio Streaming

Troubleshooting

Connection Issues

Audio Quality Issues

Token Expiration

Concurrency Limits

Streaming Disconnects

File Upload Errors

Getting Help

License

`AiolaClient.grantToken(options)`

`AiolaClient.closeSession(accessToken, options)`

`new AiolaClient(options)`

`client.stt`

`client.tts`

`client.auth`

`client.options`

`client.stt.transcribeFile(options)`

`client.stt.stream(options)`

`stream.connect()`

`stream.disconnect()`

`stream.send(audioData, metadata?)`

`stream.on(event, listener)`

`stream.off(event, listener?)`

`stream.setKeywords(keywords)`

`stream.setSchemaValues(schemaValues, callback?)`

`stream.connected`

`stream.id`

`client.tts.synthesize(options)`

`client.tts.stream(options)`

`transcript` Event

`structured` Event

`translation` Event

`schema_values_updated` Event

Packages