Feature Request
Description
Add audio, speech, and music generation adapters to the @tanstack/ai-fal package. The fal adapter currently supports image and video generation, but fal's platform also offers 600+ models including audio modalities:
- Text-to-Speech (e.g., fal-ai/kokoro - multi-language TTS; a direct-call sketch follows this list)
- Text-to-Music (e.g., fal-ai/diffrhythm - music generation from prompts/lyrics)
- Text-to-Sound Effects (e.g., sound effect generation from descriptions)
- Speech-to-Text (e.g., fal-ai/whisper, fal-ai/wizper - transcription)
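For context, this is roughly what a direct call to one of the models listed above looks like through the fal client today; the proposed adapters would wrap calls like this. This is a minimal sketch only: the input and output field names (text, voice, audio.url) are assumptions based on typical fal TTS schemas, not the confirmed fal-ai/kokoro schema.

```ts
// Hedged sketch: calling a fal TTS model directly with @fal-ai/client.
// Input/output field names (text, voice, audio.url) are assumptions,
// not the confirmed fal-ai/kokoro schema.
import { fal } from '@fal-ai/client'

fal.config({ credentials: process.env.FAL_KEY })

const result = await fal.subscribe('fal-ai/kokoro', {
  input: {
    text: 'Hello from TanStack AI', // assumed field name
    voice: 'af_heart',              // assumed field name
  },
  logs: true,
})

// Audio models typically return a hosted file reference.
console.log(result.data.audio?.url)
```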
Motivation
TanStack AI's fal adapter (@tanstack/ai-fal) currently implements falImage and falVideo adapters following the tree-shakeable adapter pattern. Adding audio/speech/music adapters would complete fal's media generation coverage and align with TanStack AI's goal of being a comprehensive, provider-agnostic AI SDK.
fal's audio models support:
- Multi-language text-to-speech with multiple voices
- Music generation from text prompts, lyrics, and reference audio
- Sound effect generation from descriptions
- Speech transcription and translation
Proposed API
Following the existing adapter pattern:
```ts
import { falAudio, falSpeech, falMusic } from '@tanstack/ai-fal/adapters'

// Text-to-Speech
const speechAdapter = falSpeech('fal-ai/kokoro')

// Text-to-Music
const musicAdapter = falMusic('fal-ai/diffrhythm')

// Text-to-Sound Effects
const soundAdapter = falAudio('fal-ai/sound-effects')
```
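For illustration only, here is a minimal sketch of what a falSpeech factory might look like internally, assuming it wraps fal.subscribe() the same way the existing image adapter does. The FalSpeechAdapter shape and option names below are hypothetical, not the actual @tanstack/ai-fal types.

```ts
// Hypothetical sketch of a falSpeech adapter factory; the adapter interface
// and option names are assumptions, not the actual @tanstack/ai-fal types.
import { fal } from '@fal-ai/client'

interface FalSpeechOptions {
  voice?: string
}

interface FalSpeechAdapter {
  kind: 'speech'
  modelId: string
  generate: (text: string, options?: FalSpeechOptions) => Promise<{ audioUrl: string }>
}

export function falSpeech(modelId: string): FalSpeechAdapter {
  return {
    kind: 'speech',
    modelId,
    async generate(text, options) {
      // Short-latency TTS models can use the subscribe pattern,
      // mirroring the existing falImage adapter.
      const result = await fal.subscribe(modelId, {
        input: { text, ...options }, // input field names are model-specific
      })
      return { audioUrl: (result.data as any).audio?.url }
    },
  }
}
```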
Additional Context
- Existing adapters use fal.subscribe() (image) and fal.queue() (video) patterns
- Audio generation may use either pattern depending on model latency (see the queue sketch after this list)
- The fal SDK (@fal-ai/client) already supports audio responses with File/Audio output types
- Model metadata types in model-meta.ts would need to be extended for audio models
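As a rough illustration of the queue pattern mentioned above for longer-running jobs such as music generation, here is a sketch using the documented @fal-ai/client queue API; the input fields for fal-ai/diffrhythm are assumptions, not a confirmed schema.

```ts
// Hedged sketch: queue pattern for a longer-running music model.
// fal.queue.submit/status/result come from @fal-ai/client; the input
// field (prompt) is assumed, not the confirmed fal-ai/diffrhythm schema.
import { fal } from '@fal-ai/client'

const { request_id } = await fal.queue.submit('fal-ai/diffrhythm', {
  input: {
    prompt: 'lo-fi piano with rain ambience', // assumed field name
  },
})

// Poll until the job completes (a webhook could be used instead).
let status = await fal.queue.status('fal-ai/diffrhythm', { requestId: request_id })
while (status.status !== 'COMPLETED') {
  await new Promise((resolve) => setTimeout(resolve, 2000))
  status = await fal.queue.status('fal-ai/diffrhythm', { requestId: request_id })
}

const result = await fal.queue.result('fal-ai/diffrhythm', { requestId: request_id })
console.log((result.data as any).audio?.url)
```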