Stable Audio API

FastAPI server for Stability AI Stable Audio 3, using uv for Python environment and dependency management.

The Hugging Face models are gated. Before starting the server, accept the terms for each model you want to use and provide a token with access to them via HF_TOKEN.

Setup

uv sync

export HF_TOKEN=hf_your_token_here
export STABLE_AUDIO_DEFAULT_MODEL=small-sfx
export STABLE_AUDIO_DEVICE=cpu
uv run stable-audio-api --host 0.0.0.0 --port 8000

The server automatically loads a .env file from this project directory. You can also ask uv to load it explicitly:

uv run --env-file .env stable-audio-api --host 0.0.0.0 --port 8000

Use STABLE_AUDIO_DEVICE=cuda on a CUDA machine, or leave it unset to let stable-audio-3 auto-detect cuda, mps, then cpu. The medium model requires CUDA with Flash Attention support in the upstream Stable Audio 3 package.
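The fallback order described above can be sketched as a small helper. This is an illustration only, not the upstream package's code: `pick_device` and its boolean parameters are hypothetical names. In a real process the availability flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
from typing import Optional

VALID_DEVICES = ("cuda", "mps", "cpu")


def pick_device(requested: Optional[str], cuda_available: bool, mps_available: bool) -> str:
    """Return the device to use, following the documented cuda -> mps -> cpu order.

    Hypothetical helper: availability flags are passed in so the sketch
    does not depend on PyTorch being installed.
    """
    if requested:
        # An explicit STABLE_AUDIO_DEVICE value takes priority over detection.
        device = requested.strip().lower()
        if device not in VALID_DEVICES:
            raise ValueError(f"unsupported device: {requested!r}")
        return device
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

For example, `pick_device(None, cuda_available=False, mps_available=True)` selects `mps`, which matches what you would expect on an Apple Silicon machine with the variable unset.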

Generate Audio

Choose a model per request with the model property. Valid values are small-sfx, small-music, and medium.

For local development, the synchronous endpoint returns WAV bytes directly:

curl -X POST http://localhost:8000/v1/audio/generations \
  -H "Content-Type: application/json" \
  --output train.wav \
  -d '{
    "model": "small-sfx",
    "prompt": "chugging train coming into station with horn",
    "duration": 7,
    "steps": 8,
    "cfg_scale": 1.0,
    "seed": -1
  }'

The API also accepts full Hugging Face repo IDs as aliases, for example "model": "stabilityai/stable-audio-3-medium".
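The same synchronous request can be made from Python using only the standard library. This is a minimal sketch that mirrors the curl call above. The helper names `build_generation_request` and `generate_wav` are illustrative and not part of this project, and the default values simply repeat the example payload.

```python
import json
import urllib.request


def build_generation_request(base_url, prompt, model="small-sfx",
                             duration=7, steps=8, cfg_scale=1.0, seed=-1):
    """Build the POST request for the synchronous generation endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "duration": duration,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "seed": seed,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_wav(base_url, prompt, out_path, **options):
    """Send the request and write the returned WAV bytes to out_path."""
    request = build_generation_request(base_url, prompt, **options)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of bytes written
```

With the server running locally, `generate_wav("http://localhost:8000", "chugging train coming into station with horn", "train.wav")` should produce the same file as the curl example.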

Generate With Jobs

For cloud deployments, use the async job endpoints. They return quickly, generate audio in the background, write the WAV to local storage or S3/R2, and expose a download URL when complete.

curl -X POST http://localhost:8000/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "model": "small-sfx",
    "prompt": "short metallic impact with room reverb",
    "duration": 5,
    "steps": 8
  }'

Response:

{
  "id": "68e48e7af36c4d829e3797a0b3e7687c",
  "status": "queued",
  "status_url": "http://localhost:8000/jobs/68e48e7af36c4d829e3797a0b3e7687c"
}

Poll status:

curl http://localhost:8000/jobs/68e48e7af36c4d829e3797a0b3e7687c

When status is succeeded, download_url points to the generated WAV. Without object storage configured, job outputs are written under outputs/ and served by the local API.
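A polling loop for the job flow above might look like the following sketch. It assumes the status response is a JSON object with `status` and `download_url` fields as described. The `failed` status name, the helper names, and the interval and timeout defaults are assumptions, so check the actual responses from your server before relying on them.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_job(status_url, fetch=fetch_json, interval=2.0,
                 timeout=600.0, sleep=time.sleep):
    """Poll status_url until the job succeeds, then return its download_url."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(status_url)
        status = job.get("status")
        if status == "succeeded":
            return job["download_url"]
        if status == "failed":  # assumed name for the failure state
            raise RuntimeError(f"job failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout}s")
        sleep(interval)
```

Pass the `status_url` returned by POST /jobs. The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.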

Job state is kept in memory, so it is lost on restart and is not shared between worker processes. For multi-worker or serverless production deployments, use Redis, Postgres, or another shared job store.

Endpoints

GET /health returns available models, preloaded models, loaded models, and duration limits.
POST /jobs starts a background generation job and returns a job ID.
GET /jobs/{id} returns job status and a download URL when complete.
POST /v1/audio/generations generates audio synchronously and returns an audio/wav response.
POST /generate is an alias for POST /v1/audio/generations.

Configuration

| Environment variable | Default | Description |
| --- | --- | --- |
| `HF_TOKEN` | unset | Hugging Face token for gated model access. |
| `STABLE_AUDIO_DEFAULT_MODEL` | `small-sfx` | Default model when a request omits `model`. |
| `STABLE_AUDIO_MODEL` | unset | Backward-compatible alias for `STABLE_AUDIO_DEFAULT_MODEL`. |
| `STABLE_AUDIO_PRELOAD_MODELS` | default model | Comma-separated models to load at startup. Set empty to lazy-load only. |
| `STABLE_AUDIO_DEVICE` | unset | Optional `cuda`, `mps`, or `cpu`. |
| `STABLE_AUDIO_MODEL_HALF` | `true` | Use fp16 on CUDA. Automatically disabled by the model on CPU/MPS. |
| `STABLE_AUDIO_MAX_DURATION` | `380` | API-wide duration cap in seconds. Small models still cap at 120s; medium caps at 380s. |
| `STABLE_AUDIO_MAX_STEPS` | `50` | API sampling step limit. |
| `STABLE_AUDIO_OUTPUT_DIR` | `outputs` | Local output directory for job WAVs when S3/R2 is not configured. |
| `STABLE_AUDIO_STORAGE_BUCKET` | unset | S3/R2 bucket for job WAV output. Enables S3-compatible storage. |
| `STABLE_AUDIO_STORAGE_PREFIX` | `stable-audio/jobs` | Object key prefix for uploaded WAV files. |
| `STABLE_AUDIO_STORAGE_ENDPOINT_URL` | unset | S3-compatible endpoint URL, such as Cloudflare R2. |
| `STABLE_AUDIO_STORAGE_REGION` | `us-east-1` | S3 region. Use `auto` for Cloudflare R2 if desired. |
| `STABLE_AUDIO_STORAGE_PUBLIC_BASE_URL` | unset | Optional public/CDN base URL. If unset, the API generates presigned URLs. |
| `STABLE_AUDIO_PRESIGNED_URL_EXPIRES` | `3600` | Presigned download URL lifetime in seconds. |
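
The interaction between `STABLE_AUDIO_MAX_DURATION` and the per-model caps can be read as "the smaller limit wins". The sketch below illustrates that reading, using the 120s and 380s figures from the table. The function and constant names are hypothetical, and the server's actual validation may differ.

```python
# Per-model caps in seconds, taken from the configuration table.
MODEL_DURATION_CAPS = {
    "small-sfx": 120,
    "small-music": 120,
    "medium": 380,
}


def effective_max_duration(model, api_cap=380):
    """Return the longest duration a request for `model` may ask for."""
    return min(api_cap, MODEL_DURATION_CAPS[model])


def check_duration(model, duration, api_cap=380):
    """Raise ValueError if `duration` exceeds the effective cap (assumed behavior)."""
    limit = effective_max_duration(model, api_cap)
    if duration <= 0 or duration > limit:
        raise ValueError(f"duration must be in (0, {limit}] seconds for {model}")
    return duration
```

Under this reading, lowering `STABLE_AUDIO_MAX_DURATION` to 60 would limit every model to 60 seconds, while raising it above 380 would have no effect.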

The upstream Stable Audio 3 package pins PyTorch and torchaudio. This project mirrors its CUDA 12.6 uv source configuration for Linux x86_64; macOS uses the standard PyPI wheels.
