OpenAI-compatible TTS wrapper for Fish Audio S2 Pro GGUF, built for Pandrator-style workflows.
This server keeps the same endpoint shape used by our XTTS-style wrappers while running inference through s2.dll from FishS2Sharp runtime bundles (which wrap the local s2.cpp runtime).
- Fish Audio open-source model/project: https://github.com/fishaudio/fish-speech
- Fish Audio docs: https://docs.fish.audio
- s2.cpp local C++ runtime: https://github.com/rodrigomatta/s2.cpp
- FishS2Sharp C# wrapper/runtime bundles: https://github.com/subspecs/FishS2Sharp
- S2 Pro GGUF model variants used by this wrapper: https://huggingface.co/rodrigomt/s2-pro-gguf
# Windows
run.bat
# Linux / macOS
bash run.sh

The launcher starts the API at:
http://0.0.0.0:8020
- CUDA-capable NVIDIA GPU (server is CUDA-first)
- Internet access on first run (for runtime/model downloads)
On startup, run.py now auto-downloads and keeps everything local:
- FishS2Sharp runtime bundle to runtime/fishs2sharp/
- S2 GGUF model to models/s2-pro-q8_0.gguf by default (use --model-q4 for a smaller q4_k_m download)
- tokenizer to models/tokenizer.json
Useful bootstrap flags:
- --skip-downloads (offline mode, use existing local files)
- --force-downloads (refresh local artifacts)
- --model-q4 (explicit shortcut for q4_k_m model download)
- --model-quant <quant> (choose quant: f16, q8_0, q6_k, q5_k_m, q4_k_m, q3_k, q2_k)
- --n-gpu-layers (set explicit transformer layer offload count; -1 keeps runtime default)
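The flags above could be wired up roughly as follows. This is a minimal sketch, not run.py's actual parser: the helper names build_parser and resolve_quant are hypothetical, and the rule that --model-q4 takes precedence over --model-quant is an assumption made for illustration.

```python
import argparse

# Quant names listed for --model-quant in this README.
QUANTS = ["f16", "q8_0", "q6_k", "q5_k_m", "q4_k_m", "q3_k", "q2_k"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented bootstrap flags."""
    parser = argparse.ArgumentParser(description="Bootstrap flag sketch")
    parser.add_argument("--skip-downloads", action="store_true")
    parser.add_argument("--force-downloads", action="store_true")
    parser.add_argument("--model-q4", action="store_true")
    parser.add_argument("--model-quant", choices=QUANTS, default="q8_0")
    parser.add_argument("--n-gpu-layers", type=int, default=-1)
    return parser


def resolve_quant(args: argparse.Namespace) -> str:
    # Assumption: the --model-q4 shortcut wins over --model-quant.
    return "q4_k_m" if args.model_q4 else args.model_quant
```

With no flags, this sketch resolves to the documented q8_0 default and leaves the layer offload count at -1.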
You can override artifact paths and sources with env vars:
- FISHS2_RUNTIME_DIR
- FISHS2_S2_DLL_PATH
- FISHS2_MODEL_PATH
- FISHS2_TOKENIZER_PATH
- FISHS2_RUNTIME_ZIP_URL
- FISHS2_RUNTIME_ZIP_SHA256
- FISHS2_MODEL_URL
- FISHS2_TOKENIZER_URL
- FISHS2_MODEL_SHA256
- FISHS2_TOKENIZER_SHA256
- FISHS2_HF_REPO_ID (default: rodrigomt/s2-pro-gguf)
- FISHS2_MODEL_QUANT (default: q8_0)
- FISHS2_SKIP_DOWNLOADS=true
- FISHS2_FORCE_DOWNLOADS=true
- FISHS2_N_GPU_LAYERS (default: -1)
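To illustrate how a few of these variables and their documented defaults fit together, here is a small sketch. It is not the wrapper's own settings code: load_settings is a hypothetical helper, and the set of accepted truthy spellings for the boolean variables is an assumption (only the value true is documented).

```python
import os
from typing import Mapping, Optional

# Assumption: these spellings count as "true"; the README only shows "true".
TRUTHY = {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read some documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env

    def flag(name: str) -> bool:
        return env.get(name, "").strip().lower() in TRUTHY

    return {
        "hf_repo_id": env.get("FISHS2_HF_REPO_ID", "rodrigomt/s2-pro-gguf"),
        "model_quant": env.get("FISHS2_MODEL_QUANT", "q8_0"),
        "n_gpu_layers": int(env.get("FISHS2_N_GPU_LAYERS", "-1")),
        "skip_downloads": flag("FISHS2_SKIP_DOWNLOADS"),
        "force_downloads": flag("FISHS2_FORCE_DOWNLOADS"),
    }
```

Passing a plain dict instead of os.environ keeps the helper easy to test without touching the real environment.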
- GET /health
- GET /v1/models
- POST /v1/audio/speech
- GET /v1/audio/voices
Compatibility aliases and fallbacks:
- GET /v1/voices (alias)
- POST /v1/audio/voices (voice upload)
- POST /v1/files (legacy upload fallback)
- GET /v1/files (legacy voice discovery fallback)
- DELETE /v1/voices/{voice_id} (optional cleanup)
GET /v1/models returns Fish S2 entries based on settings:
- FISHS2_DEFAULT_MODEL (default: fishaudio/s2-pro)
- FISHS2_MODEL_ALIASES (default: fishs2,fish-s2,s2-pro)
All listed aliases map to one backend runtime instance.
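The alias behaviour can be pictured with a short sketch. The helpers known_models and resolve_model are hypothetical, and rejecting unlisted names is an assumption; the server may handle unknown model names differently.

```python
# Documented defaults for FISHS2_DEFAULT_MODEL and FISHS2_MODEL_ALIASES.
DEFAULT_MODEL = "fishaudio/s2-pro"
DEFAULT_ALIASES = "fishs2,fish-s2,s2-pro"


def known_models(default_model: str = DEFAULT_MODEL, aliases: str = DEFAULT_ALIASES) -> list:
    """Return the default model followed by its aliases, without duplicates."""
    names = [default_model] + [a.strip() for a in aliases.split(",") if a.strip()]
    return list(dict.fromkeys(names))  # dict keys preserve insertion order


def resolve_model(requested: str) -> str:
    """Map any listed name to the single backend model."""
    if requested not in known_models():
        # Assumption: unknown names are rejected rather than silently accepted.
        raise ValueError(f"unknown model: {requested!r}")
    return DEFAULT_MODEL
```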
POST /v1/audio/speech accepts OpenAI-style fields:
- model
- input
- voice
- response_format
- speed (must be 1.0; FishS2 backend does not support speed control)
- instructions
And wrapper extension fields:
- fishs2 object (max_new_tokens, temperature, top_p, top_k, min_tokens_before_end, n_threads, verbose)
- reference_audio / aliases (prompt_audio, ref_audio) for Fish-style reference path
- reference_text / alias (ref_text) for Fish-style transcript
- speaker_wav (explicit local reference audio path list)
- prompt_text (reference transcript)
- control (prepended as (control)...)
For Fish/S2, what matters is whether reference audio is provided. If it is, a matching transcript is required.
V1 supports wav only.
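The constraints above (speed fixed at 1.0, a transcript required whenever reference audio is given, wav output) can be checked on the client before sending a request. The following is a sketch only: build_speech_payload is a hypothetical helper, it does not reproduce the server's validation, and it assumes the wav-only note refers to response_format.

```python
from typing import Optional


def build_speech_payload(
    text: str,
    voice: str = "default",
    model: str = "fishs2",
    reference_audio: Optional[str] = None,
    reference_text: Optional[str] = None,
    speed: float = 1.0,
) -> dict:
    """Assemble a JSON body for POST /v1/audio/speech with client-side checks."""
    if speed != 1.0:
        raise ValueError("speed must be 1.0")
    if reference_audio and not reference_text:
        raise ValueError("reference_audio requires a matching reference_text")
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "wav",  # assumption: wav-only applies to this field
        "speed": speed,
    }
    if reference_audio:
        payload["reference_audio"] = reference_audio
        payload["reference_text"] = reference_text
    return payload
```

The returned dict can be serialized with json.dumps and sent with any HTTP client, or passed to curl via -d.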
Assume server is running at http://127.0.0.1:8020.
curl http://127.0.0.1:8020/health
curl http://127.0.0.1:8020/v1/models

curl -X POST http://127.0.0.1:8020/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "fishs2",
"input": "Hello from Fish S2 CPP FastAPI.",
"voice": "default",
"response_format": "wav"
}' \
--output outputs/basic.wav

curl -X POST http://127.0.0.1:8020/v1/audio/voices \
-F "files=@voices/sample_male/sample_male_new.wav" \
-F "voice_id=sample_male" \
-F "prompt_text=This is a sample transcript for the reference clip."

List stored voices:
curl http://127.0.0.1:8020/v1/audio/voices

curl -X POST http://127.0.0.1:8020/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "fishs2",
"input": "This line uses the uploaded sample_male profile.",
"voice": "sample_male",
"response_format": "wav"
}' \
--output outputs/from_profile.wav

reference_audio is a local file path on the server machine.
curl -X POST http://127.0.0.1:8020/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "fishs2",
"input": "This line clones directly from a reference file.",
"voice": "default",
"reference_audio": "voices/sample_male/sample_male_new.wav",
"reference_text": "This is a sample transcript for the reference clip.",
"response_format": "wav"
}' \
--output outputs/direct_clone.wav

curl -X POST http://127.0.0.1:8020/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "fishs2",
"input": "Custom sampling settings example.",
"voice": "sample_male",
"response_format": "wav",
"fishs2": {
"max_new_tokens": 900,
"temperature": 0.7,
"top_p": 0.85,
"top_k": 40,
"min_tokens_before_end": 0,
"n_threads": 0,
"verbose": true
}
}' \
--output outputs/custom_params.wav

curl -X DELETE http://127.0.0.1:8020/v1/voices/sample_male

Upload fields accepted for compatibility:
- files (multi-part list)
- audio_sample (single file)
- file (legacy single file)
Metadata fields:
- voice_id
- name
- purpose
- prompt_text
Voice data is stored under voices/<voice_id>/ with a meta.json file.
When using a stored voice for synthesis, the first uploaded audio sample is used as reference audio.
- run.py blocks startup when --backend cuda is selected and no NVIDIA GPU is detected.
- To bypass this check intentionally, use --skip-gpu-check (or set FISHS2_SKIP_GPU_CHECK=true).
- On Windows, the runtime must be able to load CUDA dependencies (nvcuda.dll, cudart64_12.dll, cublas64_12.dll).
- The Windows pixi environment pins cuda-cudart and libcublas from conda-forge to provide cudart64_12.dll and cublas64_12.dll inside the env.
- FISHS2_N_GPU_LAYERS controls transformer GPU offload (-1 keeps runtime default behavior, typically full offload on GPU backends).
If you hit "Failed loading s2.dll or one of its dependencies":
- Re-run bootstrap with fresh artifacts: python run.py --force-downloads
- Verify runtime files exist under runtime/fishs2sharp/, especially:
  - s2.dll
  - ggml-base.dll
  - ggml.dll
  - ggml-cpu.dll
  - FishS2Sharp.dll
  - and ggml-cuda.dll when using CUDA backend
- Ensure CUDA runtime libraries are available to the process (cudart64_12.dll, cublas64_12.dll).
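The file check in the second troubleshooting step can be scripted. This is a minimal sketch using the file names listed above; missing_runtime_files is a hypothetical helper and is not part of run.py.

```python
from pathlib import Path

# Runtime files this README asks you to verify under runtime/fishs2sharp/.
REQUIRED = ["s2.dll", "ggml-base.dll", "ggml.dll", "ggml-cpu.dll", "FishS2Sharp.dll"]
CUDA_ONLY = ["ggml-cuda.dll"]


def missing_runtime_files(runtime_dir: str = "runtime/fishs2sharp", cuda: bool = True) -> list:
    """Return the expected runtime files that are not present in runtime_dir."""
    root = Path(runtime_dir)
    names = REQUIRED + (CUDA_ONLY if cuda else [])
    return [name for name in names if not (root / name).is_file()]
```

Calling print(missing_runtime_files()) from the repository root lists whatever is absent; an empty list means all expected files were found. The check does not cover the CUDA runtime libraries themselves, which may live elsewhere on the DLL search path.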
The bootstrapper forces local, portable cache locations inside this repo:
- .hf/ for Hugging Face caches (HF_HOME, HF_HUB_CACHE, TRANSFORMERS_CACHE, etc.)
- .pip-cache/ for pip
- .pixi-cache/ for pixi
- .pixi-cache/rattler/ for rattler package cache (RATTLER_CACHE_DIR)
- .tmp/ for temporary downloads
This prevents fallback to user-global cache folders on the host machine.
run.bat and run.sh now export these cache overrides before pixi install, so first-time environment setup stays local too.
Optional DLL search path overrides:
- FISHS2_RUNTIME_EXTRA_DLL_DIRS (comma or semicolon separated)
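A value for this variable could be split as in the sketch below. split_dll_dirs is a hypothetical helper; trimming whitespace and skipping empty entries are assumptions about how the wrapper treats the value.

```python
import re


def split_dll_dirs(value: str) -> list:
    """Split on commas or semicolons, dropping blanks and surrounding whitespace."""
    return [part.strip() for part in re.split(r"[;,]", value) if part.strip()]
```

For example, a value mixing both separators yields one directory entry per non-empty segment, in the order given.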
bin\pixi run pytest