
QVAC-19254 tts-cpp: Supertonic + Chatterbox/S3Gen GPU sched for Adreno OpenCL #36

Open
pratiknarola-t wants to merge 1 commit into master from QVAC-19254-tts-adreno-gpu



@pratiknarola-t

QVAC-19254 — TTS Adreno GPU support

Supersedes #35: the original PR was auto-closed when its branch was renamed to the correct ticket (QVAC-19213-tts-adreno-gpu → QVAC-19254-tts-adreno-gpu). Same signed commit (5205428e), identical content.

Enables Chatterbox + Supertonic TTS on Adreno (OpenCL / Vulkan) by routing GPU-unsupported ops to CPU via ggml_backend_sched, with a supporting backend-tiering fix for Adreno's OpenCL device-string format.

Commits

  1. Supertonic GPU correctness via ggml_backend_sched (tts-cpp/src/supertonic_*.{cpp,h}). Routes the CPU-only GGML_OP_CUSTOM kernels (depthwise/pointwise Conv1D, LayerNorm, dense matmul) to CPU via sched; everything else runs on the GPU primary. Lifts the prior "GPU rejected because customs are CPU-only" guard. Verified corr ≈ 0.998 vs CPU on Adreno 740 (Vulkan) and macOS (Metal).
  2. backend_selection: parse_adreno_version handles the OpenCL device string (tts-cpp/src/backend_selection.cpp). The OpenCL string is "QUALCOMM Adreno(TM) (OpenCL 3.0 Adreno(TM) 740)"; parsing only the first "Adreno" marker yielded 3 (from "OpenCL 3.0") and mis-tiered the GPU below Vulkan. The fix scans every marker and keeps the largest value ≥ 100 (a 3-digit model number), which recovers Adreno 740.
  3. Route S3Gen CONV_TRANSPOSE_1D to CPU via ggml_backend_sched (tts-cpp/src/chatterbox_tts.cpp). The HiFT vocoder uses CONV_TRANSPOSE_1D, which neither ggml-opencl nor ggml-vulkan supports yet. The sched routes that op to CPU while keeping the rest on GPU. Includes the USAGE_WEIGHTS marking + per-call graph rebuild required by sched's GPU↔CPU copy machinery (which mutates node->src[]).
  4. --dump-mel-path CLI flag (tts-cpp/src/chatterbox_cli.cpp). Wires the CLI through to the existing opts.dump_mel_path field (the npy dump hooks are already on master), so a debug user can compare CPU vs GPU intermediates via --dump-mel-path /path/to/prefix.
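
The marker scan described in item 2 can be sketched roughly as follows. This is an illustrative re-implementation, not the code in backend_selection.cpp: the function name `parse_adreno_version_sketch`, its signature, and the "return 0 when nothing matches" convention are all assumptions made for the example.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch (not the real parse_adreno_version): look at the first
// number that follows each "Adreno" marker and keep the largest one that is
// >= 100, so "3" from "OpenCL 3.0" is rejected and "740" is kept.
// Returns 0 when no plausible model number is found (assumed convention).
int parse_adreno_version_sketch(const std::string & desc) {
    const std::string marker = "Adreno";
    int best = 0;
    std::size_t pos = 0;
    while ((pos = desc.find(marker, pos)) != std::string::npos) {
        std::size_t i = pos + marker.size();
        // Skip decoration such as "(TM)" and spaces up to the next digit.
        while (i < desc.size() && !std::isdigit(static_cast<unsigned char>(desc[i]))) {
            ++i;
        }
        // Read a short run of digits (capped to avoid integer overflow).
        int value = 0;
        int digits = 0;
        while (i < desc.size() && digits < 6 &&
               std::isdigit(static_cast<unsigned char>(desc[i]))) {
            value = value * 10 + (desc[i] - '0');
            ++i;
            ++digits;
        }
        if (value >= 100 && value > best) {
            best = value;
        }
        pos += marker.size();  // continue with the next marker
    }
    return best;
}
```

With the OpenCL string quoted above, the first marker yields 3, which is discarded, and the second yields 740.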

Verification

On-device smoke against the just-synced qvac-ext-ggml/speech (ggml v0.10.2) + the matching Adreno OpenCL/Vulkan PRs (the QVAC-19253 ggml-vulkan PR + the QVAC-19254 ggml-opencl kernels PR):

| Smoke | Result |
| --- | --- |
| Chatterbox-OpenCL | ✅ EXIT=0, 3.44 s WAV, RTF 37.6 (consistent with prior baseline) |
| Supertonic-OpenCL | ✅ EXIT=0, 3.57 s WAV |
| Supertonic-Vulkan | ✅ EXIT=0, 3.57 s WAV; Adreno 740 detected, Qualcomm-gated guards active, no crashes |

Hygiene

  • All source comments scrubbed of QVAC-#### ticket refs + internal hypothesis-log IDs (H016/H017).
  • The verbose model_ctx / s3gen_sched_alloc blocks were compressed from 8/6 lines to 5/2 while preserving the essential SIGSEGV-prevention + threading-race rationale.
  • Diff confirms only comments changed in the cleanup (apart from the one trailing-comment edit on the dump_mel_path field declaration).

…atterbox/S3Gen)

Route Supertonic and Chatterbox/S3Gen GPU graphs through ggml_backend_sched so ops the GPU backend cannot run (CONV_TRANSPOSE_1D in the HiFT vocoder; the CPU-only GGML_OP_CUSTOM kernels in the Supertonic vector estimator/vocoder) are routed to CPU instead of asserting.

Capability-gate the Chatterbox HiFT scheduler: a backend that runs every op in the graph (Metal, CUDA, CPU) computes directly on the primary backend; only a backend missing an op (Adreno OpenCL / Vulkan) uses the [GPU,CPU] scheduler. The gate queries ggml_backend_supports_op per node, so it is generic and does not regress iOS Metal (which supports CONV_TRANSPOSE_1D natively and otherwise aborts in the scheduler's graph-split).
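
As a rough model of the capability gate, the decision reduces to "does the primary backend support every node in the graph?". The sketch below is self-contained and does not use ggml: `NodeSketch`, `ComputePath`, and `choose_compute_path` are hypothetical names, and the `supports_op` callback stands in for a per-node ggml_backend_supports_op query against the primary backend.

```cpp
#include <functional>
#include <string>
#include <vector>

// Stand-in for a graph node; only the op name matters in this sketch.
struct NodeSketch {
    std::string op;
};

enum class ComputePath {
    Direct,     // compute the whole graph on the primary backend
    Scheduler,  // use a [GPU, CPU] scheduler so unsupported ops fall back to CPU
};

// Hypothetical gate: pick the scheduler only if at least one node is not
// supported by the primary backend. supports_op models the per-node query.
ComputePath choose_compute_path(
        const std::vector<NodeSketch> & graph,
        const std::function<bool(const NodeSketch &)> & supports_op) {
    for (const NodeSketch & node : graph) {
        if (!supports_op(node)) {
            return ComputePath::Scheduler;
        }
    }
    return ComputePath::Direct;
}
```

Under this model, a backend whose predicate accepts CONV_TRANSPOSE_1D never enters the scheduler path, which is the property the commit message relies on for iOS Metal.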

Gate Android GPU selection to Qualcomm Adreno: other Android GPU vendors are unvalidated and at least one (ARM Mali) aborts the host process uncatchably from graph compute, so non-Adreno devices fall through to CPU. parse_adreno_version handles the OpenCL device-name string (e.g. 'OpenCL 3.0 Adreno(TM) 740') by scanning every marker for the real model number.

Also expose the pre-existing S3Gen mel/encoder/CFM intermediate dump via the --dump-mel-path CLI flag.