
QVAC-19213 tts-cpp: Supertonic + Chatterbox/S3Gen GPU sched for Adreno OpenCL #35

Closed
pratiknarola-t wants to merge 1 commit into master from QVAC-19213-tts-adreno-gpu

@pratiknarola-t

QVAC-19213 — TTS Adreno GPU support

Enables Chatterbox + Supertonic TTS on Adreno (OpenCL / Vulkan) by routing GPU-unsupported ops to CPU via ggml_backend_sched, with a supporting backend-tiering fix for Adreno's OpenCL device-string format.

Commits

  1. Supertonic GPU correctness via ggml_backend_sched (tts-cpp/src/supertonic_*.{cpp,h}). Routes the CPU-only GGML_OP_CUSTOM kernels (depthwise/pointwise Conv1D, LayerNorm, dense matmul) to CPU via sched; everything else runs on the GPU primary. Lifts the prior "GPU rejected because customs are CPU-only" guard. Verified corr ≈ 0.998 vs CPU on Adreno 740 (Vulkan) and macOS (Metal).
  2. backend_selection: parse_adreno_version handles the OpenCL device string (tts-cpp/src/backend_selection.cpp). The OpenCL string is "QUALCOMM Adreno(TM) (OpenCL 3.0 Adreno(TM) 740)". Parsing only the first "Adreno" marker yielded 3 (from "OpenCL 3.0") and mis-tiered the GPU below Vulkan. The fix scans every marker and keeps the largest value ≥ 100 (a 3-digit model number), which recovers Adreno 740.
  3. Route S3Gen CONV_TRANSPOSE_1D to CPU via ggml_backend_sched (tts-cpp/src/chatterbox_tts.cpp). The HiFT vocoder uses CONV_TRANSPOSE_1D, which neither ggml-opencl nor ggml-vulkan supports yet. The sched routes that op to CPU while keeping the rest on GPU. Includes the USAGE_WEIGHTS marking and the per-call graph rebuild required by sched's GPU↔CPU copy machinery, which mutates node->src[].
  4. --dump-mel-path CLI flag (tts-cpp/src/chatterbox_cli.cpp). Wires the CLI through to the existing opts.dump_mel_path field (the npy dump hooks are already on master), so a debug user can compare CPU vs GPU intermediates via --dump-mel-path /path/to/prefix.
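
The marker scan described in commit 2 can be sketched roughly as follows. This is a minimal illustration, not the code in backend_selection.cpp: the function name parse_adreno_model, its return convention (0 when no model is found), and the four-digit cap are assumptions made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for parse_adreno_version (name and return
// convention are assumptions). Returns the Adreno model number found in a
// device description string, or 0 if no plausible model number is present.
int parse_adreno_model(const std::string& desc) {
    static const std::string marker = "Adreno";
    int best = 0;
    std::size_t pos = 0;
    // Visit every "Adreno" marker instead of stopping at the first one.
    while ((pos = desc.find(marker, pos)) != std::string::npos) {
        pos += marker.size();
        // Skip forward to the first digit that follows this marker.
        std::size_t i = pos;
        while (i < desc.size() &&
               !std::isdigit(static_cast<unsigned char>(desc[i]))) {
            ++i;
        }
        // Read the digit run (capped at 4 digits to avoid int overflow).
        int value = 0;
        int digits = 0;
        while (i < desc.size() && digits < 4 &&
               std::isdigit(static_cast<unsigned char>(desc[i]))) {
            value = value * 10 + (desc[i] - '0');
            ++i;
            ++digits;
        }
        // Keep only model-like numbers (>= 100) and prefer the largest,
        // so the "3" from "OpenCL 3.0" is rejected.
        if (value >= 100 && value > best) {
            best = value;
        }
    }
    return best;
}
```

With the OpenCL string quoted above, the first marker leads to the 3 in "OpenCL 3.0", which is rejected for being below 100, and the second marker yields 740.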

Verification

On-device smoke against the just-synced qvac-ext-ggml/speech (ggml v0.10.2) + the matching Adreno OpenCL/Vulkan PRs (qvac-ext-ggml PR #14 refined + new OpenCL kernels PR):

| Smoke | Result |
| --- | --- |
| Chatterbox-OpenCL | ✅ EXIT=0, 3.44 s WAV, RTF 37.6 (consistent with prior baseline) |
| Supertonic-OpenCL | ✅ EXIT=0, 3.57 s WAV |
| Supertonic-Vulkan | ✅ EXIT=0, 3.57 s WAV; Adreno 740 detected, Qualcomm-gated guards active, no crashes |
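
The description quotes corr ≈ 0.998 against CPU for Supertonic but does not say how that figure was computed. Assuming it is a Pearson correlation between the CPU and GPU output buffers, a comparison helper could look like the sketch below. The function name and the choice to return 0.0 for degenerate inputs are illustrative assumptions, not part of this PR.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two equally sized buffers, e.g. CPU and GPU
// output samples. Returns 0.0 for mismatched sizes, empty input, or a
// constant (zero-variance) signal. Hypothetical helper, not from the PR.
double pearson_correlation(const std::vector<float>& a,
                           const std::vector<float>& b) {
    if (a.size() != b.size() || a.empty()) {
        return 0.0;
    }
    const double n = static_cast<double>(a.size());
    double mean_a = 0.0;
    double mean_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= n;
    mean_b /= n;

    double cov = 0.0;
    double var_a = 0.0;
    double var_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    const double denom = std::sqrt(var_a * var_b);
    return denom > 0.0 ? cov / denom : 0.0;
}
```

The same helper could be applied to intermediates dumped from a CPU run and a GPU run, provided both are first loaded into float vectors of equal length.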

Hygiene

  • All source comments scrubbed of QVAC-#### ticket refs + internal hypothesis-log IDs (H016/H017).
  • The verbose model_ctx / s3gen_sched_alloc blocks were compressed from 8/6 lines to 5/2 while preserving the essential SIGSEGV-prevention + threading-race rationale.
  • Diff confirms only comments changed in the cleanup (apart from the one trailing-comment edit on the dump_mel_path field declaration).

pratiknarola-t requested review from a team as code owners May 28, 2026 07:10
pratiknarola-t changed the title from "QVAC-19213 tts-cpp: Supertonic + Chatterbox/S3Gen GPU sched for Adreno OpenCL/Vulkan" to "QVAC-19213 tts-cpp: Supertonic + Chatterbox/S3Gen GPU sched for Adreno OpenCL" May 28, 2026
pratiknarola-t force-pushed the QVAC-19213-tts-adreno-gpu branch from 7930099 to afe2fd2 May 28, 2026 08:39
Comment threads on tts-cpp/scripts/bench-supertonic-onnx.py: 5, all marked Fixed
Comment threads on tts-cpp/scripts/dump-supertonic-reference.py: 5, all marked Fixed

@github-advanced-security (AI) left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

pratiknarola-t force-pushed the QVAC-19213-tts-adreno-gpu branch from 3613786 to 6d3751b May 29, 2026 06:10
…atterbox/S3Gen)

Route Supertonic and Chatterbox/S3Gen GPU graphs through ggml_backend_sched so ops the GPU backend cannot run (CONV_TRANSPOSE_1D in the HiFT vocoder; the CPU-only GGML_OP_CUSTOM kernels in the Supertonic vector estimator/vocoder) are routed to CPU instead of asserting.

Capability-gate the Chatterbox HiFT scheduler: a backend that runs every op in the graph (Metal, CUDA, CPU) computes directly on the primary backend; only a backend missing an op (Adreno OpenCL / Vulkan) uses the [GPU,CPU] scheduler. The gate queries ggml_backend_supports_op per node, so it is generic and does not regress iOS Metal (which supports CONV_TRANSPOSE_1D natively and otherwise aborts in the scheduler's graph-split).
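
The capability gate above amounts to a per-node support check over the graph. The sketch below models it without depending on ggml: the Op enum, the ComputePath enum, and choose_compute_path are hypothetical names, and the supports_op callback stands in for a per-node query such as ggml_backend_supports_op(backend, node).

```cpp
#include <functional>
#include <vector>

// Hypothetical op identifiers standing in for ggml's op enum.
enum class Op { MulMat, Add, ConvTranspose1D, Custom };

// Which execution strategy the gate selects.
enum class ComputePath { DirectOnPrimary, GpuCpuScheduler };

// Walk every node in the graph and ask the primary backend whether it can
// run that op. If any op is unsupported, fall back to the [GPU,CPU]
// scheduler; otherwise compute directly on the primary backend.
ComputePath choose_compute_path(const std::vector<Op>& graph_nodes,
                                const std::function<bool(Op)>& supports_op) {
    for (Op op : graph_nodes) {
        if (!supports_op(op)) {
            return ComputePath::GpuCpuScheduler;
        }
    }
    return ComputePath::DirectOnPrimary;
}
```

Because the decision depends only on what the backend reports for the ops actually present in the graph, a backend that supports CONV_TRANSPOSE_1D stays on the direct path without needing a per-vendor special case.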

Gate Android GPU selection to Qualcomm Adreno: other Android GPU vendors are unvalidated and at least one (ARM Mali) aborts the host process uncatchably from graph compute, so non-Adreno devices fall through to CPU. parse_adreno_version handles the OpenCL device-name string (e.g. 'OpenCL 3.0 Adreno(TM) 740') by scanning every marker for the real model number.

Also expose the pre-existing S3Gen mel/encoder/CFM intermediate dump via the --dump-mel-path CLI flag.
@pratiknarola-t
Author

Superseded by #36: this PR was auto-closed when its branch was renamed to match the correct ticket (QVAC-19213-tts-adreno-gpu → QVAC-19254-tts-adreno-gpu). Same signed commit (5205428e), identical content.
