QVAC-19254 tts-cpp: Supertonic + Chatterbox/S3Gen GPU sched for Adreno OpenCL #36
Open
pratiknarola-t wants to merge 1 commit into
…atterbox/S3Gen)

Route Supertonic and Chatterbox/S3Gen GPU graphs through ggml_backend_sched so that ops the GPU backend cannot run (CONV_TRANSPOSE_1D in the HiFT vocoder; the CPU-only GGML_OP_CUSTOM kernels in the Supertonic vector estimator/vocoder) are routed to CPU instead of asserting.

Capability-gate the Chatterbox HiFT scheduler: a backend that runs every op in the graph (Metal, CUDA, CPU) computes directly on the primary backend; only a backend missing an op (Adreno OpenCL / Vulkan) uses the [GPU, CPU] scheduler. The gate queries ggml_backend_supports_op per node, so it is generic and does not regress iOS Metal, which supports CONV_TRANSPOSE_1D natively and would otherwise abort in the scheduler's graph split.

Gate Android GPU selection to Qualcomm Adreno: other Android GPU vendors are unvalidated, and at least one (ARM Mali) aborts the host process uncatchably from graph compute, so non-Adreno devices fall through to CPU.

parse_adreno_version handles the OpenCL device-name string (e.g. 'OpenCL 3.0 Adreno(TM) 740') by scanning every marker for the real model number.

Also expose the pre-existing S3Gen mel/encoder/CFM intermediate dump via the --dump-mel-path CLI flag.
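The capability gate, the Adreno-only Android gate, and the `parse_adreno_version` fix described in the commit message can be modeled in isolation. This is a minimal sketch under stated assumptions, not the PR's code: `Op`, `BackendCaps`, `needs_scheduler` and `android_gpu_allowed` are hypothetical stand-ins (the real gate calls `ggml_backend_supports_op` on each graph node), and the `parse_adreno_version` body is reconstructed from the description of the fix.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical op/backend model. In tts-cpp the per-node check is
// ggml_backend_supports_op(backend, node); these types only stand in for it.
enum class Op { MUL_MAT, ADD, CONV_TRANSPOSE_1D, CUSTOM };

struct BackendCaps {
    std::set<Op> supported;  // ops this backend can execute
    bool supports(Op op) const { return supported.count(op) != 0; }
};

// Capability gate: use the [GPU, CPU] scheduler only when the primary
// backend cannot run at least one node; otherwise compute directly on it.
bool needs_scheduler(const BackendCaps& primary, const std::vector<Op>& graph) {
    return std::any_of(graph.begin(), graph.end(),
                       [&](Op op) { return !primary.supports(op); });
}

// Sketch of the tiering fix: scan every "Adreno" marker, read the first
// digit run after it, and keep the largest value >= 100 (a 3-digit model).
// Returns 0 when no model number is found.
int parse_adreno_version(const std::string& name) {
    const std::string marker = "Adreno";
    int best = 0;
    for (std::size_t pos = name.find(marker); pos != std::string::npos;
         pos = name.find(marker, pos + marker.size())) {
        std::size_t i = pos + marker.size();
        // Skip text such as "(TM) (OpenCL " up to the first digit.
        while (i < name.size() && !std::isdigit(static_cast<unsigned char>(name[i]))) ++i;
        int value = 0;
        while (i < name.size() && std::isdigit(static_cast<unsigned char>(name[i]))) {
            value = value * 10 + (name[i] - '0');
            ++i;
        }
        if (value >= 100) best = std::max(best, value);
    }
    return best;
}

// Hypothetical Android vendor gate: only Adreno devices may use the GPU;
// every other vendor (ARM Mali included) falls through to CPU. The real
// check may match the vendor or device string differently.
bool android_gpu_allowed(const std::string& device_name) {
    return device_name.find("Adreno") != std::string::npos;
}
```

For the device string quoted in the PR, reading only the first marker stops at the `3` of `OpenCL 3.0`; scanning every marker and discarding values below 100 yields 740.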
QVAC-19254 — TTS Adreno GPU support
Enables Chatterbox + Supertonic TTS on Adreno (OpenCL / Vulkan) by routing GPU-unsupported ops to CPU via `ggml_backend_sched`, with a supporting backend-tiering fix for Adreno's OpenCL device-string format.

Commits
1. `ggml_backend_sched` (`tts-cpp/src/supertonic_*.{cpp,h}`). Routes the CPU-only `GGML_OP_CUSTOM` kernels (depthwise/pointwise Conv1D, LayerNorm, dense matmul) to CPU via sched; everything else runs on the GPU primary backend. Lifts the prior "GPU rejected because customs are CPU-only" guard. Verified corr ≈ 0.998 vs CPU on Adreno 740 (Vulkan) and macOS (Metal).
2. `backend_selection`: `parse_adreno_version` handles the OpenCL device string (`tts-cpp/src/backend_selection.cpp`). The OpenCL string is `"QUALCOMM Adreno(TM) (OpenCL 3.0 Adreno(TM) 740)"`; parsing only the first "Adreno" marker yielded `3` (from "OpenCL 3.0") and mis-tiered the GPU below Vulkan. The fix scans every marker and keeps the largest value ≥ 100 (a 3-digit model number), which recovers Adreno 740.
3. `CONV_TRANSPOSE_1D` to CPU via `ggml_backend_sched` (`tts-cpp/src/chatterbox_tts.cpp`). The HiFT vocoder uses `CONV_TRANSPOSE_1D`, which neither `ggml-opencl` nor `ggml-vulkan` supports yet. The sched routes that op to CPU while keeping the rest on the GPU. Includes the `USAGE_WEIGHTS` marking and the per-call graph rebuild required by sched's GPU↔CPU copy machinery (which mutates `node->src[]`).
4. `--dump-mel-path` CLI flag (`tts-cpp/src/chatterbox_cli.cpp`). Wires the CLI through to the existing `opts.dump_mel_path` field (the npy dump hooks are already on master), so a user debugging the GPU path can compare CPU vs GPU intermediates via `--dump-mel-path /path/to/prefix`.

Verification
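The Supertonic commit reports corr ≈ 0.998 against the CPU backend but does not define the metric. Assuming it is the Pearson correlation between the CPU reference and the GPU output (for example waveforms, or the intermediates written via `--dump-mel-path` once loaded into arrays), a minimal sketch of the computation is below; `pearson_corr` is a hypothetical helper, not part of tts-cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: Pearson correlation between a CPU reference signal
// and the corresponding GPU output (same length, same sample order).
double pearson_corr(const std::vector<float>& cpu, const std::vector<float>& gpu) {
    assert(cpu.size() == gpu.size() && cpu.size() > 1);
    const std::size_t n = cpu.size();

    // Means of both signals.
    double mean_c = 0.0, mean_g = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_c += cpu[i];
        mean_g += gpu[i];
    }
    mean_c /= static_cast<double>(n);
    mean_g /= static_cast<double>(n);

    // Covariance and variances (unnormalized; the 1/n factors cancel).
    double cov = 0.0, var_c = 0.0, var_g = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dc = cpu[i] - mean_c;
        const double dg = gpu[i] - mean_g;
        cov += dc * dg;
        var_c += dc * dc;
        var_g += dg * dg;
    }
    // A constant signal has zero variance, so correlation is undefined; report 0.
    if (var_c == 0.0 || var_g == 0.0) return 0.0;
    return cov / std::sqrt(var_c * var_g);
}
```

Pearson correlation is invariant to gain and offset, so a value near 1 shows the GPU path tracks the CPU waveform's shape but would not catch a uniform scaling error; a max-abs-difference check on the same data complements it.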
On-device smoke test against the just-synced `qvac-ext-ggml/speech` (ggml v0.10.2) plus the matching Adreno OpenCL/Vulkan PRs (the QVAC-19253 ggml-vulkan PR and the QVAC-19254 ggml-opencl kernels PR):

Hygiene
- `QVAC-####` ticket refs + internal hypothesis-log IDs (H016/H017).
- `model_ctx`/`s3gen_sched_alloc` blocks were compressed from 8/6 lines to 5/2 while preserving the essential SIGSEGV-prevention + threading-race rationale.
- `dump_mel_path` field declaration).