QVAC-19213 tts-cpp: Supertonic + Chatterbox/S3Gen GPU sched for Adreno OpenCL #35
Closed
pratiknarola-t wants to merge 1 commit into
7930099 to afe2fd2
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
3613786 to 6d3751b
…atterbox/S3Gen)

Route Supertonic and Chatterbox/S3Gen GPU graphs through ggml_backend_sched so ops the GPU backend cannot run (CONV_TRANSPOSE_1D in the HiFT vocoder; the CPU-only GGML_OP_CUSTOM kernels in the Supertonic vector estimator/vocoder) are routed to CPU instead of asserting.

Capability-gate the Chatterbox HiFT scheduler: a backend that runs every op in the graph (Metal, CUDA, CPU) computes directly on the primary backend; only a backend missing an op (Adreno OpenCL / Vulkan) uses the [GPU,CPU] scheduler. The gate queries ggml_backend_supports_op per node, so it is generic and does not regress iOS Metal (which supports CONV_TRANSPOSE_1D natively and otherwise aborts in the scheduler's graph-split).

Gate Android GPU selection to Qualcomm Adreno: other Android GPU vendors are unvalidated and at least one (ARM Mali) aborts the host process uncatchably from graph compute, so non-Adreno devices fall through to CPU. parse_adreno_version handles the OpenCL device-name string (e.g. 'OpenCL 3.0 Adreno(TM) 740') by scanning every marker for the real model number.

Also expose the pre-existing S3Gen mel/encoder/CFM intermediate dump via the --dump-mel-path CLI flag.
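The capability gate described in the commit message can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `Op`, `Backend`, `Plan` and `choose_plan` are hypothetical names invented for the example, and the `supports_op` predicate merely stands in for the per-node `ggml_backend_supports_op` query the commit mentions. It does not call ggml and is not the PR's actual code.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for graph ops; NOT the real ggml op enum.
enum class Op { MulMat, Add, ConvTranspose1D, Custom };

// Hypothetical backend model: a name plus a predicate answering
// "can this backend run this op?" (the role the commit message
// assigns to ggml_backend_supports_op).
struct Backend {
    std::string name;
    std::function<bool(Op)> supports_op;
};

// The two execution strategies the commit message distinguishes.
enum class Plan { DirectOnPrimary, GpuCpuScheduler };

// Walk every node of the graph; a single unsupported op is enough
// to require the [GPU, CPU] scheduler. Otherwise compute directly
// on the primary backend.
Plan choose_plan(const Backend& primary, const std::vector<Op>& graph) {
    for (Op op : graph) {
        if (!primary.supports_op(op)) {
            return Plan::GpuCpuScheduler;
        }
    }
    return Plan::DirectOnPrimary;
}
```

With a backend whose predicate accepts every op (the Metal/CUDA/CPU case), `choose_plan` picks the direct path; with one that rejects `Op::ConvTranspose1D` (the Adreno OpenCL / Vulkan case), a graph containing that op is sent to the scheduler. Because the decision depends only on the predicate, no backend names need to be hard-coded.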
0a85f1d to 5205428
Superseded by #36: this PR was auto-closed when its branch was renamed to the correct ticket (
QVAC-19213 — TTS Adreno GPU support
Enables Chatterbox + Supertonic TTS on Adreno (OpenCL / Vulkan) by routing GPU-unsupported ops to CPU via ggml_backend_sched, with a supporting backend-tiering fix for Adreno's OpenCL device-string format.

Commits

… ggml_backend_sched (tts-cpp/src/supertonic_*.{cpp,h}). Routes the CPU-only GGML_OP_CUSTOM kernels (depthwise/pointwise Conv1D, LayerNorm, dense matmul) to CPU via sched; everything else runs on the GPU primary. Lifts the prior "GPU rejected because customs are CPU-only" guard. Verified corr ≈ 0.998 vs CPU on Adreno 740 (Vulkan) and macOS (Metal).

backend_selection: parse_adreno_version handles the OpenCL device string (tts-cpp/src/backend_selection.cpp). The OpenCL string is "QUALCOMM Adreno(TM) (OpenCL 3.0 Adreno(TM) 740)"; parsing only the first "Adreno" marker yielded 3 (from "OpenCL 3.0") and mis-tiered the GPU below Vulkan. The fix scans every marker and keeps the largest ≥ 100 (3-digit model). Recovers Adreno 740.

… CONV_TRANSPOSE_1D to CPU via ggml_backend_sched (tts-cpp/src/chatterbox_tts.cpp). The HiFT vocoder uses CONV_TRANSPOSE_1D, which neither ggml-opencl nor ggml-vulkan supports yet. The sched routes that op to CPU while keeping the rest on GPU. Includes the USAGE_WEIGHTS marking + per-call graph rebuild required by sched's GPU↔CPU copy machinery (mutates node->src[]).

--dump-mel-path CLI flag (tts-cpp/src/chatterbox_cli.cpp). Wires the CLI through to the existing opts.dump_mel_path field (the npy dump hooks are already on master), so a debug user can compare CPU vs GPU intermediates via --dump-mel-path /path/to/prefix.

Verification
On-device smoke against the just-synced qvac-ext-ggml/speech (ggml v0.10.2) + the matching Adreno OpenCL/Vulkan PRs (qvac-ext-ggml PR #14 refined + new OpenCL kernels PR):

Hygiene

… QVAC-#### ticket refs + internal hypothesis-log IDs (H016/H017).

… model_ctx/s3gen_sched_alloc blocks were compressed from 8/6 lines to 5/2 while preserving the essential SIGSEGV-prevention + threading-race rationale.

… dump_mel_path field declaration).
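The marker scan described for parse_adreno_version (scan every "Adreno" marker, keep the largest value ≥ 100) can be sketched as follows. This is an illustrative re-implementation, not the code in tts-cpp/src/backend_selection.cpp: the function name `parse_adreno_model`, the return value of 0 for "not found", and the rule of reading the first digit run after each marker are all assumptions made for the example.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch: returns the Adreno model number found in a device
// string, or 0 when no marker is followed by a value of at least 100.
int parse_adreno_model(const std::string& device_name) {
    const std::string marker = "Adreno";
    int best = 0;
    std::size_t pos = 0;

    // Visit every occurrence of the marker, not just the first one.
    while ((pos = device_name.find(marker, pos)) != std::string::npos) {
        pos += marker.size();
        std::size_t i = pos;

        // Skip text between the marker and the next digit, e.g. "(TM) ".
        while (i < device_name.size() &&
               !std::isdigit(static_cast<unsigned char>(device_name[i]))) {
            ++i;
        }

        // Read one run of digits; the bound keeps the int from overflowing.
        int value = 0;
        while (i < device_name.size() && value < 100000 &&
               std::isdigit(static_cast<unsigned char>(device_name[i]))) {
            value = value * 10 + (device_name[i] - '0');
            ++i;
        }

        // "3" from "OpenCL 3.0" is rejected here; "740" is kept.
        if (value >= 100 && value > best) {
            best = value;
        }
    }
    return best;
}
```

For the string quoted above, "QUALCOMM Adreno(TM) (OpenCL 3.0 Adreno(TM) 740)", the first marker yields 3 and is discarded by the ≥ 100 threshold, while the second yields 740, which is what the function returns.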