ggml-webgpu: Support GPU profiling beyond the maximum query count #22995

Open
yomaytk wants to merge 1 commit into ggml-org:master from yomaytk:new-flush-gpu-profile
Conversation

@yomaytk
Contributor

@yomaytk yomaytk commented May 13, 2026

Overview

This PR fixes the bug described in the Additional Information section.

  • Flush timestamp slots and reset the timestamp state when the number of used timestamp slots is nearly full.
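
The flush-and-reset idea can be illustrated with a small, self-contained sketch. This is not the code from this PR: `ProfileState`, `flush`, and `reserve_pair` are hypothetical names, the GPU readback is replaced by a simple counter, and the sketch assumes a flush is triggered exactly when the next begin/end pair would no longer fit. The capacity of 4096 and the two slots per profiled pass are taken from the description and the assertion quoted below.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the backend's profiling state (illustrative names only).
constexpr uint32_t kMaxQueryCount = 4096;  // query set capacity cited in the PR description
constexpr uint32_t kSlotsPerPass  = 2;     // begin + end timestamp, matching the "+ 2" in the assertion

struct ProfileState {
    uint32_t used_slots       = 0;  // slots written since the last flush
    uint32_t flush_count      = 0;  // number of times the slots were drained
    uint64_t collected_passes = 0;  // passes whose timings have been "read back"
    std::vector<std::pair<uint32_t, uint32_t>> pending;  // (begin, end) pairs awaiting readback
};

// Stand-in for resolving the query set, reading the results back, and
// accumulating timings. Here it only counts passes and resets the state.
inline void flush(ProfileState & s) {
    s.collected_passes += s.pending.size();
    s.pending.clear();
    s.used_slots = 0;
    s.flush_count++;
}

// Reserve a begin/end slot pair, flushing first if the pair would not fit.
inline std::pair<uint32_t, uint32_t> reserve_pair(ProfileState & s) {
    if (s.used_slots + kSlotsPerPass > kMaxQueryCount) {
        flush(s);
    }
    // Same invariant as the failing assertion, now guaranteed by the flush above.
    assert(s.used_slots + kSlotsPerPass <= kMaxQueryCount);
    std::pair<uint32_t, uint32_t> slots{ s.used_slots, s.used_slots + 1 };
    s.used_slots += kSlotsPerPass;
    s.pending.push_back(slots);
    return slots;
}
```

With these assumptions, a graph that needs more than 2048 profiled passes no longer trips the capacity check: the slots are drained and reused, and a final `flush` after the last pass collects whatever is still pending.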

I confirmed that GPU profiles can now be collected for Qwen3.5-35B-A3B-GGUF and several other models (Qwen3.5, Qwen3.6, Gemma 4, and Llama 3).

Additional Information

I noticed that unsloth/Qwen3.5-35B-A3B-GGUF overflowed the timestamp QuerySet when I tried to collect a GPU profile:

llama.cpp/ggml/src/ggml-webgpu/ggml-webgpu.cpp:571: GGML_ASSERT(ctx->profile_timestamp_query_count + 2 <= WEBGPU_MAX_PROFILE_QUERY_COUNT) failed

This suggests that we need logic to allow profile collection even when a model requires more than 4096 timestamp queries.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - I used AI to investigate the WebGPU specification

@yomaytk yomaytk requested a review from a team as a code owner May 13, 2026 00:42
@github-actions github-actions Bot added the ggml and WebGPU labels May 13, 2026
@reeselevine
Contributor

thanks, this is a nice clean addition!

@reeselevine reeselevine requested a review from CISC May 13, 2026 16:30