[Scripts] Add op_coverage diagnostic for recent LLMs by YWHyuk · Pull Request #245 · PSAL-POSTECH/PyTorchSim

YWHyuk · 2026-05-26T07:03:07Z

Summary

Add scripts/op_coverage.py, a two-phase aten-op coverage probe for the torch.compile + npu:0 path.
Phase 1 enumerates aten ops via a custom dynamo backend over eager forward.
Phase 2 runs torch.compile on npu:0 and parses the failure traceback to surface the first-failing aten op per model.
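
For readers unfamiliar with the Phase 1 idea, here is a minimal sketch of enumerating aten ops through a custom compile backend. It is not the code in scripts/op_coverage.py: the helper name enumerate_aten_ops, the tiny test model, and the choice to wrap the recorder in aot_autograd (so the graph is seen at aten level) are all assumptions made for illustration.

```python
import torch
from torch._dynamo.backends.common import aot_autograd
from functorch.compile import make_boxed_func


def enumerate_aten_ops(model, *inputs):
    """Return the sorted unique aten op names seen while tracing one forward pass.

    Hypothetical helper; the real script may collect ops differently.
    """
    seen = set()

    def record(gm, example_inputs):
        # AOTAutograd hands us an FX graph whose call_function targets are aten overloads.
        for node in gm.graph.nodes:
            if node.op == "call_function":
                name = str(node.target)
                if name.startswith("aten."):
                    seen.add(name)
        # Run the captured graph unchanged, so execution stays on the eager kernels.
        return make_boxed_func(gm.forward)

    compiled = torch.compile(model, backend=aot_autograd(fw_compiler=record))
    with torch.no_grad():
        compiled(*inputs)
    return sorted(seen)


# Tiny stand-in model (assumption); the PR uses 2-layer transformers configs instead.
tiny = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
ops = enumerate_aten_ops(tiny, torch.randn(1, 8))
print(ops)
```

Because the recorder returns the graph untouched, nothing NPU-specific is needed for this step, which is why an enumerate-only mode can be fast.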

Models covered (transformers 4.51.3)

qwen2, gemma, gemma2, phi3, qwen3, qwen3_moe, gemma3, deepseek_v3, llama4, glm4, olmo2, granite, phimoe, mamba2, mllama.

Each builder uses num_hidden_layers=2, small but realistic hidden/head dims, batch=1, seq_len=32, fp32. Configs were tuned to match each model's invariants (e.g. mamba2 SSM invariant num_heads * head_dim == expand * hidden_size, mllama rope_scaling["rope_type"]).
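
As an illustration of the mamba2 constraint quoted above, the sketch below derives a head count that satisfies num_heads * head_dim == expand * hidden_size. The function names and the example dimensions are hypothetical and are not the values used by the builder in this PR.

```python
def mamba2_dims_consistent(hidden_size, expand, num_heads, head_dim):
    """Check the SSM invariant num_heads * head_dim == expand * hidden_size."""
    return num_heads * head_dim == expand * hidden_size


def pick_num_heads(hidden_size, expand, head_dim):
    """Derive num_heads from the other three dims, or fail if they cannot match."""
    inner = expand * hidden_size
    if inner % head_dim != 0:
        raise ValueError(
            f"expand * hidden_size = {inner} is not divisible by head_dim = {head_dim}"
        )
    return inner // head_dim


# Example dims chosen only for illustration.
num_heads = pick_num_heads(hidden_size=64, expand=2, head_dim=16)
assert mamba2_dims_consistent(64, 2, num_heads, 16)
```

Deriving one dimension from the others, rather than hand-picking all four, is one way to keep a shrunken config valid.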

Usage

python scripts/op_coverage.py                     # all 15 models
python scripts/op_coverage.py --models qwen3      # subset
python scripts/op_coverage.py --enumerate-only    # skip NPU compile (fast)

Results land in $TORCHSIM_LOG_PATH/op_coverage/<timestamp>/ as one <model>.log per model plus a summary.txt (status + per-model unique ops).
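
The output layout described above could be produced roughly as follows. This is a sketch under assumptions: the timestamp format, the "PASS" status string, the summary line format, and both helper names are hypothetical; only the directory shape (one <model>.log per model plus summary.txt) comes from this description.

```python
import os
import tempfile
import time
from pathlib import Path


def make_run_dir(base=None, stamp=None):
    """Create <base>/op_coverage/<timestamp>/; base defaults to $TORCHSIM_LOG_PATH."""
    root = Path(base or os.environ.get("TORCHSIM_LOG_PATH", "."))
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")  # assumed format
    run_dir = root / "op_coverage" / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def write_results(run_dir, results):
    """Write one <model>.log per model plus a summary.txt.

    results maps model name -> (status, list of unique op names).
    """
    lines = []
    for model, (status, ops) in sorted(results.items()):
        (run_dir / f"{model}.log").write_text("\n".join(ops) + "\n")
        lines.append(f"{model}\t{status}\t{len(ops)} unique ops")
    (run_dir / "summary.txt").write_text("\n".join(lines) + "\n")


# Demo in a throwaway directory so nothing is written under the real log path.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = make_run_dir(base=tmp, stamp="20260526_070307")
    write_results(run_dir, {"qwen3": ("PASS", ["aten.addmm.default", "aten.mul.Tensor"])})
    written = sorted(p.name for p in run_dir.iterdir())
    summary = (run_dir / "summary.txt").read_text()
```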

Why

Surface coverage gaps for newer LLMs (Qwen3, DeepSeek-V3, Llama4, Phi-MoE, Mamba2, ...) in one shot so we can decide which aten ops, MLIR templates, or decompositions to prioritise. The current tests/ allowlist gates correctness on existing supported ops; this script complements it by enumerating what is not yet supported.

Findings on develop @ `5045837` (already filed / known)

13/13 of the older-style transformer LLMs pass Phase 2.
llama4 fails in prologue-fusion buffer matching: issue [Bug][Frontend] Prologue fusion buffer-size matcher ignores slice/chunk views (llama4 SwiGLU) #244.
mamba2 fails because PyTorchSim's aten.convolution decomposition is hard-coded to 4D inputs (depthwise conv1d in the SSM mixer is not yet supported). Will file separately if useful.

Test plan

Phase 1 enumeration runs over all 15 models without crashing the script (failures are caught and surfaced in summary).
Phase 2 traceback parser extracts the first-failing aten op when MLIR codegen raises.
Add to CI? Not yet -- this is a developer tool, slow to run, and depends on the GPU-free Phase 2 succeeding end-to-end, which it does not yet do for mamba2 and llama4. It could be wired in as a non-blocking nightly job later.
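
To make the Phase 2 parsing step concrete, here is a rough sketch of pulling an aten op name out of a failure traceback. The regex, the function name, the "first mention wins" rule, and the sample traceback text are all assumptions for illustration; the parser in scripts/op_coverage.py may match different patterns.

```python
import re

# Matches names such as aten.convolution or aten.convolution.default.
ATEN_RE = re.compile(
    r"\baten\.[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)?"
)


def first_failing_aten_op(traceback_text):
    """Return the first aten op mentioned in the traceback text, or None."""
    match = ATEN_RE.search(traceback_text)
    return match.group(0) if match else None


# Made-up traceback text, not real PyTorchSim output.
sample = (
    "Traceback (most recent call last):\n"
    '  File "scripts/op_coverage.py", line 1, in <module>\n'
    "NotImplementedError: unsupported input rank for aten.convolution.default\n"
)
failing_op = first_failing_aten_op(sample)
no_op = first_failing_aten_op("RuntimeError: out of memory")
```

Returning None when nothing matches lets the summary distinguish "failed on a known op" from "failed for some other reason".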

Add scripts/op_coverage.py, a two-phase aten-op coverage probe for the torch.compile + npu:0 path. Phase 1 enumerates aten ops via a custom dynamo backend over eager forward; Phase 2 actually runs torch.compile on npu:0 and parses the failure traceback to surface the first-failing aten op per model.

Includes 15 model builders aligned with transformers 4.51.3: qwen2, gemma, gemma2, phi3, qwen3, qwen3_moe, gemma3, deepseek_v3, llama4, glm4, olmo2, granite, phimoe, mamba2, mllama. The mamba2 builder uses the SSM invariant num_heads * head_dim == expand * hidden_size (modeling_mamba2.py:171). The mllama builder passes rope_scaling={"rope_type": "default"} so MllamaRotaryEmbedding can init without a full Llama-3.2 scaling config.

Usage:

python scripts/op_coverage.py                     # all 15 models
python scripts/op_coverage.py --models qwen3      # subset
python scripts/op_coverage.py --enumerate-only

Results written to $TORCHSIM_LOG_PATH/op_coverage/<timestamp>/.

YWHyuk wants to merge 1 commit into develop from feature/op-coverage-script
