
[Scripts] Add op_coverage diagnostic for recent LLMs#245

Open
YWHyuk wants to merge 1 commit into develop from feature/op-coverage-script

@YWHyuk (Collaborator) commented May 26, 2026

Summary

  • Add scripts/op_coverage.py, a two-phase aten-op coverage probe for the torch.compile + npu:0 path.
  • Phase 1 enumerates aten ops via a custom dynamo backend over eager forward.
  • Phase 2 runs torch.compile on npu:0 and parses the failure traceback to surface the first-failing aten op per model.
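The Phase 2 parsing step can be illustrated with a minimal sketch. It assumes the failing op appears in the traceback text as `aten.<name>` or `aten.<name>.<overload>` and that the first such mention is the one of interest; the function name `first_failing_aten_op` and the regex are hypothetical and are not taken from scripts/op_coverage.py.

```python
import re
from typing import Optional

# Assumption: ops show up as "aten.<name>" or "aten.<name>.<overload>",
# possibly prefixed (e.g. "torch.ops.aten.<name>.<overload>").
ATEN_OP_RE = re.compile(
    r"\baten\.[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)?"
)


def first_failing_aten_op(traceback_text: str) -> Optional[str]:
    """Return the first aten op mentioned in the traceback text, or None."""
    match = ATEN_OP_RE.search(traceback_text)
    return match.group(0) if match else None
```

Returning `None` when nothing matches lets the caller distinguish "compile failed for a reason unrelated to a specific aten op" from "compile failed on op X" when writing the summary.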

Models covered (transformers 4.51.3)

qwen2, gemma, gemma2, phi3, qwen3, qwen3_moe, gemma3, deepseek_v3, llama4, glm4, olmo2, granite, phimoe, mamba2, mllama.

Each builder uses num_hidden_layers=2, small but realistic hidden/head dims, batch=1, seq_len=32, fp32. Configs were tuned to match each model's invariants (e.g. mamba2 SSM invariant num_heads * head_dim == expand * hidden_size, mllama rope_scaling["rope_type"]).
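As a quick illustration of the mamba2 invariant quoted above, the check below validates a set of dimensions before a config is built. The helper name `check_mamba2_dims` is hypothetical, and the numbers in the usage lines are illustrative only; they are not the values used by the builder in this PR.

```python
def check_mamba2_dims(num_heads: int, head_dim: int, expand: int, hidden_size: int) -> None:
    """Raise ValueError unless num_heads * head_dim == expand * hidden_size."""
    lhs = num_heads * head_dim
    rhs = expand * hidden_size
    if lhs != rhs:
        raise ValueError(
            f"mamba2 invariant violated: num_heads * head_dim = {lhs}, "
            f"expand * hidden_size = {rhs}"
        )


# Illustrative dims only: 8 * 16 == 2 * 64 == 128, so this passes silently.
check_mamba2_dims(num_heads=8, head_dim=16, expand=2, hidden_size=64)
```

Failing early with both sides of the equation in the message makes a mis-tuned tiny config easier to diagnose than a shape error raised deep inside the model's constructor.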

Usage

python scripts/op_coverage.py                     # all 15 models
python scripts/op_coverage.py --models qwen3      # subset
python scripts/op_coverage.py --enumerate-only    # skip NPU compile (fast)
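For readers who want to see how such a command line could be wired up, here is a minimal argparse sketch. Only the flag names `--models` and `--enumerate-only` and the model list come from this PR; the use of `nargs="+"`, the `choices` validation, and the `parse_args` helper are assumptions about how the script might handle them.

```python
import argparse
from typing import List, Optional

# Model names as listed in the PR description.
ALL_MODELS = [
    "qwen2", "gemma", "gemma2", "phi3", "qwen3", "qwen3_moe", "gemma3",
    "deepseek_v3", "llama4", "glm4", "olmo2", "granite", "phimoe",
    "mamba2", "mllama",
]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="aten-op coverage probe (sketch)")
    # Assumption: several models may be given, space-separated.
    parser.add_argument("--models", nargs="+", choices=ALL_MODELS, default=ALL_MODELS,
                        help="subset of models to probe (default: all)")
    parser.add_argument("--enumerate-only", action="store_true",
                        help="run Phase 1 only and skip the NPU compile")
    return parser.parse_args(argv)
```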

Results land in $TORCHSIM_LOG_PATH/op_coverage/<timestamp>/ as one <model>.log per model plus a summary.txt (status + per-model unique ops).
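The output layout described above can be sketched as follows. The directory structure (`op_coverage/<timestamp>/`, one `<model>.log` per model, a `summary.txt`) follows the description; the timestamp format, the tab-separated summary line, and the helper names `make_run_dir` and `write_results` are assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import Dict, List, Tuple


def make_run_dir(log_root: str) -> Path:
    """Create <log_root>/op_coverage/<timestamp>/ and return it."""
    stamp = time.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    run_dir = Path(log_root) / "op_coverage" / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def write_results(run_dir: Path, results: Dict[str, Tuple[str, List[str]]]) -> Path:
    """Write one <model>.log per model plus summary.txt; return the summary path."""
    summary_lines = []
    for model, (status, ops) in sorted(results.items()):
        unique_ops = sorted(set(ops))
        (run_dir / f"{model}.log").write_text("\n".join(unique_ops) + "\n")
        # Assumed summary format: model, status, unique-op count.
        summary_lines.append(f"{model}\t{status}\t{len(unique_ops)} unique ops")
    summary = run_dir / "summary.txt"
    summary.write_text("\n".join(summary_lines) + "\n")
    return summary
```

In practice `log_root` would be read from the `TORCHSIM_LOG_PATH` environment variable, for example via `os.environ["TORCHSIM_LOG_PATH"]`.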

Why

Surface coverage gaps for newer LLMs (Qwen3, DeepSeek-V3, Llama4, Phi-MoE, Mamba2, ...) in one shot so we can decide which aten ops, MLIR templates, or decompositions to prioritise. The current tests/ allowlist gates correctness on existing supported ops; this script complements it by enumerating what is not yet supported.

Findings on develop @ 5045837 (already filed / known)

Test plan

  • Phase 1 enumeration runs over all 15 models without crashing the script (failures are caught and surfaced in summary).
  • Phase 2 traceback parser extracts the first-failing aten op when MLIR codegen raises.
  • Add to CI? Not yet. This is a developer tool that is slow to run, and it depends on the GPU-free Phase 2 succeeding end-to-end, which it does not yet do for mamba2/llama4. It could be wired in later as a non-blocking nightly job.

Add scripts/op_coverage.py, a two-phase aten-op coverage probe for the
torch.compile + npu:0 path. Phase 1 enumerates aten ops via a custom
dynamo backend over eager forward; Phase 2 actually runs torch.compile
on npu:0 and parses the failure traceback to surface the first-failing
aten op per model.

Includes 15 model builders aligned with transformers 4.51.3:
qwen2, gemma, gemma2, phi3, qwen3, qwen3_moe, gemma3, deepseek_v3,
llama4, glm4, olmo2, granite, phimoe, mamba2, mllama.

The mamba2 builder uses the SSM invariant num_heads * head_dim ==
expand * hidden_size (modeling_mamba2.py:171). The mllama builder
passes rope_scaling={"rope_type": "default"} so MllamaRotaryEmbedding
can init without a full Llama-3.2 scaling config.

Usage:
  python scripts/op_coverage.py                # all 15 models
  python scripts/op_coverage.py --models qwen3 # subset
  python scripts/op_coverage.py --enumerate-only

Results written to $TORCHSIM_LOG_PATH/op_coverage/<timestamp>/.