build: bump gptqmodel from 4.0.0.dev0+cu126torch2.7 to 7.0.0 #657

Open

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/gptqmodel-7.0.0

Conversation

dependabot[bot] (Contributor) commented on behalf of github on May 5, 2026

Bumps gptqmodel from 4.0.0.dev0+cu126torch2.7 to 7.0.0.

Release notes

Sourced from gptqmodel's releases.

🚀 GPTQModel v7.0.0

🔥 Major

  • New Huawei Ascend NPU quantization support with torch-based kernels for inference
  • All CUDA/ROCm compiled kernels are now JIT (just-in-time) compiled on first use (see the sketch after this list)
  • Pip/UV install no longer requires the --no-build-isolation flag
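
To make the install and JIT changes concrete, here is a minimal sketch of installing the package and loading a pre-quantized checkpoint, assuming the GPTQModel.load/generate usage shown in gptqmodel's README; the model id below is a placeholder, not a real checkpoint.

```python
# Minimal sketch, assuming gptqmodel's documented GPTQModel.load/generate API.
# Install first (no --no-build-isolation needed as of 7.0.0):
#   pip install gptqmodel
from gptqmodel import GPTQModel

# Placeholder model id; substitute any GPTQ-quantized checkpoint.
model = GPTQModel.load("your-org/your-model-4bit-gptq")

# With v7.0.0, the matching CUDA/ROCm kernel is JIT-compiled here on
# first use, rather than shipped as a prebuilt binary.
tokens = model.generate("Uncovering deep insights begins with")[0]
print(model.tokenizer.decode(tokens))
```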

🧠 New model support and compatibility wins

  • Added support for GLM 5/5.1, GLM OCR, GLM ASR, Gemma 3n, Falcon Mamba, and InternVL Chat.
  • Extended OpenVINO GPTQ patching to understand GPTQModel's newer kernels.
  • Fixed Qwen3 dtype handling, Qwen3.5 MoE module-tree assertions, Qwen2-VL calibration input capture, and Qwen 3.6 MoE regressions.
  • Fixed Llama4Router replacement behavior, Phi-3 defused MLP module mapping, Phi-4 runtime requirements, Instella rope-scaling compatibility, Ling compatibility, Mixtral MoE checkpoint module names, Brumby thread safety, Baichuan compatibility, and Gemma 3 saving.
  • Fixed exllamav3_torch import under meta-device context.

⚡ Kernels, JIT, and hardware acceleration

  • Moved all kernels that require compilation to JIT compilation on first use, and cleaned up Marlin import probing, CUDA header handling, nvcc flag checks, and Torch/CUDA mismatch handling.
  • Synced Marlin/Machete kernels with upstream and added hardware-specific Marlin boost paths.
  • Guarded CUTLASS version mismatches and fixed generated-kernel staleness.
  • Added global kernel rebuild support for CI and safer shared extension locks.
  • Added Ascend NPU support.
  • Fixed AWQ JIT cache invalidation, illegal memory access, SM120 execution, GEMM_Fast shared-memory launch, and BF16 bias validation.
  • Fixed BACKEND.MARLIN loading for the gptq_v2 format and added Marlin import coverage (see the backend-selection sketch after this list).
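
As a companion to the BACKEND.MARLIN item, a hedged sketch of pinning the kernel backend at load time; the BACKEND enum import and the backend keyword are assumptions based on gptqmodel's public loader API, and the checkpoint path is a placeholder.

```python
# Sketch only: assumes gptqmodel exposes BACKEND and that GPTQModel.load
# accepts a backend= keyword, per its public loader API.
from gptqmodel import BACKEND, GPTQModel

# Force the Marlin kernel instead of auto-selection; per the notes above,
# this path now also loads gptq_v2-format checkpoints.
model = GPTQModel.load(
    "your-org/your-gptq-v2-model",  # placeholder checkpoint
    backend=BACKEND.MARLIN,
)
```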

🔥 Quantization, AWQ, FP8, and dequant

  • Added FP8/FP4 CPU dequant and DeepSeek FP8 .scale dequant export.
  • Added dtype auto-decoding and decode path updates.
  • Reduced AWQ scale-search activation memory and split AWQ integration tests for cleaner coverage.
  • Made quantization fail fast on unsupported act-group-aware GPTQ shapes instead of continuing into invalid layouts (the basic quantize flow is sketched after this list).
  • Fixed INT3 qzero format conversion, GAR width compatibility, and GPTQ batched keep-mask handling.
  • Improved AWQ W4A8 and BF16 validation paths, plus post-quant MoE routing behavior.
  • Used loader device selection for EoRA adapter generation.

🐢 LazyTurtle, loading, and model plumbing

  • Refactored input capture into BaseQModel and model-specific QModels for cleaner replay and calibration flows.
  • Renamed and hardened the turtle path into LazyTurtle, with stricter materialization failures and better expected-skip handling.
  • Fixed LazyTurtle materialization for non-square fused experts, PhiMoE, nested HF weight renames, reversed WeightRenaming semantics, and non-Safetensors checkpoints.
  • Improved out-of-model tensor handling for MTP prefix and file paths.
  • Removed BaseModel.loader_requires_dtype and normalized config dtype handling through get_hf_config_dtype().
  • Fixed multi-GPU replay output retention, GPTQ finalizer overlap, and quantization OOMs from retained callable cache keys.

🧰 CI, packaging, and developer workflow

  • Cleaned up CI shell logic, environment setup, UV cache handling, reusable Torch tests, CPU-only grouping, runner selection, retry behavior, and offload temp paths.
  • Kept CI and Torch CUDA versions aligned, moved to newer Docker images, and surfaced real exit codes and GPU names.
  • Removed lm-eval, deprecated tests, deprecated artifact IDs, pause UI lifecycle code, and tabulate from CI/test paths.

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [gptqmodel](https://github.com/ModelCloud/GPTQModel) from 4.0.0.dev0+cu126torch2.7 to 7.0.0.
- [Release notes](https://github.com/ModelCloud/GPTQModel/releases)
- [Commits](https://github.com/ModelCloud/GPTQModel/commits/v7.0.0)

---
updated-dependencies:
- dependency-name: gptqmodel
  dependency-version: 7.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
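
For context, a semver-major Dependabot bump like this one typically rewrites a single pinned line in the project's dev requirements. A hypothetical before/after follows; the file name and exact pin syntax are assumptions, since the manifest itself isn't shown in this excerpt.

```
# hypothetical requirements-dev.txt entry; actual file not shown here
gptqmodel==4.0.0.dev0+cu126torch2.7   # before
gptqmodel==7.0.0                      # after
```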