build: bump gptqmodel from 4.0.0.dev0+cu126torch2.7 to 7.0.0 #657

Open

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/gptqmodel-7.0.0

Conversation

dependabot[bot] (Contributor) commented on behalf of github on May 5, 2026

Bumps gptqmodel from 4.0.0.dev0+cu126torch2.7 to 7.0.0.

Release notes

Sourced from gptqmodel's releases.

🚀 GPTQModel v7.0.0

🔥 Major

  • New Huawei Ascend NPU quantization support with torch-based kernels for inference
  • All CUDA/ROCm compiled kernels are now JIT (just-in-time) compiled on first use (see the sketch after this list)
  • Pip/UV install no longer requires the --no-build-isolation flag
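
To make the install and JIT changes concrete, here is a minimal sketch of installing the package and loading a pre-quantized checkpoint, assuming the GPTQModel.load/generate usage shown in gptqmodel's README; the model id below is a placeholder, not a real checkpoint.

```python
# Minimal sketch, assuming gptqmodel's documented GPTQModel.load/generate API.
# Install first (no --no-build-isolation needed as of 7.0.0):
#   pip install gptqmodel
from gptqmodel import GPTQModel

# Placeholder model id; substitute any GPTQ-quantized checkpoint.
model = GPTQModel.load("your-org/your-model-4bit-gptq")

# With v7.0.0, the matching CUDA/ROCm kernel is JIT-compiled here on
# first use, rather than shipped as a prebuilt binary.
tokens = model.generate("Uncovering deep insights begins with")[0]
print(model.tokenizer.decode(tokens))
```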

🧠 New model support and compatibility wins

  • Added support for GLM 5/5.1, GLM OCR, GLM ASR, Gemma 3n, Falcon Mamba, and InternVL Chat.
  • Extended OpenVINO GPTQ patching to understand GPTQModel's newer kernels.
  • Fixed Qwen3 dtype handling, Qwen3.5 MoE module-tree assertions, Qwen2-VL calibration input capture, and Qwen 3.6 MoE regressions.
  • Fixed Llama4Router replacement behavior, Phi-3 defused MLP module mapping, Phi-4 runtime requirements, Instella rope-scaling compatibility, Ling compatibility, Mixtral MoE checkpoint module names, Brumby thread safety, Baichuan compatibility, and Gemma 3 saving.
  • Fixed exllamav3_torch import under meta-device context.

⚡ Kernels, JIT, and hardware acceleration

  • Moved all kernels that require compilation to JIT compilation on first use, and cleaned up Marlin import probing, CUDA header handling, nvcc flag checks, and Torch/CUDA mismatch handling.
  • Synced Marlin/Machete kernels with upstream and added hardware-specific Marlin boost paths.
  • Guarded CUTLASS version mismatches and fixed generated-kernel staleness.
  • Added global kernel rebuild support for CI and safer shared extension locks.
  • Added Ascend NPU support.
  • Fixed AWQ JIT cache invalidation, illegal memory access, SM120 execution, GEMM_Fast shared-memory launch, and BF16 bias validation.
  • Fixed BACKEND.MARLIN loading for the gptq_v2 format and added Marlin import coverage (see the backend-selection sketch after this list).
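
As a companion to the BACKEND.MARLIN item, a hedged sketch of pinning the kernel backend at load time; the BACKEND enum import and the backend keyword are assumptions based on gptqmodel's public loader API, and the checkpoint path is a placeholder.

```python
# Sketch only: assumes gptqmodel exposes BACKEND and that GPTQModel.load
# accepts a backend= keyword, per its public loader API.
from gptqmodel import BACKEND, GPTQModel

# Force the Marlin kernel instead of auto-selection; per the notes above,
# this path now also loads gptq_v2-format checkpoints.
model = GPTQModel.load(
    "your-org/your-gptq-v2-model",  # placeholder checkpoint
    backend=BACKEND.MARLIN,
)
```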

🔥 Quantization, AWQ, FP8, and dequant

  • Added FP8/FP4 CPU dequant and DeepSeek FP8 .scale dequant export.
  • Added dtype auto-decoding and decode path updates.
  • Reduced AWQ scale-search activation memory and split AWQ integration tests for cleaner coverage.
  • Made quantization fail fast on unsupported act-group-aware GPTQ shapes instead of continuing into invalid layouts (the basic quantize flow is sketched after this list).
  • Fixed INT3 qzero format conversion, GAR width compatibility, and GPTQ batched keep-mask handling.
  • Improved AWQ W4A8 and BF16 validation paths, plus post-quant MoE routing behavior.
  • Used loader device selection for EoRA adapter generation.

🐢 LazyTurtle, loading, and model plumbing

  • Refactored input capture into BaseQModel and model-specific QModels for cleaner replay and calibration flows.
  • Renamed and hardened the turtle path into LazyTurtle, with stricter materialization failures and better expected-skip handling.
  • Fixed LazyTurtle materialization for non-square fused experts, PhiMoE, nested HF weight renames, reversed WeightRenaming semantics, and non-Safetensors checkpoints.
  • Improved out-of-model tensor handling for MTP prefix and file paths.
  • Removed BaseModel.loader_requires_dtype and normalized config dtype handling through get_hf_config_dtype().
  • Fixed multi-GPU replay output retention, GPTQ finalizer overlap, and quantization OOMs from retained callable cache keys.

🧰 CI, packaging, and developer workflow

  • Cleaned up CI shell logic, environment setup, UV cache handling, reusable Torch tests, CPU-only grouping, runner selection, retry behavior, and offload temp paths.
  • Kept CI and Torch CUDA versions aligned, moved to newer Docker images, and surfaced real exit codes and GPU names.
  • Removed lm-eval, deprecated tests, deprecated artifact IDs, pause UI lifecycle code, and tabulate from CI/test paths.

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [gptqmodel](https://github.com/ModelCloud/GPTQModel) from 4.0.0.dev0+cu126torch2.7 to 7.0.0.
- [Release notes](https://github.com/ModelCloud/GPTQModel/releases)
- [Commits](https://github.com/ModelCloud/GPTQModel/commits/v7.0.0)

---
updated-dependencies:
- dependency-name: gptqmodel
  dependency-version: 7.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
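
For context, a semver-major Dependabot bump like this one typically rewrites a single pinned line in the project's dev requirements. A hypothetical before/after follows; the file name and exact pin syntax are assumptions, since the manifest itself isn't shown in this excerpt.

```
# hypothetical requirements-dev.txt entry; actual file not shown here
gptqmodel==4.0.0.dev0+cu126torch2.7   # before
gptqmodel==7.0.0                      # after
```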