Quantized models in FP8/NVFP4 QAT do not show an improvement in accuracy when compared with PTQ models #806

@elizabetht

Description

Describe the bug

Steps/Code to reproduce bug

Expected behavior

Accuracy evaluations of QAT models should improve compared with PTQ models using the same format (i.e., NVFP4/FP8).
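For context on the expectation above: QAT inserts fake quantization into the forward pass so that fine-tuning happens in the presence of quantization error, which is why it is expected to match or beat PTQ at the same format. Below is a minimal NumPy sketch of symmetric fake quantization, illustrative only — FP8/NVFP4 are floating-point formats, not the integer grid shown here, and ModelOpt's actual implementation differs:

```python
import numpy as np

def fake_quantize(x, num_bits=4):
    """Symmetric fake quantization: quantize to num_bits, then dequantize.

    This mimics the quantize-dequantize op that QAT places in the forward
    pass so the weights can adapt to the quantization error during
    fine-tuning. PTQ, by contrast, applies this only after training.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return x
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

w = np.array([0.11, -0.52, 0.73, -0.98])
wq = fake_quantize(w, num_bits=4)
# wq lies on a 4-bit grid; the residual (wq - w) is the error that
# QAT trains through, and that PTQ leaves uncorrected.
```

In a QAT run this op sits in the model's forward pass while gradients update the full-precision weights (straight-through estimation), so the final weights are the ones that minimize loss *after* quantization; that is the mechanism behind the expected accuracy improvement over PTQ.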

Who can help?

  • ?

System information

  • Container used (if applicable): ?
  • OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ?
  • CPU architecture (x86_64, aarch64): ?
  • GPU name (e.g. H100, A100, L40S): ?
  • GPU memory size: ?
  • Number of GPUs: ?
  • Library versions (if applicable):
    • Python: ?
    • ModelOpt version or commit hash: ?
    • CUDA: ?
    • PyTorch: ?
    • Transformers: ?
    • TensorRT-LLM: ?
    • ONNXRuntime: ?
    • TensorRT: ?
  • Any other details that may help: ?
