Labels: bug (Something isn't working)
Description
Before submitting an issue, please make sure it hasn't been already addressed by searching through the existing and past issues.
Describe the bug
- I followed the notebook https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb to perform NVFP4 QAT (and FP8 QAT). When both models are deployed with vLLM and evaluated on the IFEval benchmark, they show a large drop in accuracy compared with the corresponding PTQ-quantized models. It's my understanding that QAT models should improve on the accuracy of PTQ models, not regress.
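For reference, this is roughly how I ran the IFEval evaluation against the deployed checkpoints. It is a minimal sketch using the lm-evaluation-harness Python API with the vLLM backend; the checkpoint path and `tensor_parallel_size` are placeholders, and exact argument names may vary by harness version.

```python
# Hedged sketch of the evaluation step (not the exact command from the notebook).
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    # /path/to/quantized_checkpoint is a placeholder for the exported QAT/PTQ model.
    model_args="pretrained=/path/to/quantized_checkpoint,tensor_parallel_size=1",
    tasks=["ifeval"],
)
print(results["results"]["ifeval"])
```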
Steps/Code to reproduce bug
- Calibration size: 512
- Model used: meta-llama/Llama-3.1-8B-Instruct
- Executed notebook: https://github.com/elizabetht/language-modeling-from-scratch/blob/main/quantization/model-optimizer/qat/output_executed_nvfp4_qat.ipynb (a minimal sketch of the quantization step is included below)
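The quantization/QAT step I followed is essentially the ModelOpt flow from the walkthrough notebook. The sketch below is an approximation under my setup: `calib_dataloader` is an assumed dataloader built from the 512 calibration samples, and the config name (`NVFP4_DEFAULT_CFG`, or `FP8_DEFAULT_CFG` for the FP8 run) may differ across ModelOpt versions.

```python
# Hedged sketch of the QAT preparation step, adapted from the ModelOpt QAT/QAD walkthrough.
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16", device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def forward_loop(model):
    # Run the ~512 calibration samples through the model to collect activation statistics.
    for batch in calib_dataloader:  # calib_dataloader: assumed, built from the calibration set
        model(**batch)

# Insert NVFP4 fake-quant ops and calibrate them (FP8_DEFAULT_CFG was used for the FP8 run).
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)

# QAT: fine-tune the fake-quantized model with the usual training loop from the notebook,
# then export the checkpoint and deploy it with vLLM for evaluation.
```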
Expected behavior
Accuracy evaluations of the QAT models should improve when compared with PTQ models quantized to the same format (i.e., NVFP4/FP8).
Who can help?
- ?
System information
- Container used (if applicable): ?
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): ?
- CPU architecture (x86_64, aarch64): ?
- GPU name (e.g. H100, A100, L40S): ?
- GPU memory size: ?
- Number of GPUs: ?
- Library versions (if applicable):
- Python: ?
- ModelOpt version or commit hash: ?
- CUDA: ?
- PyTorch: ?
- Transformers: ?
- TensorRT-LLM: ?
- ONNXRuntime: ?
- TensorRT: ?
- Any other details that may help: ?