Incorrect results of TensorRT 10.16.1 when running with `--best` optimized resnet152 on GPU RTX PRO 6000 Blackwell

## Description

When the torchvision built-in resnet152 with `DEFAULT` weights (i.e., `ResNet152_Weights.IMAGENET1K_V2`) is exported to ONNX with `torch.onnx.export`, converted to a TensorRT engine with `trtexec --best`, and run, the inference accuracy drops to near zero (top-1 of 0.10%).

This doesn't happen when the weights are changed to `ResNet152_Weights.IMAGENET1K_V1`. I've also tried changing the batch size and the image preprocessing pipeline, yet the issue persists.
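For orientation, the pipeline described above can be sketched roughly as follows. This is not the attached repro script: the helper names `export_onnx` and `trtexec_command` and the file names are hypothetical. The tensor names (`input`, `logits`) and the batch size of 50 are taken from the Polygraphy log further down, and only the standard `trtexec` flags `--onnx`, `--saveEngine`, `--fp16`, and `--best` are used.

```python
import shlex


def export_onnx(weights_name: str, onnx_path: str, batch_size: int = 50) -> None:
    """Export torchvision's resnet152 to ONNX (needs torch and torchvision)."""
    # Imported lazily so the command helper below works without PyTorch installed.
    import torch
    from torchvision.models import ResNet152_Weights, resnet152

    # weights_name is "IMAGENET1K_V1" or "IMAGENET1K_V2" (DEFAULT aliases V2).
    model = resnet152(weights=ResNet152_Weights[weights_name]).eval()
    dummy = torch.randn(batch_size, 3, 224, 224)
    torch.onnx.export(
        model,
        (dummy,),
        onnx_path,
        input_names=["input"],
        output_names=["logits"],
    )


def trtexec_command(onnx_path: str, engine_path: str, mode: str) -> list[str]:
    """Build the trtexec invocation for the "fp16" or "best" variant."""
    if mode not in ("fp16", "best"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--{mode}",
    ]


if __name__ == "__main__":
    # Print the two build commands that are being compared in this report.
    for mode in ("fp16", "best"):
        cmd = trtexec_command("resnet152.onnx", f"resnet152_{mode}.engine", mode)
        print(shlex.join(cmd))
```

The only difference between the working and the failing engine in this sketch is the final flag (`--fp16` versus `--best`).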

## Environment



**TensorRT Version**: 10.16.1.11

**NVIDIA GPU**: NVIDIA RTX PRO 6000 Blackwell

**NVIDIA Driver Version**: 580.142

**CUDA Version**: 13.2

**CUDNN Version**: 9.10.2


**Operating System**: Ubuntu 24.04 LTS

**Python Version (if applicable)**: 3.12.3

**TensorFlow Version (if applicable)**: N/A

**PyTorch Version (if applicable)**: 2.12.0a0+0291f960b6.nv26.4.48445190

**Baremetal or Container (if so, version)**: `nvcr.io/nvidia/pytorch:26.04-py3` Docker image


## Relevant Files

- [repro_tensorrt_best_resnet152.py](https://github.com/user-attachments/files/27904994/repro_tensorrt_best_resnet152.py)
- `trtexec` output:
  - (67% accuracy for reference) [imagenet1k_v1_resnet152_fp16.log](https://github.com/user-attachments/files/27904998/imagenet1k_v1_resnet152_fp16.log)
  - (66% accuracy for reference) [imagenet1k_v1_resnet152_best.log](https://github.com/user-attachments/files/27905000/imagenet1k_v1_resnet152_best.log)
  - (72% accuracy for reference) [imagenet1k_v2_resnet152_fp16.log](https://github.com/user-attachments/files/27905002/imagenet1k_v2_resnet152_fp16.log)
  - (0% accuracy) [imagenet1k_v2_resnet152_best.log](https://github.com/user-attachments/files/27905004/imagenet1k_v2_resnet152_best.log)

## Steps To Reproduce

Download the attached file: `repro_tensorrt_best_resnet152.py`.

**Commands or scripts**:

```sh
docker run --rm -it --gpus all \
  --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -v $PWD:/workspace \
  -v .cache:/root/.cache \
  nvcr.io/nvidia/pytorch:26.04-py3 \
  bash -lc "pip3 install git+https://github.com/modestyachts/ImageNetV2_pytorch && python3 repro_tensorrt_best_resnet152.py"
```

Note the near-zero `trt_best` accuracy for the V2 weights:

```
imagenet1k_v2: ResNet152_Weights.IMAGENET1K_V2
pytorch: top-1=71.71%, top-5=89.98%
trt_fp16: top-1=71.71%, top-5=89.99%
trt_best: top-1=0.10%, top-5=0.48%

imagenet1k_v1: ResNet152_Weights.IMAGENET1K_V1
pytorch: top-1=66.96%, top-5=87.22%
trt_fp16: top-1=66.95%, top-5=87.25%
trt_best: top-1=66.00%, top-5=86.64%
```
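To make the regression easy to spot programmatically, the report above could be parsed and checked along these lines. This is a minimal sketch and not part of the attached script: the function names `parse_report` and `regressions` and the 5-percentage-point threshold are hypothetical choices for illustration.

```python
import re

REPORT = """\
imagenet1k_v2: ResNet152_Weights.IMAGENET1K_V2
pytorch: top-1=71.71%, top-5=89.98%
trt_fp16: top-1=71.71%, top-5=89.99%
trt_best: top-1=0.10%, top-5=0.48%

imagenet1k_v1: ResNet152_Weights.IMAGENET1K_V1
pytorch: top-1=66.96%, top-5=87.22%
trt_fp16: top-1=66.95%, top-5=87.25%
trt_best: top-1=66.00%, top-5=86.64%
"""

# Matches result lines such as "trt_best: top-1=0.10%, top-5=0.48%".
RESULT_LINE = re.compile(r"^(\w+): top-1=([\d.]+)%, top-5=([\d.]+)%$")


def parse_report(text: str) -> dict[str, dict[str, tuple[float, float]]]:
    """Return {weights: {backend: (top1, top5)}} from the printed report."""
    results: dict[str, dict[str, tuple[float, float]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = RESULT_LINE.match(line)
        if match and current is not None:
            backend, top1, top5 = match.groups()
            results[current][backend] = (float(top1), float(top5))
        elif ":" in line:
            # A header line such as "imagenet1k_v2: ResNet152_Weights...".
            current = line.split(":", 1)[0]
            results[current] = {}
    return results


def regressions(results, baseline="pytorch", max_drop=5.0):
    """List (weights, backend, top-1 drop) where the drop exceeds max_drop."""
    flagged = []
    for weights, backends in results.items():
        base_top1 = backends[baseline][0]
        for backend, (top1, _top5) in backends.items():
            if backend != baseline and base_top1 - top1 > max_drop:
                flagged.append((weights, backend, round(base_top1 - top1, 2)))
    return flagged


if __name__ == "__main__":
    print(regressions(parse_report(REPORT)))
```

On the numbers reported here, only the `trt_best` engine built from the V2 weights is flagged. The V1 `trt_best` engine loses under one percentage point of top-1 accuracy.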

Engines built with `trtexec --best` emit the following warning for both sets of weights, but only the V2 weights produce near-zero accuracy:

```
[W] [TRT] Dequantize 1 [SCALE] has invalid precision Int8, ignored.
```

I suspect this may be due to [not providing a calibration file](https://docs.nvidia.com/deeplearning/tensorrt/latest/reference/command-line-programs.html#serialized-engine-generation), but I'm unsure why the result differs so much between the V1 and V2 weights.
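To illustrate why missing calibration data could matter, here is a toy sketch of symmetric INT8 quantization. It assumes the builder falls back to a placeholder dynamic range when no calibration data is supplied. That is an assumption of this sketch, not something verified against `trtexec`. The function name `fake_quantize`, the activation values, and both ranges are made up for illustration.

```python
def fake_quantize(x: float, dynamic_range: float) -> float:
    """Symmetric INT8 quantize-then-dequantize of x for a given |max| range."""
    scale = dynamic_range / 127.0
    # Round to the nearest integer step, then clamp to the INT8 range.
    q = max(-128, min(127, round(x / scale)))
    return q * scale


# Made-up activation values, for illustration only.
activations = [0.5, 3.0, 12.0, 40.0]

# Hypothetical placeholder range (1.0) versus a range matching the data (40.0).
placeholder = [fake_quantize(a, dynamic_range=1.0) for a in activations]
calibrated = [fake_quantize(a, dynamic_range=40.0) for a in activations]

print("placeholder range:", placeholder)
print("calibrated range: ", calibrated)
```

With the placeholder range, every activation above the range saturates to the same value, so their relative ordering is lost. With a range that matches the data, the ordering survives up to rounding error. If the V2 weights happen to produce activations that overflow such a placeholder range more than the V1 weights do, that might account for the difference, but this is speculation rather than a confirmed root cause.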

**Have you tried [the latest release](https://developer.nvidia.com/tensorrt)?**: Yes

**Attach the captured .json and .bin files from [TensorRT's API Capture tool](https://docs.nvidia.com/deeplearning/tensorrt/latest/inference-library/capture-replay.html) if you're on an x86_64 Unix system**

**Can this model run on other frameworks?** For example run ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`): Yes

```
# polygraphy run resnet152.onnx --onnxrt
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] RUNNING | Command: /usr/local/bin/polygraphy run resnet152.onnx --onnxrt
[I] onnxrt-runner-N0-05/17/26-17:42:11  | Activating and starting inference
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-05/17/26-17:42:11 
    ---- Inference Input(s) ----
    {input [dtype=float32, shape=(50, 3, 224, 224)]}
[I] onnxrt-runner-N0-05/17/26-17:42:11 
    ---- Inference Output(s) ----
    {logits [dtype=float32, shape=(50, 1000)]}
[I] onnxrt-runner-N0-05/17/26-17:42:11  | Completed 1 iteration(s) in 5762 ms | Average inference time: 5762 ms.
[I] PASSED | Runtime: 7.970s | Command: /usr/local/bin/polygraphy run resnet152.onnx --onnxrt
```

Disclaimer: The minimal repro code was generated by Codex, condensed from a much lengthier codebase.
