
TensorRT Lean cannot deserialize engine built in full TensorRT (ReformatRunner error) #4790

@abdessayedala666

Summary

I am unable to load a .trt engine using the tensorrt_lean runtime. Deserialization fails with a ReformatRunner error, even though the setup is inference-only and uses a CUDA runtime container.

Environment

Docker base image: nvidia/cuda:13.1.2-cudnn-runtime-ubuntu24.04
TensorRT version: 10.16.1.11 (tensorrt_lean)
Python: 3.12
GPU: NVIDIA GPU (CUDA enabled container)
Engine format: .trt (prebuilt outside container)

Installed packages

tensorrt_lean
numpy

Issue description

When trying to deserialize a TensorRT engine using:

import tensorrt_lean as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

runtime = trt.Runtime(TRT_LOGGER)
with open("trt_weights_f16/mos.trt", "rb") as f:
    # Returns None (no exception) when deserialization fails
    engine = runtime.deserialize_cuda_engine(f.read())

The following error occurs:

[TRT] [E] IRuntime::deserializeCudaEngine: Error Code 1: Internal Error
Unexpected call to stub loadRunner for ReformatRunner

As a result:

engine is None

and inference cannot proceed.
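
To make this failure easier to spot in a script, the load step could be wrapped as in the sketch below. `load_engine` and its `runtime` parameter are hypothetical names introduced for illustration. The only TensorRT call it relies on is `deserialize_cuda_engine`, which returns None on failure as described above. Passing the runtime in as an argument is just a way to keep the helper independent of which TensorRT package is imported.

```python
from pathlib import Path


def load_engine(path, runtime):
    """Deserialize an engine file and fail loudly instead of returning None.

    `runtime` is any object exposing deserialize_cuda_engine(bytes), for
    example the trt.Runtime(TRT_LOGGER) instance from the snippet above.
    """
    engine_path = Path(path)
    if not engine_path.is_file():
        raise FileNotFoundError(f"engine file not found: {engine_path}")

    data = engine_path.read_bytes()
    if not data:
        raise ValueError(f"engine file is empty: {engine_path}")

    engine = runtime.deserialize_cuda_engine(data)
    if engine is None:
        # The actual cause is reported through the TensorRT logger,
        # not through a Python exception.
        raise RuntimeError(
            f"deserialize_cuda_engine returned None for {engine_path} "
            f"({len(data)} bytes); check the [TRT] [E] log lines"
        )
    return engine
```

With the runtime from the snippet above, this would be called as `engine = load_engine("trt_weights_f16/mos.trt", runtime)`.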

Expected behavior

The engine should deserialize successfully and allow inference using:

context = engine.create_execution_context()
context.execute_async_v3(...)

Actual behavior

Engine fails during deserialization
ReformatRunner stub error is triggered
engine == None
No inference possible

What I tried

Switching to inference-only CUDA runtime image
Using tensorrt_lean instead of full TensorRT
Minimal Python inference script (no PyCUDA, no training dependencies)
Verifying engine file path and loading logic
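
For the file check in the last item, a comparison like the following could rule out a truncated or corrupted copy of the engine. It is only a sketch: `describe_engine_file` is a hypothetical helper, and it assumes the same size and SHA-256 digest can also be computed on the machine where the engine was built.

```python
import hashlib
from pathlib import Path


def describe_engine_file(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so large engine files are not held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

Running `describe_engine_file("trt_weights_f16/mos.trt")` on both machines and comparing the results would only show that the file was transferred intact. Matching values say nothing about whether the runtime can load it.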

Key observation

The engine was built using a full TensorRT environment, and fails when loaded with tensorrt_lean.

It seems that tensorrt_lean does not support certain internal runners (e.g., ReformatRunner) required by the engine.

Question

Is there a compatibility requirement between:

TensorRT engine build environment
TensorRT Lean runtime

Specifically:

Are engines built with full TensorRT incompatible with the Lean runtime?
Is there a required “Lean-compatible engine export” workflow?
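
One possibility I have not verified: as I understand the TensorRT documentation on version compatibility, the lean runtime is meant for engines built with the version-compatible option (`--versionCompatible` in trtexec, `BuilderFlag.VERSION_COMPATIBLE` in the builder API). If that applies here, a rebuild on the full TensorRT machine might look like the sketch below. The helper name and the ONNX path `mos.onnx` are hypothetical, since the original build command is not shown in this report, and the `--fp16` flag is an assumption based on the `trt_weights_f16` directory name.

```python
import shlex


def trtexec_rebuild_command(onnx_path, engine_path, fp16=True):
    """Build a trtexec argument list that requests a version-compatible engine."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        # Assumption: the lean runtime needs a version-compatible engine.
        "--versionCompatible",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


# Hypothetical input path; only the engine path appears in this report.
cmd = trtexec_rebuild_command("mos.onnx", "trt_weights_f16/mos.trt")
print(shlex.join(cmd))
```

The printed command would be run where the full TensorRT SDK is installed, and the resulting file then copied into the tensorrt_lean container.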

Additional context

This setup is intended for inference-only deployment, and the goal was to use a minimal runtime container without the full TensorRT SDK.

Labels: Module:Runtime
