Add ONNX Conversion and Triton Inference Server Support #1

rensortino wants to merge 13 commits into main from feat/onnx_conversion

Overview

This PR adds comprehensive support for exporting SAM3 models to ONNX format and deploying them using Triton Inference Server. The implementation includes model export utilities, ONNX Runtime inference, PyTorch inference, and full Triton Server integration with TensorRT conversion capabilities.

Key Features

🚀 Model Export & Conversion

  • ONNX Export: Export SAM3 model components (vision encoder, text encoder, geometry encoder, decoder) to ONNX format with dynamic batch support (see the export sketch after this list)
  • TensorRT Conversion: Scripts to convert ONNX models to TensorRT for optimized GPU inference
  • Protobuf Config Generation: Automated extraction of Triton Server configuration files from ONNX models
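
The sketch below shows the general shape of the ONNX export step for a single component. The wrapper class, tensor names, input resolution, and opset version are assumptions for illustration; export_to_onnx.py defines the actual interface.

```python
# Hypothetical export of the vision encoder; names and shapes are placeholders.
import torch

def export_vision_encoder(wrapper: torch.nn.Module, out_path: str = "vision-encoder.onnx"):
    wrapper.eval()
    dummy_image = torch.randn(1, 3, 1024, 1024)  # assumed input resolution
    torch.onnx.export(
        wrapper,
        (dummy_image,),
        out_path,
        input_names=["image"],
        output_names=["image_embeddings"],
        # dynamic batch support: mark dimension 0 of each tensor as variable
        dynamic_axes={"image": {0: "batch"}, "image_embeddings": {0: "batch"}},
        opset_version=17,
    )
```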

🔥 Inference Engines

  • PyTorch Inference: Direct PyTorch-based inference for development and testing
  • ONNX Runtime Inference: Optimized inference using ONNX Runtime with CUDA support (a minimal session sketch follows this list)
  • Triton Inference Server: Production-ready deployment with model pipeline orchestration
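
A minimal ONNX Runtime session, assuming the exported vision encoder takes a single "image" input; inference_onnx.py chains all four components, while this only shows provider setup and one forward pass.

```python
# Sketch only: tensor names and shapes are assumptions.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "vision-encoder.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # CUDA first, CPU fallback
)

image = np.random.rand(1, 3, 1024, 1024).astype(np.float32)  # stand-in for a preprocessed image
outputs = session.run(None, {"image": image})  # None -> return every declared output
print([o.shape for o in outputs])
```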

📦 Deployment Infrastructure

  • Triton Model Repository: Complete model repository structure with configuration files
  • Model Pipeline: Python backend pipeline that orchestrates multiple model components (sketched after this list)
  • Conda Environment Packaging: Scripts to package Python dependencies for Triton Server
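
The pipeline's Python backend presumably follows the standard TritonPythonModel layout and uses BLS (Business Logic Scripting) calls to invoke the other models in the repository. Everything below, including tensor and model names, is an illustrative assumption rather than the actual model_repository/pipeline implementation.

```python
# Rough shape of a Triton Python backend that chains component models via BLS.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            image = pb_utils.get_input_tensor_by_name(request, "image")

            # Call the vision-encoder model deployed in the same repository.
            enc_request = pb_utils.InferenceRequest(
                model_name="vision-encoder",
                requested_output_names=["image_embeddings"],
                inputs=[image],
            )
            enc_response = enc_request.exec()
            if enc_response.has_error():
                raise pb_utils.TritonModelException(enc_response.error().message())
            embeddings = pb_utils.get_output_tensor_by_name(enc_response, "image_embeddings")

            # ...the text/geometry encoders and the decoder would be chained the same way...
            responses.append(pb_utils.InferenceResponse(output_tensors=[embeddings]))
        return responses
```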

🎨 Developer Experience

  • Gradio App: Interactive web interface for testing model inference (a toy version is sketched after this list)
  • Justfile: Convenient commands for common tasks (export, inference, server management)
  • Comprehensive Documentation: README with usage examples and setup instructions
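
A toy reduction of what gradio_app.py provides, assuming a run_inference(image, prompt) helper that wraps one of the inference engines above; the real app exposes more controls and visualizes the predicted masks.

```python
# Minimal Gradio front-end sketch; run_inference is a hypothetical helper.
import gradio as gr

def run_inference(image, prompt):
    # Placeholder: the actual app would call the PyTorch/ONNX/Triton engine here.
    return image

demo = gr.Interface(
    fn=run_inference,
    inputs=[gr.Image(type="numpy"), gr.Textbox(label="Text prompt")],
    outputs=gr.Image(type="numpy"),
    title="SAM3 ONNX / Triton demo",
)
demo.launch()
```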

Changes Summary

New Files Added

Core Inference Scripts

  • export_to_onnx.py - Export SAM3 model components to ONNX format
  • inference_onnx.py - ONNX Runtime inference engine
  • inference_torch.py - PyTorch inference engine
  • model.py - Model wrapper classes for ONNX export (VisionEncoderWrapper, TextEncoderWrapper, GeometryEncoderWrapper, DecoderWrapper); see the wrapper sketch after this list
  • utils.py - Shared utilities for preprocessing, postprocessing, and visualization
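
The wrappers in model.py most likely flatten each SAM3 component's forward signature to plain tensors so that torch.onnx.export can trace it; the sketch below illustrates the idea, with the attribute name image_encoder assumed rather than taken from the SAM3 API.

```python
# Illustrative wrapper; attribute and output names are assumptions.
import torch

class VisionEncoderWrapper(torch.nn.Module):
    def __init__(self, sam_model: torch.nn.Module):
        super().__init__()
        self.encoder = sam_model.image_encoder  # assumed attribute name

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Return a single tensor instead of the model's richer output structure,
        # which keeps the exported ONNX graph's signature simple to consume.
        return self.encoder(image)
```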

Triton Integration

  • call_triton_models.py - Client script to call individual Triton models (a minimal client sketch follows this list)
  • call_triton_pipeline.py - Client script to call Triton pipeline model
  • model_repository/ - Complete Triton model repository structure:
    • vision-encoder/config.pbtxt
    • text-encoder/config.pbtxt
    • geometry-encoder/config.pbtxt
    • decoder/config.pbtxt
    • pipeline/config.pbtxt + Python backend implementation
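
A minimal client call against one of the deployed components, assuming the server runs on localhost:8000 and that the vision encoder exposes an FP32 "image" input and an "image_embeddings" output; the real tensor names come from the generated config.pbtxt files.

```python
# Sketch of a single-model Triton call; names and shapes are assumptions.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 3, 1024, 1024).astype(np.float32)  # stand-in for a preprocessed image
infer_input = httpclient.InferInput("image", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

result = client.infer(model_name="vision-encoder", inputs=[infer_input])
embeddings = result.as_numpy("image_embeddings")
print(embeddings.shape)
```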

Scripts & Utilities

  • scripts/pbtxt_from_onnx.py - Extract Triton config files from ONNX models (the idea is sketched after this list)
  • scripts/convert_to_tensorrt.sh - Convert ONNX models to TensorRT
  • scripts/pack_conda_env.sh - Package conda environment for Triton Python backend
  • gradio_app.py - Interactive Gradio web interface
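
The config extraction boils down to reading the graph inputs and outputs of each ONNX file and emitting matching Triton entries. The snippet below is a stripped-down illustration of that idea, with the type mapping trimmed to two dtypes; the actual script writes complete config.pbtxt files.

```python
# Reduced sketch of deriving Triton I/O entries from an ONNX graph.
import onnx

ELEM_TO_TRITON = {
    onnx.TensorProto.FLOAT: "TYPE_FP32",
    onnx.TensorProto.INT64: "TYPE_INT64",
}

def describe(value_info):
    ttype = value_info.type.tensor_type
    # dim_value is 0 for symbolic (dynamic) dimensions; Triton uses -1 for those.
    dims = [d.dim_value if d.dim_value > 0 else -1 for d in ttype.shape.dim]
    return value_info.name, ELEM_TO_TRITON.get(ttype.elem_type, "TYPE_FP32"), dims

model = onnx.load("vision-encoder.onnx")
for tensor in list(model.graph.input) + list(model.graph.output):
    name, dtype, dims = describe(tensor)
    print(f'{{ name: "{name}" data_type: {dtype} dims: {dims} }}')
```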

Build & Documentation

  • Justfile - Task runner with convenient commands
  • README.md - Comprehensive documentation
  • .gitignore - Git ignore rules
  • uv.lock - Dependency lock file for reproducible builds

Dependencies

  • Python 3.8+
  • uv package manager (for dependency management)
  • CUDA-capable GPU (recommended)
  • Docker (for Triton Server and TensorRT conversion)
