Add ONNX Conversion and Triton Inference Server Support #1

rensortino wants to merge 13 commits into main from feat/onnx_conversion

Overview

This PR adds comprehensive support for exporting SAM3 models to ONNX format and deploying them using Triton Inference Server. The implementation includes model export utilities, ONNX Runtime inference, PyTorch inference, and full Triton Server integration with TensorRT conversion capabilities.

Key Features

🚀 Model Export & Conversion

  • ONNX Export: Export SAM3 model components (vision encoder, text encoder, geometry encoder, decoder) to ONNX format with dynamic batch support (see the export sketch after this list)
  • TensorRT Conversion: Scripts to convert ONNX models to TensorRT for optimized GPU inference
  • Protobuf Config Generation: Automated extraction of Triton Server configuration files from ONNX models
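
The sketch below shows the general shape of the ONNX export step for a single component. The wrapper class, tensor names, input resolution, and opset version are assumptions for illustration; export_to_onnx.py defines the actual interface.

```python
# Hypothetical export of the vision encoder; names and shapes are placeholders.
import torch

def export_vision_encoder(wrapper: torch.nn.Module, out_path: str = "vision-encoder.onnx"):
    wrapper.eval()
    dummy_image = torch.randn(1, 3, 1024, 1024)  # assumed input resolution
    torch.onnx.export(
        wrapper,
        (dummy_image,),
        out_path,
        input_names=["image"],
        output_names=["image_embeddings"],
        # dynamic batch support: mark dimension 0 of each tensor as variable
        dynamic_axes={"image": {0: "batch"}, "image_embeddings": {0: "batch"}},
        opset_version=17,
    )
```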

🔥 Inference Engines

  • PyTorch Inference: Direct PyTorch-based inference for development and testing
  • ONNX Runtime Inference: Optimized inference using ONNX Runtime with CUDA support (a minimal session sketch follows this list)
  • Triton Inference Server: Production-ready deployment with model pipeline orchestration
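
A minimal ONNX Runtime session, assuming the exported vision encoder takes a single "image" input; inference_onnx.py chains all four components, while this only shows provider setup and one forward pass.

```python
# Sketch only: tensor names and shapes are assumptions.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "vision-encoder.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # CUDA first, CPU fallback
)

image = np.random.rand(1, 3, 1024, 1024).astype(np.float32)  # stand-in for a preprocessed image
outputs = session.run(None, {"image": image})  # None -> return every declared output
print([o.shape for o in outputs])
```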

📦 Deployment Infrastructure

  • Triton Model Repository: Complete model repository structure with configuration files
  • Model Pipeline: Python backend pipeline that orchestrates multiple model components (sketched after this list)
  • Conda Environment Packaging: Scripts to package Python dependencies for Triton Server
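
The pipeline's Python backend presumably follows the standard TritonPythonModel layout and uses BLS (Business Logic Scripting) calls to invoke the other models in the repository. Everything below, including tensor and model names, is an illustrative assumption rather than the actual model_repository/pipeline implementation.

```python
# Rough shape of a Triton Python backend that chains component models via BLS.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            image = pb_utils.get_input_tensor_by_name(request, "image")

            # Call the vision-encoder model deployed in the same repository.
            enc_request = pb_utils.InferenceRequest(
                model_name="vision-encoder",
                requested_output_names=["image_embeddings"],
                inputs=[image],
            )
            enc_response = enc_request.exec()
            if enc_response.has_error():
                raise pb_utils.TritonModelException(enc_response.error().message())
            embeddings = pb_utils.get_output_tensor_by_name(enc_response, "image_embeddings")

            # ...the text/geometry encoders and the decoder would be chained the same way...
            responses.append(pb_utils.InferenceResponse(output_tensors=[embeddings]))
        return responses
```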

🎨 Developer Experience

  • Gradio App: Interactive web interface for testing model inference (a toy version is sketched after this list)
  • Justfile: Convenient commands for common tasks (export, inference, server management)
  • Comprehensive Documentation: README with usage examples and setup instructions
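
A toy reduction of what gradio_app.py provides, assuming a run_inference(image, prompt) helper that wraps one of the inference engines above; the real app exposes more controls and visualizes the predicted masks.

```python
# Minimal Gradio front-end sketch; run_inference is a hypothetical helper.
import gradio as gr

def run_inference(image, prompt):
    # Placeholder: the actual app would call the PyTorch/ONNX/Triton engine here.
    return image

demo = gr.Interface(
    fn=run_inference,
    inputs=[gr.Image(type="numpy"), gr.Textbox(label="Text prompt")],
    outputs=gr.Image(type="numpy"),
    title="SAM3 ONNX / Triton demo",
)
demo.launch()
```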

Changes Summary

New Files Added

Core Inference Scripts

  • export_to_onnx.py - Export SAM3 model components to ONNX format
  • inference_onnx.py - ONNX Runtime inference engine
  • inference_torch.py - PyTorch inference engine
  • model.py - Model wrapper classes for ONNX export (VisionEncoderWrapper, TextEncoderWrapper, GeometryEncoderWrapper, DecoderWrapper); see the wrapper sketch after this list
  • utils.py - Shared utilities for preprocessing, postprocessing, and visualization
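
The wrappers in model.py most likely flatten each SAM3 component's forward signature to plain tensors so that torch.onnx.export can trace it; the sketch below illustrates the idea, with the attribute name image_encoder assumed rather than taken from the SAM3 API.

```python
# Illustrative wrapper; attribute and output names are assumptions.
import torch

class VisionEncoderWrapper(torch.nn.Module):
    def __init__(self, sam_model: torch.nn.Module):
        super().__init__()
        self.encoder = sam_model.image_encoder  # assumed attribute name

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Return a single tensor instead of the model's richer output structure,
        # which keeps the exported ONNX graph's signature simple to consume.
        return self.encoder(image)
```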

Triton Integration

  • call_triton_models.py - Client script to call individual Triton models (a minimal client sketch follows this list)
  • call_triton_pipeline.py - Client script to call Triton pipeline model
  • model_repository/ - Complete Triton model repository structure:
    • vision-encoder/config.pbtxt
    • text-encoder/config.pbtxt
    • geometry-encoder/config.pbtxt
    • decoder/config.pbtxt
    • pipeline/config.pbtxt + Python backend implementation
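
A minimal client call against one of the deployed components, assuming the server runs on localhost:8000 and that the vision encoder exposes an FP32 "image" input and an "image_embeddings" output; the real tensor names come from the generated config.pbtxt files.

```python
# Sketch of a single-model Triton call; names and shapes are assumptions.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 3, 1024, 1024).astype(np.float32)  # stand-in for a preprocessed image
infer_input = httpclient.InferInput("image", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

result = client.infer(model_name="vision-encoder", inputs=[infer_input])
embeddings = result.as_numpy("image_embeddings")
print(embeddings.shape)
```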

Scripts & Utilities

  • scripts/pbtxt_from_onnx.py - Extract Triton config files from ONNX models (the idea is sketched after this list)
  • scripts/convert_to_tensorrt.sh - Convert ONNX models to TensorRT
  • scripts/pack_conda_env.sh - Package conda environment for Triton Python backend
  • gradio_app.py - Interactive Gradio web interface
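
The config extraction boils down to reading the graph inputs and outputs of each ONNX file and emitting matching Triton entries. The snippet below is a stripped-down illustration of that idea, with the type mapping trimmed to two dtypes; the actual script writes complete config.pbtxt files.

```python
# Reduced sketch of deriving Triton I/O entries from an ONNX graph.
import onnx

ELEM_TO_TRITON = {
    onnx.TensorProto.FLOAT: "TYPE_FP32",
    onnx.TensorProto.INT64: "TYPE_INT64",
}

def describe(value_info):
    ttype = value_info.type.tensor_type
    # dim_value is 0 for symbolic (dynamic) dimensions; Triton uses -1 for those.
    dims = [d.dim_value if d.dim_value > 0 else -1 for d in ttype.shape.dim]
    return value_info.name, ELEM_TO_TRITON.get(ttype.elem_type, "TYPE_FP32"), dims

model = onnx.load("vision-encoder.onnx")
for tensor in list(model.graph.input) + list(model.graph.output):
    name, dtype, dims = describe(tensor)
    print(f'{{ name: "{name}" data_type: {dtype} dims: {dims} }}')
```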

Build & Documentation

  • Justfile - Task runner with convenient commands
  • README.md - Comprehensive documentation
  • .gitignore - Git ignore rules
  • uv.lock - Dependency lock file for reproducible builds

Dependencies

  • Python 3.8+
  • uv package manager (for dependency management)
  • CUDA-capable GPU (recommended)
  • Docker (for Triton Server and TensorRT conversion)
