42 changes: 42 additions & 0 deletions examples/windows/torch_onnx/llm_export/README.md
@@ -0,0 +1,42 @@
# LLM Export (Windows)

Export LLMs from PyTorch to ONNX, applying quantization and grouped-query attention (GQA) graph surgery.

## Supported Precisions

- `nvfp4` — NVIDIA FP4 quantization
- `int4_awq` — INT4 AWQ quantization
- `int8_sq` — INT8 SmoothQuant

## Usage

### NVFP4

```bash
python llm_export.py --hf_model_path "meta-llama/Llama-3.2-3B-Instruct" --dtype nvfp4 --output_dir ./llama3.2-3b-nvfp4
```

### INT4 AWQ

```bash
python llm_export.py --hf_model_path "meta-llama/Llama-3.2-3B-Instruct" --dtype int4_awq --output_dir ./llama3.2-3b-int4
```

### INT8 SmoothQuant

```bash
python llm_export.py --hf_model_path "Qwen/Qwen2.5-3B-Instruct" --dtype int8_sq --output_dir ./qwen-3b-int8
```
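
### Exporting several precisions in one pass

The three commands above differ only in `--dtype` and `--output_dir`, so they can be wrapped in a loop. The following is a minimal sketch, not part of `llm_export.py`: `MODEL`, `OUT_ROOT`, `DRY_RUN`, `run_export`, and `export_all` are names made up for this example, and it assumes the chosen model works with every listed precision, which this README does not confirm. By default it only prints the commands it would run.

```bash
# Sketch: export one model at each precision listed above.
# MODEL, OUT_ROOT, and DRY_RUN are illustrative names, not llm_export.py options.
set -eu

MODEL="${MODEL:-meta-llama/Llama-3.2-3B-Instruct}"
OUT_ROOT="${OUT_ROOT:-./exports}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

run_export() {
  # Print the command in dry-run mode; otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "python llm_export.py $*"
  else
    python llm_export.py "$@"
  fi
}

export_all() {
  # One output directory per precision, e.g. ./exports/nvfp4
  for dtype in nvfp4 int4_awq int8_sq; do
    run_export --hf_model_path "$MODEL" --dtype "$dtype" --output_dir "$OUT_ROOT/$dtype"
  done
}

export_all
```

Setting `DRY_RUN=0` in the environment runs the exports instead of printing them.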

## Options

| Argument | Description |
|---|---|
| `--hf_model_path` | Hugging Face model name or local path |
| `--dtype` | Quantization precision (`fp16`, `fp8`, `int4_awq`, `int8_sq`, `nvfp4`) |
| `--output_dir` | Directory to save the exported ONNX model |
| `--calib_size` | Calibration dataset size (default: 512) |
| `--save_original` | Save the pre-surgery ONNX model for debugging |
| `--trust_remote_code` | Trust remote code when loading from Hugging Face |
| `--onnx_path` | Skip export and run surgery on an existing ONNX model |
| `--config_path` | Path to config.json if not alongside the model |
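
The optional arguments in the table can be combined with the required ones. The sketch below appends them only when requested. It assumes `--save_original` and `--trust_remote_code` are value-less switches, which the table suggests but does not state. `EXPORTER`, `CALIB_SIZE`, `SAVE_ORIGINAL`, `TRUST_REMOTE_CODE`, and `export_model` are hypothetical names for this example; `llm_export.py` does not read these variables.

```bash
# Sketch: add optional flags from the table above only when they are requested.
# EXPORTER, CALIB_SIZE, SAVE_ORIGINAL, and TRUST_REMOTE_CODE are illustrative
# shell variables chosen for this example.
set -eu

EXPORTER="${EXPORTER:-python llm_export.py}"

export_model() {
  # Usage: export_model <model> <dtype> <output_dir>
  model=$1
  dtype=$2
  out_dir=$3

  # Start from the required arguments, then append the optional ones.
  set -- --hf_model_path "$model" --dtype "$dtype" --output_dir "$out_dir"

  if [ -n "${CALIB_SIZE:-}" ]; then
    set -- "$@" --calib_size "$CALIB_SIZE"
  fi
  if [ "${SAVE_ORIGINAL:-0}" = "1" ]; then
    set -- "$@" --save_original        # assumed to take no value
  fi
  if [ "${TRUST_REMOTE_CODE:-0}" = "1" ]; then
    set -- "$@" --trust_remote_code    # assumed to take no value
  fi

  # EXPORTER is unquoted on purpose so "python llm_export.py" splits into words.
  $EXPORTER "$@"
}
```

For example, after setting `CALIB_SIZE=128` and `SAVE_ORIGINAL=1`, calling `export_model "Qwen/Qwen2.5-3B-Instruct" int8_sq ./qwen-3b-int8` would run the INT8 SmoothQuant export with a smaller calibration set and keep the pre-surgery model.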