
Conversation

Contributor

@vishalpandya1990 vishalpandya1990 commented Jan 23, 2026

What does this PR do?

Documentation

Overview:

  • Update the support matrix, changelog, deployment page, and example READMEs to reflect recent feature and model support on the Windows side.

Testing

  • No testing; this is a documentation-only change.

Before your PR is "Ready for review"

  • Make sure you have read and followed the Contributor guidelines and that your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added ONNX Mixed Precision Weight-only quantization (INT4/INT8) support.
    • Introduced diffusion-model quantization on Windows.
    • Added new accuracy benchmarks (Perplexity and KL-Divergence).
    • Expanded deployment with multiple ONNX Runtime Execution Providers (CUDA, DirectML, TensorRT-RTX).
  • Bug Fixes

    • Fixed ONNX 1.19 compatibility issue with CuPy during INT4 AWQ quantization.
  • Documentation

    • Updated installation guides with system requirements and multiple backend options.
    • Reorganized deployment documentation with comprehensive execution provider guidance.
    • Expanded example workflows with improved setup instructions and support matrices.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: vipandya <vipandya@nvidia.com>
@vishalpandya1990 vishalpandya1990 requested a review from a team as a code owner January 23, 2026 13:03
Contributor

coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

Walkthrough

Documentation refactor expanding ONNX Runtime Execution Provider (EP) support on Windows beyond DirectML to include CUDA, TensorRT-RTX, and CPU options. Includes new 0.41 release notes, updated system requirements tables, revised installation guides, and refreshed support matrices across multiple documentation and example files.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Release Notes**<br>`CHANGELOG-Windows.rst` | Added new 0.41 (TBD) section with bug fixes for ONNX 1.19/CuPy compatibility and new features for mixed-precision/diffusion-model quantization and accuracy benchmarks. Updated 0.33 section with refined wording for LLM quantization and DirectML deployment references. |
| **Deployment Docs**<br>`docs/source/deployment/2_onnxruntime.rst` | Renamed section from DirectML to ONNX Runtime. Expanded overview to introduce multiple EPs (CUDA, DirectML, TensorRT-RTX, CPU) with guidance on selection. Added compatibility note clarifying EP-specific model requirements. |
| **Getting Started—Overview**<br>`docs/source/getting_started/1_overview.rst` | Updated Model Optimizer link and added TensorRT-RTX as additional backend option alongside DirectML in Windows section. |
| **Getting Started—Installation**<br>`docs/source/getting_started/windows/_installation_for_Windows.rst` | Added system requirements table covering OS, architecture, Python, CUDA, ONNX Runtime, driver, and GPU specs. |
| **Getting Started—Standalone Setup**<br>`docs/source/getting_started/windows/_installation_standalone.rst` | Added CUDA Toolkit and CuDNN prerequisites. Reframed installation focus to ONNX module. Introduced explicit EP options (onnxruntime-trt-rtx, onnxruntime-directml, onnxruntime-gpu) with default changed from DirectML to GPU (CUDA). Added guidance for EP switching and verification requiring exactly one EP installed. |
| **Getting Started—Olive Installation**<br>`docs/source/getting_started/windows/_installation_with_olive.rst` | Reworded intro to emphasize general model optimization. Expanded Prerequisites with explicit DirectML EP packages and example commands. Updated quantization pass reference link. Removed phi3-specific example references. |
| **Support Matrix & Guides**<br>`docs/source/guides/0_support_matrix.rst`, `docs/source/guides/windows_guides/_ONNX_PTQ_guide.rst` | Updated feature tables to replace ORT-DirectML with expanded EP coverage (ORT-DML, ORT-CUDA, ORT-TRT-RTX). Clarified EP definitions. Simplified Windows model section to reference external matrix. Updated deployment reference from DirectML-specific to ONNX Runtime guidance. |
| **FAQs**<br>`docs/source/support/2_faqs.rst` | Minor wording refinements; added caution about CuPy compatibility with CUDA toolkit. |
| **Examples—Windows Root**<br>`examples/windows/README.md` | Updated deployment reference from DirectML to ONNX Runtime. Replaced single support matrix reference with table listing model types and corresponding links. |
| **Examples—GenAI LLM**<br>`examples/windows/onnx_ptq/genai_llm/README.md` | Major restructuring with Table of Contents, expanded Overview (added TensorRT-RTX/CUDA backends), new Setup and dedicated Quantization sections. Replaced Command Line Arguments with comprehensive Arguments section including new options (`--output_path`, `--use_zero_point`, `--block_size`, `--awqlite_alpha_step`, etc.). Expanded example command with ONNX path and flags. Reorganized Evaluate and Deployment sections. Replaced support matrix with detailed table and GenAI note. Added Troubleshoot section. |
| **Examples—SAM2 & Whisper**<br>`examples/windows/onnx_ptq/sam2/README.md`, `examples/windows/onnx_ptq/whisper/README.md` | Added new Support Matrix sections in TOC and as dedicated sections with tables for INT8/FP8 modes and explanatory notes. No logic changes. |
| **Examples—Diffusers**<br>`examples/windows/torch_onnx/diffusers/README.md` | Renamed "Quantization Support Matrix" to "Support Matrix". Reformatted table. Replaced external link reference with inline NVFP4 performance notes and new footnotes on Blackwell GPU requirements and RAM recommendations for Flux.1.Dev. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Modelopt-windows documentation update' is vague and generic, using the non-descriptive term 'update' without conveying specific details about the primary changes. | Consider a more specific title that highlights key changes, such as 'Add ONNX Runtime execution provider support and update Windows documentation' or 'Update Windows documentation for TensorRT-RTX and CUDA support'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/source/getting_started/windows/_installation_for_Windows.rst (1)

18-18: Clarify CUDA version requirements.

The system requirements table specifies CUDA >=12.0 (Line 18), while the note mentions CUDA-12.8+ for Blackwell GPU support (Line 28). This may confuse users about the actual minimum CUDA version required.

Consider clarifying whether:

  • CUDA 12.0 is the general minimum, with 12.8+ needed only for Blackwell GPUs
  • Or if the table should be updated to reflect 12.8+ as the universal minimum

Also applies to: 28-28

docs/source/deployment/2_onnxruntime.rst (1)

42-42: Fix double slash in URL.

The URL contains a double slash before the closing: python// should be python/.

🔗 Proposed fix
-- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python//>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
+- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python/>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
🧹 Nitpick comments (5)
examples/windows/torch_onnx/diffusers/README.md (1)

95-109: Consider improving clarity and consistency.

The Support Matrix section rename improves consistency with other README files, and the new footnotes provide valuable context. However, consider the following refinements:

  1. Line 109: The note about "some known performance issues with NVFP4 model execution" is vague. Consider being more specific about what issues users might encounter or providing a reference to a tracking issue.

  2. Lines 103, 105: Footnote formatting is inconsistent - these lines lack ending punctuation while line 107 includes a period.

♻️ Suggested improvements
-> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*
+> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*

-> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*
+> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*

-> *There are some known performance issues with NVFP4 model execution using TRTRTX EP. Stay tuned for further updates!*
+> *NVFP4 model execution using TRTRTX EP has known performance limitations. Stay tuned for further updates!*
CHANGELOG-Windows.rst (1)

14-14: Consider more descriptive link text.

The link text "example script" could be more descriptive, similar to line 13's "example for GenAI LLMs". Consider something like "diffusion models quantization example" for consistency and clarity.

docs/source/getting_started/windows/_installation_standalone.rst (1)

72-76: Minor: Consider consistent capitalization in verification checklist.

The verification item "Onnxruntime Package" uses different capitalization compared to other items like "Python Interpreter" and "Task Manager" (title case). Consider using "ONNX Runtime Package" for consistency.

docs/source/deployment/2_onnxruntime.rst (2)

9-16: Good addition of multi-EP support overview.

The execution provider descriptions effectively communicate the options available to users. The guidance to select based on model, hardware, and deployment requirements is helpful.

Optional: Consider clarifying DirectML EP scope.

Line 12's description "Enables deployment on a wide range of GPUs" could be more specific about which GPU vendors (e.g., AMD, Intel, NVIDIA) or hardware generations are supported to help users make informed decisions.


32-34: Clarify that EP constraints are build-optimization specific, not inherent to ONNX portability.

The note's core guidance—rebuild/re-export models for different EPs—is sound practice for ONNX Runtime GenAI. However, the explanation should be more precise: models are constrained to their export EP+precision combination because the GenAI model builder produces optimizations specific to that configuration, not because ONNX itself prevents cross-EP portability. While the underlying ONNX/ORT framework supports heterogeneous execution across EPs, GenAI's build process outputs precision- and EP-optimized artifacts that don't always transfer directly. Refine the note to clarify this is a practical build/optimization constraint (rebuild when targeting a different EP) rather than an inherent incompatibility, and optionally reference the model builder's documented EP/precision support matrix.
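For readers weighing the EP guidance discussed in the note above, the following is a minimal sketch of explicit execution-provider selection with the ONNX Runtime Python API; the model path and provider list are illustrative placeholders, not values taken from the docs under review.

```python
import onnxruntime as ort

# Illustrative path to a model that was built/quantized for this target EP.
model_path = "model_quantized/model.onnx"

# Request EPs in priority order. Which provider actually activates depends on
# the installed onnxruntime package (onnxruntime-gpu, onnxruntime-directml, ...).
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
# For a DirectML build, "DmlExecutionProvider" would be requested instead.

session = ort.InferenceSession(model_path, providers=providers)
print("Active providers:", session.get_providers())
```

`session.get_providers()` reports which providers the session actually enabled, which makes it easy to confirm the deployment matches the EP the model was built for.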


- Add support for ONNX Mixed Precision Weight-only quantization using INT4 and INT8 precisions. Refer quantization `example for GenAI LLMs <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq/genai_llm>`_.
- Add support for some diffusion models' quantization on Windows. Refer `example script <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/torch_onnx/diffusers>`_ for details.
- Add `Perlexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.

⚠️ Potential issue | 🟡 Minor

Fix typo: "Perlexity" should be "Perplexity".

The word "Perlexity" is misspelled and should be "Perplexity".

🐛 Proposed fix
-- Add `Perlexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
+- Add `Perplexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- Add `Perlexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
- Add `Perplexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
🤖 Prompt for AI Agents
In `@CHANGELOG-Windows.rst` at line 15, Replace the misspelled link text
"Perlexity" with the correct spelling "Perplexity" in the CHANGELOG entry (the
link label that currently reads `Perlexity
<https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_);
ensure the link URL and formatting remain unchanged and only the visible label
is corrected.
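As background for the benchmarks named in this entry, here is a toy illustration of the two metrics themselves rather than the repository's benchmark scripts: perplexity as the exponential of the mean per-token negative log-likelihood, and KL-divergence between two token distributions.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Example: compare a quantized model's output distribution against the baseline.
print(perplexity([2.1, 1.8, 2.4]))                      # ~8.2
print(kl_divergence([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))  # small positive value
```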

techniques to produce optimized & quantized checkpoints. Seamlessly integrated within the NVIDIA AI software ecosystem, the quantized checkpoint generated from Model Optimizer is ready for deployment in downstream inference frameworks like `TensorRT-LLM <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/quantization>`_ or `TensorRT <https://github.com/NVIDIA/TensorRT>`_ (Linux). ModelOpt is integrated with `NVIDIA NeMo <https://github.com/NVIDIA-NeMo/NeMo>`_ and `Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_ for training-in-the-loop optimization techniques. For enterprise users, the 8-bit quantization with Stable Diffusion is also available on `NVIDIA NIM <https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/>`_.

For Windows users, the `Model Optimizer for Windows <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md>`_ (ModelOpt-Windows) delivers model compression techniques, including quantization, on Windows RTX PC systems. ModelOpt-Windows is optimized for efficient quantization, featuring local GPU calibration, reduced system and video memory consumption, and swift processing times. It integrates seamlessly with the Windows ecosystem, with optimized ONNX models as output for `Microsoft DirectML <https://github.com/microsoft/DirectML>`_ backends. Furthermore, ModelOpt-Windows supports SDKs such as `Microsoft Olive <https://github.com/microsoft/Olive>`_ and `ONNX Runtime <https://github.com/microsoft/onnxruntime>`_, enabling the deployment of quantized models across various independent hardware vendors through the DirectML path.
For Windows users, the `Model Optimizer for Windows <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md>`_ (ModelOpt-Windows) delivers model compression techniques, including quantization, on Windows RTX PC systems. ModelOpt-Windows is optimized for efficient quantization, featuring local GPU calibration, reduced system and video memory consumption, and swift processing times. It integrates seamlessly with the Windows ecosystem, with optimized ONNX models as output for `Microsoft DirectML <https://github.com/microsoft/DirectML>`_ and `TensorRT-RTX <https://github.com/NVIDIA/TensorRT-RTX>`_ backends. Furthermore, ModelOpt-Windows supports SDKs such as `Microsoft Olive <https://github.com/microsoft/Olive>`_ and `ONNX Runtime <https://github.com/microsoft/onnxruntime>`_, enabling the deployment of quantized models across various independent hardware vendors through the DirectML path.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the GitHub URL is accessible
echo "Testing URL: https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md"
curl -s -o /dev/null -w "HTTP Status Code: %{http_code}\n" "https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md"

Repository: NVIDIA/Model-Optimizer

Length of output: 179


🏁 Script executed:

#!/bin/bash
# Check the actual GitHub repository structure for NVIDIA/Model-Optimizer
# Try to access the repository and find Windows-related documentation

echo "=== Checking repository structure ==="
# Check if examples/windows directory exists using GitHub API
curl -s -L "https://api.github.com/repos/NVIDIA/Model-Optimizer/contents/examples" | grep -i windows

echo -e "\n=== Checking main branch structure ==="
# Try without the specific README.md file
curl -s -I "https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows" | head -5

echo -e "\n=== Testing alternative URL format ==="
# Test if accessing without tree/main works
curl -s -o /dev/null -w "HTTP Status Code: %{http_code}\n" "https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md"

Repository: NVIDIA/Model-Optimizer

Length of output: 890


Correct the GitHub URL format.

The Windows documentation link uses an incorrect URL format. The URL https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md mixes directory (tree) and file (blob) path patterns, which causes a 301 redirect.

Use blob instead of tree when linking to a specific file:
https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md

Alternatively, link to the directory view without the filename:
https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows

🤖 Prompt for AI Agents
In `@docs/source/getting_started/1_overview.rst` at line 14, The Markdown link for
ModelOpt-Windows uses a mixed tree/file path causing a redirect; update the URL
in the sentence that references `ModelOpt-Windows` to use the correct GitHub
blob path
`https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md`
or point to the directory
`https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows` so the
link resolves directly without a 301 redirect.

.. code-block:: bash
pip uninstall onnxruntime-directml
By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:

⚠️ Potential issue | 🟡 Minor

Fix typo: "neeedd" should be "needed".

The word "neeedd" has an extra 'd' and should be "needed".

🐛 Proposed fix
-By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
+By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version needed for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version needed for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
🤖 Prompt for AI Agents
In `@docs/source/getting_started/windows/_installation_standalone.rst` at line 51,
There is a typo in the sentence that reads 'The default CUDA version neeedd for
*onnxruntime-gpu* since v1.19.0 is 12.x.' — change "neeedd" to "needed" so it
reads 'The default CUDA version needed for *onnxruntime-gpu* since v1.19.0 is
12.x.' Update the sentence where "ModelOpt-Windows installs *onnxruntime-gpu*"
is mentioned to correct that single word.
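Since the section under review asks users to keep exactly one ONNX Runtime EP package installed, a small verification sketch may be useful; the `onnxruntime*` package-name prefix check below is an assumption based on the EP options named in the walkthrough (onnxruntime-gpu, onnxruntime-directml, onnxruntime-trt-rtx), not a check taken from the docs.

```python
# Sketch: confirm that exactly one ONNX Runtime EP package is installed.
from importlib.metadata import distributions

ort_packages = sorted(
    {name for d in distributions()
     if (name := d.metadata["Name"]) and name.lower().startswith("onnxruntime")}
)
print("Installed ONNX Runtime packages:", ort_packages)

if len(ort_packages) != 1:
    print("Expected exactly one ONNX Runtime EP package; remove extras first, "
          "e.g. `pip uninstall onnxruntime-directml`, then install the one you need.")
else:
    import onnxruntime as ort
    print("Available execution providers:", ort.get_available_providers())
```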


#. Currently, the Model Optimizer - Windows only supports Onnx Runtime GenAI based LLM models in the Olive workflow.
#. To try out different LLMs and EPs in the Olive workflow of ModelOpt-Windows, refer the details provided in `phi3 <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-Model-Optimizer>`_ Olive example.
#. For overview about Olive workflow, refer this `overview <https://github.com/microsoft/Olive/blob/main/docs/architecture.md>`_ guide.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the Olive architecture documentation URL is accessible
curl -s -L -I "https://github.com/microsoft/Olive/blob/main/docs/architecture.md" | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 5037


🌐 Web query:

Olive Microsoft GitHub repository documentation architecture.md

💡 Result:

I couldn’t find an architecture.md (or ARCHITECTURE.md) file in the microsoft/Olive GitHub repository’s default branch. The repo’s documentation is primarily published on the Olive docs site, and the closest “architecture” description is the Overview page, which explains Olive’s core concepts (workflow, passes, evaluators/metrics, and search/auto-tuning strategy). [1], [2]

Where to look instead

  • Repo home / docs entry point: microsoft/Olive [1]
  • Conceptual architecture (“How Olive works”): Olive docs → Overview [2]

Fix the Olive documentation link.

The referenced URL points to a non-existent file. The microsoft/Olive repository does not contain docs/architecture.md at the specified path. Update the link to point to the actual Olive architecture documentation available on the Olive docs site.

🤖 Prompt for AI Agents
In `@docs/source/getting_started/windows/_installation_with_olive.rst` at line 65,
Replace the broken GitHub link target in the rst line that currently reads
"overview <https://github.com/microsoft/Olive/blob/main/docs/architecture.md>"_
with the Olive docs site URL (for example "overview
<https://microsoft.github.io/Olive/>"_), keeping the visible link text the same
so the sentence points to the actual Olive architecture documentation.

- ``ONNX INT8 Max`` means INT8 (W8A8) quantization of ONNX model using Max calibration. Similar holds true for the term ``ONNX FP8 Max``.
- The LLMs in above table are `GenAI <https://github.com/microsoft/onnxruntime-genai/>`_ built LLMs unless specified otherwise.
- Check `examples <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq/>`_ for specific instructions and scripts.
Please checkout the model support matrix `details <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix>`_.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

GitHub NVIDIA Model-Optimizer examples windows support-matrix

💡 Result:

  • GitHub examples (Windows): NVIDIA Model Optimizer keeps Windows-specific example scripts under examples/windows in the NVIDIA/Model-Optimizer repo, linked from the main README’s “Post Training Quantization → windows” examples entry. [1]
  • Support matrix (Windows / “ModelOpt-Windows”): The Windows platform requirements are documented in the Windows installation page (acts as the practical support matrix): Windows + amd64, Python >=3.10,<3.13, CUDA >=12.0, ONNX Runtime 1.20.0, NVIDIA driver 565.90+, and NVIDIA GPU RTX 40 / 50 series (single-GPU supported). [2]
  • Windows quantization feature notes: The Windows quick-start covers ONNX PTQ (incl. INT4 AWQ) and points to the Support Matrix for supported features/models; it also notes ONNX opset requirements (FP8 needs opset 19+, INT4 needs opset 21+). [3]

Pointers (paths):

NVIDIA/Model-Optimizer/examples/windows
Docs: Installation for Windows (ModelOpt-Windows)
Docs: Quick Start: Quantization (Windows)

Sources: [1] (github.com) [2] (nvidia.github.io) [3] (nvidia.github.io)

Citations:


🏁 Script executed:

#!/bin/bash
# Check the GitHub URL structure and verify the repository exists
curl -s -I "https://github.com/NVIDIA/Model-Optimizer" | head -1

Repository: NVIDIA/Model-Optimizer

Length of output: 188


Update the Windows model support matrix link to point to the official documentation.

The current link references https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix, but the actual Windows support matrix is documented in the official Model Optimizer documentation at the Windows installation page, not as an anchor on the GitHub examples directory. Update the reference to point directly to the supported platform requirements and GPU specifications.

🤖 Prompt for AI Agents
In `@docs/source/guides/0_support_matrix.rst` at line 101, Replace the incorrect
GitHub anchor URL in the README line referencing the model support matrix (the
text containing
"https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix")
with the official Model Optimizer Windows installation/support page URL that
documents supported platform requirements and GPU specifications; update the
link target in the phrase "details <...>" so it points directly to the official
Windows installation/support documentation rather than the GitHub examples
anchor.
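To unpack the "Max calibration" term quoted in the context above, here is a toy numpy illustration of the idea, not the ModelOpt implementation: the quantization scale comes from the largest absolute value observed over calibration data.

```python
import numpy as np

def max_calibration_scale(calibration_batches, num_bits=8):
    """'Max' calibration: pick a scale so the largest observed |value|
    maps to the top of the signed integer range (127 for INT8)."""
    amax = max(float(np.abs(batch).max()) for batch in calibration_batches)
    qmax = 2 ** (num_bits - 1) - 1
    return amax / qmax

def fake_quantize(x, scale, num_bits=8):
    """Quantize then dequantize, to observe the rounding error INT8 introduces."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), qmin, qmax) * scale

calib = [np.random.randn(4, 16).astype(np.float32) for _ in range(8)]
scale = max_calibration_scale(calib)
x = np.random.randn(4, 16).astype(np.float32)
print("max abs error:", np.abs(x - fake_quantize(x, scale)).max())
```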

### Prepare ORT-GenAI Compatible Base Model
## Prepare ORT-GenAI Compatible Base Model

You may generate the base model using the model builder that comes with onnxruntime-genai. The ORT-GenAI's [model-builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models) downloads the original Pytorch model from Hugging Face, and produces an ONNX GenAI compatible base model in ONNX format. See example command-line below:

⚠️ Potential issue | 🟡 Minor

Tighten wording/typos for clarity.
A few small fixes will reduce confusion (e.g., “GenAI-compatible”, “precision”, “choose from”).

✏️ Suggested edits
-... produces an ONNX GenAI compatible base model in ONNX format.
+... produces an ONNX GenAI-compatible base model in ONNX format.

-| `--enable_mixed_quant` | Default: disabled mixed quant | Use this option to enable mixed precsion quantization|
-| `--layers_8bit` | Default: None | Use this option to Overrides default mixed quant strategy|
+| `--enable_mixed_quant` | Default: disabled mixed quant | Use this option to enable mixed precision quantization|
+| `--layers_8bit` | Default: None | Use this option to override the default mixed-quant strategy|

-1. For the `algo` argument, we have following options to choose form: awq_lite, awq_clip, rtn, rtn_dq.
+1. For the `algo` argument, we have following options to choose from: awq_lite, awq_clip, rtn, rtn_dq.

-> *All LLMs in the above table are [GenAI](https://github.com/microsoft/onnxruntime-genai/) built LLMs.*
+> *All LLMs in the above table are [GenAI](https://github.com/microsoft/onnxruntime-genai/) built LLMs.*

Also applies to: 70-71, 83-83, 130-130

🧰 Tools
🪛 LanguageTool

[grammar] ~32-~32: Use a hyphen to join words.
Context: ...Hugging Face, and produces an ONNX GenAI compatible base model in ONNX format. Se...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
In `@examples/windows/onnx_ptq/genai_llm/README.md` at line 32, Update the README
sentence to tighten wording and fix typos: change "ONNX GenAI compatible" to
"GenAI-compatible", use "precision" consistently (e.g., "select precision" or
"precision level"), and replace informal phrasing like "choose from" with
"select from" for clarity; apply the same edits to the other occurrences
mentioned (the paragraphs around the same phrasing at the other locations) so
all instances use "GenAI-compatible", consistent "precision" wording, and
"select from" phrasing for a uniform, clearer README.

Comment on lines 56 to 57
| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |

⚠️ Potential issue | 🟡 Minor

Fix dataset value mismatch (pilevel vs pile-val).
The supported value list says pilevel, but the description refers to “pile-val”. Pick one canonical flag/value so users don’t pass an invalid option.

✅ Suggested edit
-| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
+| `--dataset` | cnn (default), pileval | Choose calibration dataset: cnn_dailymail or pile-val. |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
| `--dataset` | cnn (default), pileval | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
🤖 Prompt for AI Agents
In `@examples/windows/onnx_ptq/genai_llm/README.md` around lines 56 - 57, The
README lists the `--dataset` supported values as "cnn, pilevel" but the
description calls it "pile-val"; pick one canonical value (recommend "pile-val")
and update the `--dataset` supported-values list and the descriptive text to
match, and also search for any validation or flag-parsing logic that references
`pilevel` and update it to the chosen canonical token so the flag, description,
and code all match (`--dataset`, cnn, pile-val).
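To show how a single canonical value keeps the flag, docs, and parser in sync, here is a hypothetical argparse sketch; the real parser is not part of this diff, so the option names and choices below are assumptions mirroring the README table.

```python
import argparse

parser = argparse.ArgumentParser(description="Hypothetical quantization CLI sketch")
parser.add_argument(
    "--dataset",
    choices=["cnn", "pileval"],  # single canonical token; README text calls it "pile-val"
    default="cnn",
    help="Calibration dataset: cnn (cnn_dailymail) or pileval (pile-val).",
)
parser.add_argument(
    "--algo",
    choices=["awq_lite", "awq_clip", "rtn", "rtn_dq"],
    default="awq_lite",
    help="Quantization algorithm.",
)

args = parser.parse_args(["--dataset", "pileval"])
print(args.dataset)  # argparse rejects anything outside the declared choices
```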


codecov bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.17%. Comparing base (2a08622) to head (23cd1e8).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #812   +/-   ##
=======================================
  Coverage   74.17%   74.17%           
=======================================
  Files         192      192           
  Lines       19246    19246           
=======================================
  Hits        14276    14276           
  Misses       4970     4970           

☔ View full report in Codecov by Sentry.

