
Conversation

Contributor

@vishalpandya1990 vishalpandya1990 commented Jan 23, 2026

What does this PR do?

Documentation

Overview:

  • Update the support matrix, changelog, deployment page, and example READMEs to reflect recent feature and model support on the Windows side.

Testing

  • No testing; this is a documentation-only change.

Before your PR is "Ready for review"

  • Make sure you have read and followed the Contributor guidelines and that your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • New Features

    • Added ONNX Mixed Precision Weight-only quantization (INT4/INT8) support.
    • Introduced diffusion-model quantization on Windows.
    • Added new accuracy benchmarks (Perplexity and KL-Divergence).
    • Expanded deployment with multiple ONNX Runtime Execution Providers (CUDA, DirectML, TensorRT-RTX).
  • Bug Fixes

    • Fixed ONNX 1.19 compatibility issue with CuPy during INT4 AWQ quantization.
  • Documentation

    • Updated installation guides with system requirements and multiple backend options.
    • Reorganized deployment documentation with comprehensive execution provider guidance.
    • Expanded example workflows with improved setup instructions and support matrices.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: vipandya <vipandya@nvidia.com>
@vishalpandya1990 vishalpandya1990 requested a review from a team as a code owner January 23, 2026 13:03
Contributor

coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

Walkthrough

Documentation refactor expanding ONNX Runtime Execution Provider (EP) support on Windows beyond DirectML to include CUDA, TensorRT-RTX, and CPU options. Includes new 0.41 release notes, updated system requirements tables, revised installation guides, and refreshed support matrices across multiple documentation and example files.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Release Notes**<br>`CHANGELOG-Windows.rst` | Added new 0.41 (TBD) section with bug fixes for ONNX 1.19/CuPy compatibility and new features for mixed-precision/diffusion-model quantization and accuracy benchmarks. Updated 0.33 section with refined wording for LLM quantization and DirectML deployment references. |
| **Deployment Docs**<br>`docs/source/deployment/2_onnxruntime.rst` | Renamed section from DirectML to ONNX Runtime. Expanded overview to introduce multiple EPs (CUDA, DirectML, TensorRT-RTX, CPU) with guidance on selection. Added compatibility note clarifying EP-specific model requirements. |
| **Getting Started—Overview**<br>`docs/source/getting_started/1_overview.rst` | Updated Model Optimizer link and added TensorRT-RTX as additional backend option alongside DirectML in Windows section. |
| **Getting Started—Installation**<br>`docs/source/getting_started/windows/_installation_for_Windows.rst` | Added system requirements table covering OS, architecture, Python, CUDA, ONNX Runtime, driver, and GPU specs. |
| **Getting Started—Standalone Setup**<br>`docs/source/getting_started/windows/_installation_standalone.rst` | Added CUDA Toolkit and CuDNN prerequisites. Reframed installation focus to ONNX module. Introduced explicit EP options (onnxruntime-trt-rtx, onnxruntime-directml, onnxruntime-gpu) with default changed from DirectML to GPU (CUDA). Added guidance for EP switching and verification requiring exactly one EP installed. |
| **Getting Started—Olive Installation**<br>`docs/source/getting_started/windows/_installation_with_olive.rst` | Reworded intro to emphasize general model optimization. Expanded Prerequisites with explicit DirectML EP packages and example commands. Updated quantization pass reference link. Removed phi3-specific example references. |
| **Support Matrix & Guides**<br>`docs/source/guides/0_support_matrix.rst`, `docs/source/guides/windows_guides/_ONNX_PTQ_guide.rst` | Updated feature tables to replace ORT-DirectML with expanded EP coverage (ORT-DML, ORT-CUDA, ORT-TRT-RTX). Clarified EP definitions. Simplified Windows model section to reference external matrix. Updated deployment reference from DirectML-specific to ONNX Runtime guidance. |
| **FAQs**<br>`docs/source/support/2_faqs.rst` | Minor wording refinements; added caution about CuPy compatibility with CUDA toolkit. |
| **Examples—Windows Root**<br>`examples/windows/README.md` | Updated deployment reference from DirectML to ONNX Runtime. Replaced single support matrix reference with table listing model types and corresponding links. |
| **Examples—GenAI LLM**<br>`examples/windows/onnx_ptq/genai_llm/README.md` | Major restructuring with Table of Contents, expanded Overview (added TensorRT-RTX/CUDA backends), new Setup and dedicated Quantization sections. Replaced Command Line Arguments with comprehensive Arguments section including new options (`--output_path`, `--use_zero_point`, `--block_size`, `--awqlite_alpha_step`, etc.). Expanded example command with ONNX path and flags. Reorganized Evaluate and Deployment sections. Replaced support matrix with detailed table and GenAI note. Added Troubleshoot section. |
| **Examples—SAM2 & Whisper**<br>`examples/windows/onnx_ptq/sam2/README.md`, `examples/windows/onnx_ptq/whisper/README.md` | Added new Support Matrix sections in TOC and as dedicated sections with tables for INT8/FP8 modes and explanatory notes. No logic changes. |
| **Examples—Diffusers**<br>`examples/windows/torch_onnx/diffusers/README.md` | Renamed "Quantization Support Matrix" to "Support Matrix". Reformatted table. Replaced external link reference with inline NVFP4 performance notes and new footnotes on Blackwell GPU requirements and RAM recommendations for Flux.1.Dev. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Modelopt-windows documentation update' is vague and generic, using the non-descriptive term 'update' without conveying specific details about the primary changes. | Consider a more specific title that highlights key changes, such as 'Add ONNX Runtime execution provider support and update Windows documentation' or 'Update Windows documentation for TensorRT-RTX and CUDA support'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/source/getting_started/windows/_installation_for_Windows.rst (1)

18-18: Clarify CUDA version requirements.

The system requirements table specifies CUDA >=12.0 (Line 18), while the note mentions CUDA-12.8+ for Blackwell GPU support (Line 28). This may confuse users about the actual minimum CUDA version required.

Consider clarifying whether:

  • CUDA 12.0 is the general minimum, with 12.8+ needed only for Blackwell GPUs
  • Or if the table should be updated to reflect 12.8+ as the universal minimum

Also applies to: 28-28

docs/source/deployment/2_onnxruntime.rst (1)

42-42: Fix double slash in URL.

The URL contains a double slash before the closing: python// should be python/.

🔗 Proposed fix
-- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python//>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
+- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python/>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
🧹 Nitpick comments (5)
examples/windows/torch_onnx/diffusers/README.md (1)

95-109: Consider improving clarity and consistency.

The Support Matrix section rename improves consistency with other README files, and the new footnotes provide valuable context. However, consider the following refinements:

  1. Line 109: The note about "some known performance issues with NVFP4 model execution" is vague. Consider being more specific about what issues users might encounter or providing a reference to a tracking issue.

  2. Lines 103, 105: Footnote formatting is inconsistent - these lines lack ending punctuation while line 107 includes a period.

♻️ Suggested improvements
-> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*
+> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*

-> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*
+> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*

-> *There are some known performance issues with NVFP4 model execution using TRTRTX EP. Stay tuned for further updates!*
+> *NVFP4 model execution using TRTRTX EP has known performance limitations. Stay tuned for further updates!*
CHANGELOG-Windows.rst (1)

14-14: Consider more descriptive link text.

The link text "example script" could be more descriptive, similar to line 13's "example for GenAI LLMs". Consider something like "diffusion models quantization example" for consistency and clarity.

docs/source/getting_started/windows/_installation_standalone.rst (1)

72-76: Minor: Consider consistent capitalization in verification checklist.

The verification item "Onnxruntime Package" uses different capitalization compared to other items like "Python Interpreter" and "Task Manager" (title case). Consider using "ONNX Runtime Package" for consistency.

docs/source/deployment/2_onnxruntime.rst (2)

9-16: Good addition of multi-EP support overview.

The execution provider descriptions effectively communicate the options available to users. The guidance to select based on model, hardware, and deployment requirements is helpful.

Optional: Consider clarifying DirectML EP scope.

Line 12's description "Enables deployment on a wide range of GPUs" could be more specific about which GPU vendors (e.g., AMD, Intel, NVIDIA) or hardware generations are supported to help users make informed decisions.


32-34: Clarify that EP constraints are build-optimization specific, not inherent to ONNX portability.

The note's core guidance—rebuild/re-export models for different EPs—is sound practice for ONNX Runtime GenAI. However, the explanation should be more precise: models are constrained to their export EP+precision combination because the GenAI model builder produces optimizations specific to that configuration, not because ONNX itself prevents cross-EP portability. While the underlying ONNX/ORT framework supports heterogeneous execution across EPs, GenAI's build process outputs precision- and EP-optimized artifacts that don't always transfer directly. Refine the note to clarify this is a practical build/optimization constraint (rebuild when targeting a different EP) rather than an inherent incompatibility, and optionally reference the model builder's documented EP/precision support matrix.
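For readers weighing the EP guidance discussed in the note above, the following is a minimal sketch of explicit execution-provider selection with the ONNX Runtime Python API; the model path and provider list are illustrative placeholders, not values taken from the docs under review.

```python
import onnxruntime as ort

# Illustrative path to a model that was built/quantized for this target EP.
model_path = "model_quantized/model.onnx"

# Request EPs in priority order. Which provider actually activates depends on
# the installed onnxruntime package (onnxruntime-gpu, onnxruntime-directml, ...).
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
# For a DirectML build, "DmlExecutionProvider" would be requested instead.

session = ort.InferenceSession(model_path, providers=providers)
print("Active providers:", session.get_providers())
```

`session.get_providers()` reports which providers the session actually enabled, which makes it easy to confirm the deployment matches the EP the model was built for.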


- Add support for ONNX Mixed Precision Weight-only quantization using INT4 and INT8 precisions. Refer quantization `example for GenAI LLMs <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq/genai_llm>`_.
- Add support for some diffusion models' quantization on Windows. Refer `example script <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/torch_onnx/diffusers>`_ for details.
- Add `Perlexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.

⚠️ Potential issue | 🟡 Minor

Fix typo: "Perlexity" should be "Perplexity".

The word "Perlexity" is misspelled and should be "Perplexity".

🐛 Proposed fix
-- Add `Perlexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
+- Add `Perplexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- Add `Perlexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
- Add `Perplexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
🤖 Prompt for AI Agents
In `@CHANGELOG-Windows.rst` at line 15, Replace the misspelled link text
"Perlexity" with the correct spelling "Perplexity" in the CHANGELOG entry (the
link label that currently reads `Perlexity
<https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_);
ensure the link URL and formatting remain unchanged and only the visible label
is corrected.
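As background for the benchmarks named in this entry, here is a toy illustration of the two metrics themselves rather than the repository's benchmark scripts: perplexity as the exponential of the mean per-token negative log-likelihood, and KL-divergence between two token distributions.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Example: compare a quantized model's output distribution against the baseline.
print(perplexity([2.1, 1.8, 2.4]))                      # ~8.2
print(kl_divergence([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))  # small positive value
```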

techniques to produce optimized & quantized checkpoints. Seamlessly integrated within the NVIDIA AI software ecosystem, the quantized checkpoint generated from Model Optimizer is ready for deployment in downstream inference frameworks like `TensorRT-LLM <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/quantization>`_ or `TensorRT <https://github.com/NVIDIA/TensorRT>`_ (Linux). ModelOpt is integrated with `NVIDIA NeMo <https://github.com/NVIDIA-NeMo/NeMo>`_ and `Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_ for training-in-the-loop optimization techniques. For enterprise users, the 8-bit quantization with Stable Diffusion is also available on `NVIDIA NIM <https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/>`_.

For Windows users, the `Model Optimizer for Windows <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md>`_ (ModelOpt-Windows) delivers model compression techniques, including quantization, on Windows RTX PC systems. ModelOpt-Windows is optimized for efficient quantization, featuring local GPU calibration, reduced system and video memory consumption, and swift processing times. It integrates seamlessly with the Windows ecosystem, with optimized ONNX models as output for `Microsoft DirectML <https://github.com/microsoft/DirectML>`_ backends. Furthermore, ModelOpt-Windows supports SDKs such as `Microsoft Olive <https://github.com/microsoft/Olive>`_ and `ONNX Runtime <https://github.com/microsoft/onnxruntime>`_, enabling the deployment of quantized models across various independent hardware vendors through the DirectML path.
For Windows users, the `Model Optimizer for Windows <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md>`_ (ModelOpt-Windows) delivers model compression techniques, including quantization, on Windows RTX PC systems. ModelOpt-Windows is optimized for efficient quantization, featuring local GPU calibration, reduced system and video memory consumption, and swift processing times. It integrates seamlessly with the Windows ecosystem, with optimized ONNX models as output for `Microsoft DirectML <https://github.com/microsoft/DirectML>`_ and `TensorRT-RTX <https://github.com/NVIDIA/TensorRT-RTX>`_ backends. Furthermore, ModelOpt-Windows supports SDKs such as `Microsoft Olive <https://github.com/microsoft/Olive>`_ and `ONNX Runtime <https://github.com/microsoft/onnxruntime>`_, enabling the deployment of quantized models across various independent hardware vendors through the DirectML path.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the GitHub URL is accessible
echo "Testing URL: https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md"
curl -s -o /dev/null -w "HTTP Status Code: %{http_code}\n" "https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md"

Repository: NVIDIA/Model-Optimizer

Length of output: 179


🏁 Script executed:

#!/bin/bash
# Check the actual GitHub repository structure for NVIDIA/Model-Optimizer
# Try to access the repository and find Windows-related documentation

echo "=== Checking repository structure ==="
# Check if examples/windows directory exists using GitHub API
curl -s -L "https://api.github.com/repos/NVIDIA/Model-Optimizer/contents/examples" | grep -i windows

echo -e "\n=== Checking main branch structure ==="
# Try without the specific README.md file
curl -s -I "https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows" | head -5

echo -e "\n=== Testing alternative URL format ==="
# Test if accessing without tree/main works
curl -s -o /dev/null -w "HTTP Status Code: %{http_code}\n" "https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md"

Repository: NVIDIA/Model-Optimizer

Length of output: 890


Correct the GitHub URL format.

The Windows documentation link uses an incorrect URL format. The URL https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md mixes directory (tree) and file (blob) path patterns, which causes a 301 redirect.

Use blob instead of tree when linking to a specific file:
https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md

Alternatively, link to the directory view without the filename:
https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows

🤖 Prompt for AI Agents
In `@docs/source/getting_started/1_overview.rst` at line 14, The Markdown link for
ModelOpt-Windows uses a mixed tree/file path causing a redirect; update the URL
in the sentence that references `ModelOpt-Windows` to use the correct GitHub
blob path
`https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md`
or point to the directory
`https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows` so the
link resolves directly without a 301 redirect.

.. code-block:: bash
pip uninstall onnxruntime-directml
By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:

⚠️ Potential issue | 🟡 Minor

Fix typo: "neeedd" should be "needed".

The word "neeedd" has an extra 'd' and should be "needed".

🐛 Proposed fix
-By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
+By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version needed for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version needed for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
🤖 Prompt for AI Agents
In `@docs/source/getting_started/windows/_installation_standalone.rst` at line 51,
There is a typo in the sentence that reads 'The default CUDA version neeedd for
*onnxruntime-gpu* since v1.19.0 is 12.x.' — change "neeedd" to "needed" so it
reads 'The default CUDA version needed for *onnxruntime-gpu* since v1.19.0 is
12.x.' Update the sentence where "ModelOpt-Windows installs *onnxruntime-gpu*"
is mentioned to correct that single word.
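Since the section under review asks users to keep exactly one ONNX Runtime EP package installed, a small verification sketch may be useful; the `onnxruntime*` package-name prefix check below is an assumption based on the EP options named in the walkthrough (onnxruntime-gpu, onnxruntime-directml, onnxruntime-trt-rtx), not a check taken from the docs.

```python
# Sketch: confirm that exactly one ONNX Runtime EP package is installed.
from importlib.metadata import distributions

ort_packages = sorted(
    {name for d in distributions()
     if (name := d.metadata["Name"]) and name.lower().startswith("onnxruntime")}
)
print("Installed ONNX Runtime packages:", ort_packages)

if len(ort_packages) != 1:
    print("Expected exactly one ONNX Runtime EP package; remove extras first, "
          "e.g. `pip uninstall onnxruntime-directml`, then install the one you need.")
else:
    import onnxruntime as ort
    print("Available execution providers:", ort.get_available_providers())
```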


#. Currently, the Model Optimizer - Windows only supports Onnx Runtime GenAI based LLM models in the Olive workflow.
#. To try out different LLMs and EPs in the Olive workflow of ModelOpt-Windows, refer the details provided in `phi3 <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-Model-Optimizer>`_ Olive example.
#. For overview about Olive workflow, refer this `overview <https://github.com/microsoft/Olive/blob/main/docs/architecture.md>`_ guide.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the Olive architecture documentation URL is accessible
curl -s -L -I "https://github.com/microsoft/Olive/blob/main/docs/architecture.md" | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 5037


🌐 Web query:

Olive Microsoft GitHub repository documentation architecture.md

💡 Result:

I couldn’t find an architecture.md (or ARCHITECTURE.md) file in the microsoft/Olive GitHub repository’s default branch. The repo’s documentation is primarily published on the Olive docs site, and the closest “architecture” description is the Overview page, which explains Olive’s core concepts (workflow, passes, evaluators/metrics, and search/auto-tuning strategy). [1], [2]

Where to look instead

  • Repo home / docs entry point: microsoft/Olive [1]
  • Conceptual architecture (“How Olive works”): Olive docs → Overview [2]

Fix the Olive documentation link.

The referenced URL points to a non-existent file. The microsoft/Olive repository does not contain docs/architecture.md at the specified path. Update the link to point to the actual Olive architecture documentation available on the Olive docs site.

🤖 Prompt for AI Agents
In `@docs/source/getting_started/windows/_installation_with_olive.rst` at line 65,
Replace the broken GitHub link target in the rst line that currently reads
"overview <https://github.com/microsoft/Olive/blob/main/docs/architecture.md>"_
with the Olive docs site URL (for example "overview
<https://microsoft.github.io/Olive/>"_), keeping the visible link text the same
so the sentence points to the actual Olive architecture documentation.

- ``ONNX INT8 Max`` means INT8 (W8A8) quantization of ONNX model using Max calibration. Similar holds true for the term ``ONNX FP8 Max``.
- The LLMs in above table are `GenAI <https://github.com/microsoft/onnxruntime-genai/>`_ built LLMs unless specified otherwise.
- Check `examples <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq/>`_ for specific instructions and scripts.
Please checkout the model support matrix `details <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix>`_.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

GitHub NVIDIA Model-Optimizer examples windows support-matrix

💡 Result:

  • GitHub examples (Windows): NVIDIA Model Optimizer keeps Windows-specific example scripts under examples/windows in the NVIDIA/Model-Optimizer repo, linked from the main README’s “Post Training Quantization → windows” examples entry. [1]
  • Support matrix (Windows / “ModelOpt-Windows”): The Windows platform requirements are documented in the Windows installation page (acts as the practical support matrix): Windows + amd64, Python >=3.10,<3.13, CUDA >=12.0, ONNX Runtime 1.20.0, NVIDIA driver 565.90+, and NVIDIA GPU RTX 40 / 50 series (single-GPU supported). [2]
  • Windows quantization feature notes: The Windows quick-start covers ONNX PTQ (incl. INT4 AWQ) and points to the Support Matrix for supported features/models; it also notes ONNX opset requirements (FP8 needs opset 19+, INT4 needs opset 21+). [3]

Pointers (paths):

NVIDIA/Model-Optimizer/examples/windows
Docs: Installation for Windows (ModelOpt-Windows)
Docs: Quick Start: Quantization (Windows)

Sources: [1] (github.com) [2] (nvidia.github.io) [3] (nvidia.github.io)

Citations:


🏁 Script executed:

#!/bin/bash
# Check the GitHub URL structure and verify the repository exists
curl -s -I "https://github.com/NVIDIA/Model-Optimizer" | head -1

Repository: NVIDIA/Model-Optimizer

Length of output: 188


Update the Windows model support matrix link to point to the official documentation.

The current link references https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix, but the actual Windows support matrix is documented in the official Model Optimizer documentation at the Windows installation page, not as an anchor on the GitHub examples directory. Update the reference to point directly to the supported platform requirements and GPU specifications.

🤖 Prompt for AI Agents
In `@docs/source/guides/0_support_matrix.rst` at line 101, Replace the incorrect
GitHub anchor URL in the README line referencing the model support matrix (the
text containing
"https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix")
with the official Model Optimizer Windows installation/support page URL that
documents supported platform requirements and GPU specifications; update the
link target in the phrase "details <...>" so it points directly to the official
Windows installation/support documentation rather than the GitHub examples
anchor.
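To unpack the "Max calibration" term quoted in the context above, here is a toy numpy illustration of the idea, not the ModelOpt implementation: the quantization scale comes from the largest absolute value observed over calibration data.

```python
import numpy as np

def max_calibration_scale(calibration_batches, num_bits=8):
    """'Max' calibration: pick a scale so the largest observed |value|
    maps to the top of the signed integer range (127 for INT8)."""
    amax = max(float(np.abs(batch).max()) for batch in calibration_batches)
    qmax = 2 ** (num_bits - 1) - 1
    return amax / qmax

def fake_quantize(x, scale, num_bits=8):
    """Quantize then dequantize, to observe the rounding error INT8 introduces."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), qmin, qmax) * scale

calib = [np.random.randn(4, 16).astype(np.float32) for _ in range(8)]
scale = max_calibration_scale(calib)
x = np.random.randn(4, 16).astype(np.float32)
print("max abs error:", np.abs(x - fake_quantize(x, scale)).max())
```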

### Prepare ORT-GenAI Compatible Base Model
## Prepare ORT-GenAI Compatible Base Model

You may generate the base model using the model builder that comes with onnxruntime-genai. The ORT-GenAI's [model-builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models) downloads the original Pytorch model from Hugging Face, and produces an ONNX GenAI compatible base model in ONNX format. See example command-line below:

⚠️ Potential issue | 🟡 Minor

Tighten wording/typos for clarity.
A few small fixes will reduce confusion (e.g., “GenAI-compatible”, “precision”, “choose from”).

✏️ Suggested edits
-... produces an ONNX GenAI compatible base model in ONNX format.
+... produces an ONNX GenAI-compatible base model in ONNX format.

-| `--enable_mixed_quant` | Default: disabled mixed quant | Use this option to enable mixed precsion quantization|
-| `--layers_8bit` | Default: None | Use this option to Overrides default mixed quant strategy|
+| `--enable_mixed_quant` | Default: disabled mixed quant | Use this option to enable mixed precision quantization|
+| `--layers_8bit` | Default: None | Use this option to override the default mixed-quant strategy|

-1. For the `algo` argument, we have following options to choose form: awq_lite, awq_clip, rtn, rtn_dq.
+1. For the `algo` argument, we have following options to choose from: awq_lite, awq_clip, rtn, rtn_dq.

-> *All LLMs in the above table are [GenAI](https://github.com/microsoft/onnxruntime-genai/) built LLMs.*
+> *All LLMs in the above table are [GenAI](https://github.com/microsoft/onnxruntime-genai/) built LLMs.*

Also applies to: 70-71, 83-83, 130-130

🧰 Tools
🪛 LanguageTool

[grammar] ~32-~32: Use a hyphen to join words.
Context: ...Hugging Face, and produces an ONNX GenAI compatible base model in ONNX format. Se...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
In `@examples/windows/onnx_ptq/genai_llm/README.md` at line 32, Update the README
sentence to tighten wording and fix typos: change "ONNX GenAI compatible" to
"GenAI-compatible", use "precision" consistently (e.g., "select precision" or
"precision level"), and replace informal phrasing like "choose from" with
"select from" for clarity; apply the same edits to the other occurrences
mentioned (the paragraphs around the same phrasing at the other locations) so
all instances use "GenAI-compatible", consistent "precision" wording, and
"select from" phrasing for a uniform, clearer README.

Comment on lines 56 to 57
| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |

⚠️ Potential issue | 🟡 Minor

Fix dataset value mismatch (pilevel vs pile-val).
The supported value list says pilevel, but the description refers to “pile-val”. Pick one canonical flag/value so users don’t pass an invalid option.

✅ Suggested edit
-| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
+| `--dataset` | cnn (default), pileval | Choose calibration dataset: cnn_dailymail or pile-val. |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
| `--dataset` | cnn (default), pileval | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
🤖 Prompt for AI Agents
In `@examples/windows/onnx_ptq/genai_llm/README.md` around lines 56 - 57, The
README lists the `--dataset` supported values as "cnn, pilevel" but the
description calls it "pile-val"; pick one canonical value (recommend "pile-val")
and update the `--dataset` supported-values list and the descriptive text to
match, and also search for any validation or flag-parsing logic that references
`pilevel` and update it to the chosen canonical token so the flag, description,
and code all match (`--dataset`, cnn, pile-val).
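To show how a single canonical value keeps the flag, docs, and parser in sync, here is a hypothetical argparse sketch; the real parser is not part of this diff, so the option names and choices below are assumptions mirroring the README table.

```python
import argparse

parser = argparse.ArgumentParser(description="Hypothetical quantization CLI sketch")
parser.add_argument(
    "--dataset",
    choices=["cnn", "pileval"],  # single canonical token; README text calls it "pile-val"
    default="cnn",
    help="Calibration dataset: cnn (cnn_dailymail) or pileval (pile-val).",
)
parser.add_argument(
    "--algo",
    choices=["awq_lite", "awq_clip", "rtn", "rtn_dq"],
    default="awq_lite",
    help="Quantization algorithm.",
)

args = parser.parse_args(["--dataset", "pileval"])
print(args.dataset)  # argparse rejects anything outside the declared choices
```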


codecov bot commented Jan 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.17%. Comparing base (2a08622) to head (23cd1e8).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #812   +/-   ##
=======================================
  Coverage   74.17%   74.17%           
=======================================
  Files         192      192           
  Lines       19246    19246           
=======================================
  Hits        14276    14276           
  Misses       4970     4970           

☔ View full report in Codecov by Sentry.

