Modelopt-windows documentation update #812
base: main
Conversation
Signed-off-by: vipandya <vipandya@nvidia.com>
📝 Walkthrough
Documentation refactor expanding ONNX Runtime Execution Provider (EP) support on Windows beyond DirectML to include CUDA, TensorRT-RTX, and CPU options. Includes new 0.41 release notes, updated system requirements tables, revised installation guides, and refreshed support matrices across multiple documentation and example files.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ❌ 1 failed (inconclusive) | ✅ 2 passed
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/source/getting_started/windows/_installation_for_Windows.rst (1)
18-18: Clarify CUDA version requirements. The system requirements table specifies CUDA `>=12.0` (Line 18), while the note mentions CUDA 12.8+ for Blackwell GPU support (Line 28). This may confuse users about the actual minimum CUDA version required. Consider clarifying whether:
- CUDA 12.0 is the general minimum, with 12.8+ needed only for Blackwell GPUs
- Or if the table should be updated to reflect 12.8+ as the universal minimum
Also applies to: 28-28
docs/source/deployment/2_onnxruntime.rst (1)
42-42: Fix double slash in URL. The URL contains a double slash before the closing: `python//` should be `python/`.
🔗 Proposed fix
-- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python//>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
+- Explore `inference scripts <https://github.com/microsoft/onnxruntime-genai/tree/main/examples/python/>`_ in the ORT GenAI example repository for generating output sequences using a single function call.
🤖 Fix all issues with AI agents
In `@CHANGELOG-Windows.rst`:
- Line 15: Replace the misspelled link text "Perlexity" with the correct
spelling "Perplexity" in the CHANGELOG entry (the link label that currently
reads `Perlexity
<https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_);
ensure the link URL and formatting remain unchanged and only the visible label
is corrected.
In `@docs/source/getting_started/1_overview.rst`:
- Line 14: The Markdown link for ModelOpt-Windows uses a mixed tree/file path
causing a redirect; update the URL in the sentence that references
`ModelOpt-Windows` to use the correct GitHub blob path
`https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md`
or point to the directory
`https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows` so the
link resolves directly without a 301 redirect.
In `@docs/source/getting_started/windows/_installation_standalone.rst`:
- Line 51: There is a typo in the sentence that reads 'The default CUDA version
neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x.' — change "neeedd" to
"needed" so it reads 'The default CUDA version needed for *onnxruntime-gpu*
since v1.19.0 is 12.x.' Update the sentence where "ModelOpt-Windows installs
*onnxruntime-gpu*" is mentioned to correct that single word.
In `@docs/source/getting_started/windows/_installation_with_olive.rst`:
- Line 65: Replace the broken GitHub link target in the rst line that currently
reads "overview
<https://github.com/microsoft/Olive/blob/main/docs/architecture.md>"_ with the
Olive docs site URL (for example "overview
<https://microsoft.github.io/Olive/>"_), keeping the visible link text the same
so the sentence points to the actual Olive architecture documentation.
In `@docs/source/guides/0_support_matrix.rst`:
- Line 101: Replace the incorrect GitHub anchor URL in the README line
referencing the model support matrix (the text containing
"https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix")
with the official Model Optimizer Windows installation/support page URL that
documents supported platform requirements and GPU specifications; update the
link target in the phrase "details <...>" so it points directly to the official
Windows installation/support documentation rather than the GitHub examples
anchor.
In `@examples/windows/onnx_ptq/genai_llm/README.md`:
- Line 32: Update the README sentence to tighten wording and fix typos: change
"ONNX GenAI compatible" to "GenAI-compatible", use "precision" consistently
(e.g., "select precision" or "precision level"), and replace informal phrasing
like "choose from" with "select from" for clarity; apply the same edits to the
other occurrences mentioned (the paragraphs around the same phrasing at the
other locations) so all instances use "GenAI-compatible", consistent "precision"
wording, and "select from" phrasing for a uniform, clearer README.
- Around line 56-57: The README lists the `--dataset` supported values as "cnn,
pilevel" but the description calls it "pile-val"; pick one canonical value
(recommend "pile-val") and update the `--dataset` supported-values list and the
descriptive text to match, and also search for any validation or flag-parsing
logic that references `pilevel` and update it to the chosen canonical token so
the flag, description, and code all match (`--dataset`, cnn, pile-val).
🧹 Nitpick comments (5)
examples/windows/torch_onnx/diffusers/README.md (1)
95-109: Consider improving clarity and consistency. The Support Matrix section rename improves consistency with other README files, and the new footnotes provide valuable context. However, consider the following refinements:
Line 109: The note about "some known performance issues with NVFP4 model execution" is vague. Consider being more specific about what issues users might encounter or providing a reference to a tracking issue.
Lines 103, 105: Footnote formatting is inconsistent - these lines lack ending punctuation while line 107 includes a period.
♻️ Suggested improvements
-> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*
+> *<sup>1.</sup> NVFP4 inference requires Blackwell GPUs for speedup.*
-> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*
+> *<sup>2.</sup> It is recommended to enable cpu-offloading and have 128+ GB of system RAM for quantizing Flux.1.Dev on RTX5090.*
-> *There are some known performance issues with NVFP4 model execution using TRTRTX EP. Stay tuned for further updates!*
+> *NVFP4 model execution using TRTRTX EP has known performance limitations. Stay tuned for further updates!*
CHANGELOG-Windows.rst (1)
14-14: Consider more descriptive link text. The link text "example script" could be more descriptive, similar to line 13's "example for GenAI LLMs". Consider something like "diffusion models quantization example" for consistency and clarity.
docs/source/getting_started/windows/_installation_standalone.rst (1)
72-76: Minor: Consider consistent capitalization in verification checklist. The verification item "Onnxruntime Package" uses different capitalization compared to other items like "Python Interpreter" and "Task Manager" (title case). Consider using "ONNX Runtime Package" for consistency.
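For reference, a minimal sanity check for that verification item can be sketched with standard ONNX Runtime introspection calls. This is illustrative only, not code from the PR; the provider list you see depends on which ORT package variant (onnxruntime, onnxruntime-gpu, or onnxruntime-directml) is installed.

```python
# Hypothetical verification snippet for the "ONNX Runtime Package" checklist item.
import onnxruntime as ort

print("ONNX Runtime version:", ort.__version__)         # e.g. 1.20.0 per the requirements table
print("Build device:", ort.get_device())                # "GPU" for onnxruntime-gpu builds, else "CPU"
print("Available EPs:", ort.get_available_providers())  # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']
```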
docs/source/deployment/2_onnxruntime.rst (2)
9-16: Good addition of multi-EP support overview. The execution provider descriptions effectively communicate the options available to users. The guidance to select based on model, hardware, and deployment requirements is helpful.
Optional: Consider clarifying DirectML EP scope.
Line 12's description "Enables deployment on a wide range of GPUs" could be more specific about which GPU vendors (e.g., AMD, Intel, NVIDIA) or hardware generations are supported to help users make informed decisions.
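To make the EP-selection guidance concrete, here is a minimal sketch using plain ONNX Runtime. It is not taken from the docs under review: "model.onnx" is a placeholder path, and the TensorRT-RTX EP's provider identifier is omitted because it varies with the ORT package and version.

```python
# Illustrative sketch: pick an execution provider by priority when creating an ORT session.
import onnxruntime as ort

preferred = ["CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
# Keep only the EPs actually present in this ORT build, preserving priority order.
providers = [p for p in preferred if p in ort.get_available_providers()]

session = ort.InferenceSession("model.onnx", providers=providers)
print("Session is running on:", session.get_providers())
```

ONNX Runtime GenAI models are typically loaded through the GenAI runtime rather than a raw InferenceSession, but the provider-priority idea is the same.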
32-34: Clarify that EP constraints are build-optimization specific, not inherent to ONNX portability. The note's core guidance—rebuild/re-export models for different EPs—is sound practice for ONNX Runtime GenAI. However, the explanation should be more precise: models are constrained to their export EP+precision combination because the GenAI model builder produces optimizations specific to that configuration, not because ONNX itself prevents cross-EP portability. While the underlying ONNX/ORT framework supports heterogeneous execution across EPs, GenAI's build process outputs precision- and EP-optimized artifacts that don't always transfer directly. Refine the note to clarify this is a practical build/optimization constraint (rebuild when targeting a different EP) rather than an inherent incompatibility, and optionally reference the model builder's documented EP/precision support matrix.
- Add support for ONNX Mixed Precision Weight-only quantization using INT4 and INT8 precisions. Refer quantization `example for GenAI LLMs <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq/genai_llm>`_.
- Add support for some diffusion models' quantization on Windows. Refer `example script <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/torch_onnx/diffusers>`_ for details.
- Add `Perlexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
Fix typo: "Perlexity" should be "Perplexity".
The word "Perlexity" is misspelled and should be "Perplexity".
🐛 Proposed fix
-- Add `Perlexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
+- Add `Perplexity <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_ and `KL-Divergence <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/kl_divergence_metrics>`_ accuracy benchmarks.
🤖 Prompt for AI Agents
In `@CHANGELOG-Windows.rst` at line 15, Replace the misspelled link text
"Perlexity" with the correct spelling "Perplexity" in the CHANGELOG entry (the
link label that currently reads `Perlexity
<https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/perplexity_metrics>`_);
ensure the link URL and formatting remain unchanged and only the visible label
is corrected.
techniques to produce optimized & quantized checkpoints. Seamlessly integrated within the NVIDIA AI software ecosystem, the quantized checkpoint generated from Model Optimizer is ready for deployment in downstream inference frameworks like `TensorRT-LLM <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/quantization>`_ or `TensorRT <https://github.com/NVIDIA/TensorRT>`_ (Linux). ModelOpt is integrated with `NVIDIA NeMo <https://github.com/NVIDIA-NeMo/NeMo>`_ and `Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_ for training-in-the-loop optimization techniques. For enterprise users, the 8-bit quantization with Stable Diffusion is also available on `NVIDIA NIM <https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/>`_.
-For Windows users, the `Model Optimizer for Windows <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md>`_ (ModelOpt-Windows) delivers model compression techniques, including quantization, on Windows RTX PC systems. ModelOpt-Windows is optimized for efficient quantization, featuring local GPU calibration, reduced system and video memory consumption, and swift processing times. It integrates seamlessly with the Windows ecosystem, with optimized ONNX models as output for `Microsoft DirectML <https://github.com/microsoft/DirectML>`_ backends. Furthermore, ModelOpt-Windows supports SDKs such as `Microsoft Olive <https://github.com/microsoft/Olive>`_ and `ONNX Runtime <https://github.com/microsoft/onnxruntime>`_, enabling the deployment of quantized models across various independent hardware vendors through the DirectML path.
+For Windows users, the `Model Optimizer for Windows <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md>`_ (ModelOpt-Windows) delivers model compression techniques, including quantization, on Windows RTX PC systems. ModelOpt-Windows is optimized for efficient quantization, featuring local GPU calibration, reduced system and video memory consumption, and swift processing times. It integrates seamlessly with the Windows ecosystem, with optimized ONNX models as output for `Microsoft DirectML <https://github.com/microsoft/DirectML>`_ and `TensorRT-RTX <https://github.com/NVIDIA/TensorRT-RTX>`_ backends. Furthermore, ModelOpt-Windows supports SDKs such as `Microsoft Olive <https://github.com/microsoft/Olive>`_ and `ONNX Runtime <https://github.com/microsoft/onnxruntime>`_, enabling the deployment of quantized models across various independent hardware vendors through the DirectML path.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the GitHub URL is accessible
echo "Testing URL: https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md"
curl -s -o /dev/null -w "HTTP Status Code: %{http_code}\n" "https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md"
Repository: NVIDIA/Model-Optimizer
Length of output: 179
🏁 Script executed:
#!/bin/bash
# Check the actual GitHub repository structure for NVIDIA/Model-Optimizer
# Try to access the repository and find Windows-related documentation
echo "=== Checking repository structure ==="
# Check if examples/windows directory exists using GitHub API
curl -s -L "https://api.github.com/repos/NVIDIA/Model-Optimizer/contents/examples" | grep -i windows
echo -e "\n=== Checking main branch structure ==="
# Try without the specific README.md file
curl -s -I "https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows" | head -5
echo -e "\n=== Testing alternative URL format ==="
# Test if accessing without tree/main works
curl -s -o /dev/null -w "HTTP Status Code: %{http_code}\n" "https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md"
Repository: NVIDIA/Model-Optimizer
Length of output: 890
Correct the GitHub URL format.
The Windows documentation link uses an incorrect URL format. The URL https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/README.md mixes directory (tree) and file (blob) path patterns, which causes a 301 redirect.
Use blob instead of tree when linking to a specific file:
https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md
Alternatively, link to the directory view without the filename:
https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows
🤖 Prompt for AI Agents
In `@docs/source/getting_started/1_overview.rst` at line 14, The Markdown link for
ModelOpt-Windows uses a mixed tree/file path causing a redirect; update the URL
in the sentence that references `ModelOpt-Windows` to use the correct GitHub
blob path
`https://github.com/NVIDIA/Model-Optimizer/blob/main/examples/windows/README.md`
or point to the directory
`https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows` so the
link resolves directly without a 301 redirect.
.. code-block:: bash

   pip uninstall onnxruntime-directml

By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
Fix typo: "neeedd" should be "needed".
The word "neeedd" has an extra 'd' and should be "needed".
🐛 Proposed fix
-By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version neeedd for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
+By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version needed for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
🤖 Prompt for AI Agents
In `@docs/source/getting_started/windows/_installation_standalone.rst` at line 51,
There is a typo in the sentence that reads 'The default CUDA version neeedd for
*onnxruntime-gpu* since v1.19.0 is 12.x.' — change "neeedd" to "needed" so it
reads 'The default CUDA version needed for *onnxruntime-gpu* since v1.19.0 is
12.x.' Update the sentence where "ModelOpt-Windows installs *onnxruntime-gpu*"
is mentioned to correct that single word.
#. Currently, the Model Optimizer - Windows only supports Onnx Runtime GenAI based LLM models in the Olive workflow.
#. To try out different LLMs and EPs in the Olive workflow of ModelOpt-Windows, refer the details provided in `phi3 <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-Model-Optimizer>`_ Olive example.
#. For overview about Olive workflow, refer this `overview <https://github.com/microsoft/Olive/blob/main/docs/architecture.md>`_ guide.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the Olive architecture documentation URL is accessible
curl -s -L -I "https://github.com/microsoft/Olive/blob/main/docs/architecture.md" | head -20
Repository: NVIDIA/Model-Optimizer
Length of output: 5037
🌐 Web query:
Olive Microsoft GitHub repository documentation architecture.md
💡 Result:
I couldn’t find an architecture.md (or ARCHITECTURE.md) file in the microsoft/Olive GitHub repository’s default branch. The repo’s documentation is primarily published on the Olive docs site, and the closest “architecture” description is the Overview page, which explains Olive’s core concepts (workflow, passes, evaluators/metrics, and search/auto-tuning strategy). [1], [2]
Where to look instead
- Repo home / docs entry point: microsoft/Olive [1]
- Conceptual architecture ("How Olive works"): Olive docs → Overview [2]
Fix the Olive documentation link.
The referenced URL points to a non-existent file. The microsoft/Olive repository does not contain docs/architecture.md at the specified path. Update the link to point to the actual Olive architecture documentation available on the Olive docs site.
🤖 Prompt for AI Agents
In `@docs/source/getting_started/windows/_installation_with_olive.rst` at line 65,
Replace the broken GitHub link target in the rst line that currently reads
"overview <https://github.com/microsoft/Olive/blob/main/docs/architecture.md>"_
with the Olive docs site URL (for example "overview
<https://microsoft.github.io/Olive/>"_), keeping the visible link text the same
so the sentence points to the actual Olive architecture documentation.
- ``ONNX INT8 Max`` means INT8 (W8A8) quantization of ONNX model using Max calibration. Similar holds true for the term ``ONNX FP8 Max``.
- The LLMs in above table are `GenAI <https://github.com/microsoft/onnxruntime-genai/>`_ built LLMs unless specified otherwise.
- Check `examples <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows/onnx_ptq/>`_ for specific instructions and scripts.
Please checkout the model support matrix `details <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix>`_.
🧩 Analysis chain
🌐 Web query:
GitHub NVIDIA Model-Optimizer examples windows support-matrix
💡 Result:
- GitHub examples (Windows): NVIDIA Model Optimizer keeps Windows-specific example scripts under `examples/windows` in the `NVIDIA/Model-Optimizer` repo, linked from the main README's "Post Training Quantization → windows" examples entry. [1]
- Support matrix (Windows / "ModelOpt-Windows"): The Windows platform requirements are documented in the Windows installation page (acts as the practical support matrix): Windows + amd64, Python >=3.10,<3.13, CUDA >=12.0, ONNX Runtime 1.20.0, NVIDIA driver 565.90+, and NVIDIA GPU RTX 40 / 50 series (single-GPU supported). [2]
- Windows quantization feature notes: The Windows quick-start covers ONNX PTQ (incl. INT4 AWQ) and points to the Support Matrix for supported features/models; it also notes ONNX opset requirements (FP8 needs opset 19+, INT4 needs opset 21+). [3]
Pointers (paths):
NVIDIA/Model-Optimizer/examples/windows
Docs: Installation for Windows (ModelOpt-Windows)
Docs: Quick Start: Quantization (Windows)
Sources: [1] (github.com) [2] (nvidia.github.io) [3] (nvidia.github.io)
Citations:
- 1: https://github.com/NVIDIA/Model-Optimizer
- 2: https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/windows/_installation_for_Windows.html?utm_source=openai
- 3: https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/4_quantization_windows.html?utm_source=openai
🏁 Script executed:
#!/bin/bash
# Check the GitHub URL structure and verify the repository exists
curl -s -I "https://github.com/NVIDIA/Model-Optimizer" | head -1
Repository: NVIDIA/Model-Optimizer
Length of output: 188
Update the Windows model support matrix link to point to the official documentation.
The current link references https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix, but the actual Windows support matrix is documented in the official Model Optimizer documentation at the Windows installation page, not as an anchor on the GitHub examples directory. Update the reference to point directly to the supported platform requirements and GPU specifications.
🤖 Prompt for AI Agents
In `@docs/source/guides/0_support_matrix.rst` at line 101, Replace the incorrect
GitHub anchor URL in the README line referencing the model support matrix (the
text containing
"https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/windows#support-matrix")
with the official Model Optimizer Windows installation/support page URL that
documents supported platform requirements and GPU specifications; update the
link target in the phrase "details <...>" so it points directly to the official
Windows installation/support documentation rather than the GitHub examples
anchor.
-### Prepare ORT-GenAI Compatible Base Model
+## Prepare ORT-GenAI Compatible Base Model

You may generate the base model using the model builder that comes with onnxruntime-genai. The ORT-GenAI's [model-builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models) downloads the original Pytorch model from Hugging Face, and produces an ONNX GenAI compatible base model in ONNX format. See example command-line below:
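The README's actual command line is not reproduced in this diff excerpt. As a representative illustration only, a model-builder invocation along the lines documented for onnxruntime-genai is sketched below; the Hugging Face model id, output folder, precision, and EP values are placeholder choices, and the flag names assume the builder's published -m/-o/-p/-e interface.

```python
# Illustrative only: invoke the onnxruntime-genai model builder from Python.
# Equivalent to running `python -m onnxruntime_genai.models.builder ...` from a shell.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", "microsoft/Phi-3-mini-4k-instruct",  # placeholder Hugging Face model id
        "-o", "./phi3_base_onnx",                  # output folder for the ONNX base model
        "-p", "fp16",                              # precision of the exported base model
        "-e", "cuda",                              # target execution provider (e.g. cpu, cuda, dml)
    ],
    check=True,
)
```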
Tighten wording/typos for clarity.
A few small fixes will reduce confusion (e.g., “GenAI-compatible”, “precision”, “choose from”).
✏️ Suggested edits
-... produces an ONNX GenAI compatible base model in ONNX format.
+... produces an ONNX GenAI-compatible base model in ONNX format.
-| `--enable_mixed_quant` | Default: disabled mixed quant | Use this option to enable mixed precsion quantization|
-| `--layers_8bit` | Default: None | Use this option to Overrides default mixed quant strategy|
+| `--enable_mixed_quant` | Default: disabled mixed quant | Use this option to enable mixed precision quantization|
+| `--layers_8bit` | Default: None | Use this option to override the default mixed-quant strategy|
-1. For the `algo` argument, we have following options to choose form: awq_lite, awq_clip, rtn, rtn_dq.
+1. For the `algo` argument, we have following options to choose from: awq_lite, awq_clip, rtn, rtn_dq.
-> *All LLMs in the above table are [GenAI](https://github.com/microsoft/onnxruntime-genai/) built LLMs.*
+> *All LLMs in the above table are [GenAI](https://github.com/microsoft/onnxruntime-genai/) built LLMs.*
Also applies to: 70-71, 83-83, 130-130
🧰 Tools
🪛 LanguageTool
[grammar] ~32-~32: Use a hyphen to join words.
Context: ...Hugging Face, and produces an ONNX GenAI compatible base model in ONNX format. Se...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
In `@examples/windows/onnx_ptq/genai_llm/README.md` at line 32, Update the README
sentence to tighten wording and fix typos: change "ONNX GenAI compatible" to
"GenAI-compatible", use "precision" consistently (e.g., "select precision" or
"precision level"), and replace informal phrasing like "choose from" with
"select from" for clarity; apply the same edits to the other occurrences
mentioned (the paragraphs around the same phrasing at the other locations) so
all instances use "GenAI-compatible", consistent "precision" wording, and
"select from" phrasing for a uniform, clearer README.
| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
| `--algo` | awq_lite (default), awq_clip, rtn, rtn_dq | Select the quantization algorithm. |
Fix dataset value mismatch (pilevel vs pile-val).
The supported value list says pilevel, but the description refers to “pile-val”. Pick one canonical flag/value so users don’t pass an invalid option.
✅ Suggested edit
-| `--dataset` | cnn (default), pilevel | Choose calibration dataset: cnn_dailymail or pile-val. |
+| `--dataset` | cnn (default), pileval | Choose calibration dataset: cnn_dailymail or pile-val. |
🤖 Prompt for AI Agents
In `@examples/windows/onnx_ptq/genai_llm/README.md` around lines 56 - 57, The
README lists the `--dataset` supported values as "cnn, pilevel" but the
description calls it "pile-val"; pick one canonical value (recommend "pile-val")
and update the `--dataset` supported-values list and the descriptive text to
match, and also search for any validation or flag-parsing logic that references
`pilevel` and update it to the chosen canonical token so the flag, description,
and code all match (`--dataset`, cnn, pile-val).
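As an illustration of the alignment this prompt asks for, a hypothetical flag definition that keeps the README table and the parser in sync might look like the sketch below. The example's real quantize script and its option names are not shown in this diff, so everything here is an assumption for illustration.

```python
# Hypothetical sketch: define --dataset and --algo once so help text, README table,
# and validation cannot drift apart. Not the actual code from the genai_llm example.
import argparse

DATASETS = {
    "cnn": "cnn_dailymail",   # default calibration dataset
    "pile-val": "pile-val",   # canonical token chosen in place of "pilevel"
}
ALGOS = ["awq_lite", "awq_clip", "rtn", "rtn_dq"]

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", choices=sorted(DATASETS), default="cnn",
                    help=f"Calibration dataset: {', '.join(sorted(DATASETS))}.")
parser.add_argument("--algo", choices=ALGOS, default="awq_lite",
                    help="Quantization algorithm.")

args = parser.parse_args([])  # empty list -> defaults, for a quick self-check
print(DATASETS[args.dataset], args.algo)
```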
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #812 +/- ##
=======================================
Coverage 74.17% 74.17%
=======================================
Files 192 192
Lines 19246 19246
=======================================
Hits 14276 14276
Misses 4970 4970
☔ View full report in Codecov by Sentry.
What does this PR do?
Documentation
Overview:
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
New Features
Bug Fixes
Documentation