
feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export #2406

Draft
justinchuby wants to merge 13 commits into main from justinchu/mobius-model-builder

Conversation

@justinchuby
Contributor

@justinchuby justinchuby commented Apr 9, 2026

Summary

Adds a new MobiusModelBuilder Olive pass that wraps the mobius package's build() function to produce ONNX models directly from HuggingFace model IDs.

What this does

  • Single-component models (LLMs): returns ONNXModelHandler
  • Multi-component models (VLMs, encoder-decoders): returns CompositeModelHandler with one ONNXModelHandler per component
  • EP auto-detection: maps Olive accelerator spec → mobius EP (cpu/cuda/dml/webgpu)
  • Precision: fp32 (default), fp16, bf16
  • trust_remote_code passthrough

Files

File Description
olive/passes/onnx/mobius_model_builder.py Pass implementation
olive/olive_config.json Registered as MobiusModelBuilder
examples/gemma4/gemma4_int4_pipeline.json Example: Gemma4-E2B CUDA/fp16 → INT4
examples/gemma4/gemma4_fp32_cpu.json Example: Gemma4-E2B CPU/fp32
test/passes/onnx/test_mobius_model_builder.py 10 unit tests

E2E Test

Verified with HuggingFaceTB/SmolLM2-135M-Instruct (135M llama-type, locally cached):

olive run --config /tmp/olive_e2e_test.json

Result:

  • Pipeline executed without errors
  • model.onnx (289KB) + model.onnx.data (1.2GB) produced
  • onnxruntime.InferenceSession loaded successfully
  • Correct causal-LM I/O: input_ids/attention_mask/position_ids + 30×past_kv → logits + 30×present_kv

Test coverage

10 passed in 2.0s

Tests cover: single-component, multi-component, EP auto-detection, EP override, precision mapping, non-HF model rejection, missing mobius error, all 4 EP map entries.

Notes

  • E2B/E4B are Any-to-Any (vision+audio+text); 26B-A4B/31B are Image-Text-to-Text only

Justin Chu and others added 4 commits April 9, 2026 14:04
Adds a new Olive pass that wraps mobius's build() function to produce
ONNX models directly from HuggingFace model IDs.

- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- test_ep_map_covers_common_providers now asserts DML and WebGPU in
  addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Use official model IDs:
- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any
  (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text
  to Text only (no audio encoder)

Updated both example configs to use google/gemma-4-E2B-it and added
comment strings documenting the audio-capable vs image-only distinction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…lds)

Fix invalid RunConfig fields in both example configs:
- Remove output_name and system (not valid engine fields)
- Move target reference to engine.target
- Use log_severity_level=1

Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:
- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model, correct causal-LM I/O (input_ids -> logits + KV cache)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Copilot AI review requested due to automatic review settings April 9, 2026 21:24
Contributor

Copilot AI left a comment

Pull request overview

Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.

Changes:

  • Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
  • Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
  • Adds unit tests for single-component, multi-component, EP selection, and error paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
olive/passes/onnx/mobius_model_builder.py New pass wrapping mobius.build() and emitting Olive model handlers.
olive/olive_config.json Registers MobiusModelBuilder and declares extras for its dependencies.
examples/gemma4/gemma4_int4_pipeline.json Example pipeline: mobius export (fp16 CUDA) then INT4 quantization.
examples/gemma4/gemma4_fp32_cpu.json Example pipeline: mobius export (fp32 CPU).
test/passes/onnx/test_mobius_model_builder.py New unit tests for config, handler types, EP mapping, and missing dependency behavior.

@justinchuby justinchuby marked this pull request as draft April 9, 2026 22:00
Justin Chu and others added 4 commits April 9, 2026 19:56
- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string
  (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a
  downstream quantization pass for INT4/INT8 instead
- Remove explicit execution_provider from CUDA example config so both
  gemma4 configs consistently rely on auto-detection from the accelerator
  spec; the CPU config already did this
- olive_config.json: add mobius-genai to top-level extra_dependencies map
  so 'olive run' can surface the install hint; remove onnx_ir (transitive
  dep of mobius-genai) from the pass entry
- Move AcceleratorSpec import to TYPE_CHECKING block (RUFF TC001) —
  safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove redundant 'import onnx_ir' check; ImportError message now
  correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename unused _fake_pkg 'output_dir' param to '_output_dir' to
  suppress lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into single 'with' (RUFF SIM117)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]]
  (keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both
  the pass extra_dependencies and the top-level extra_dependencies map;
  was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json ->
  gemma4_int4_cuda.json so both example configs follow the same
  {precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the
  correct patch target (lazy import inside function body, not module-level)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…delBuilder

- After pkg.save(), verify each expected model.onnx exists and raise
  RuntimeError with a clear message if missing (single-component and
  per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are
  reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and
  multi-component), trust_remote_code warning emitted, no warning
  when False (14/14 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
@justinchuby justinchuby self-assigned this Apr 10, 2026
- Add module-scoped _stub_mobius_module fixture that injects a fake
  'mobius' stub into sys.modules when the package is not installed,
  ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line
  (PYLINT W0212 — intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside
  _run_for_config — import is intentionally deferred to surface a clear
  ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches
  on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
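The stub-module trick this commit describes can be sketched as follows: register a minimal fake `mobius` module in `sys.modules` so that `patch("mobius.build")` resolves even when the real mobius-ai package is not installed. The helper name and shape here are assumptions; in the PR this lives in a module-scoped pytest fixture.

```python
import sys
import types
from unittest.mock import patch


def install_mobius_stub():
    """Register a minimal fake 'mobius' module if the real one is absent."""
    if "mobius" not in sys.modules:
        stub = types.ModuleType("mobius")
        stub.build = lambda *args, **kwargs: None  # replaced by patch() in tests
        sys.modules["mobius"] = stub


install_mobius_stub()
with patch("mobius.build") as mock_build:
    mock_build.return_value = "fake-package"
    from mobius import build  # lazy import, mirroring the pass

    assert build() == "fake-package"
```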
Change all references from 'mobius-genai' to 'mobius-ai':
- olive_config.json: extra_dependencies key/value and top-level mapping
- mobius_model_builder.py: docstring install snippet and ImportError message
- test file: fixture docstring comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files.
The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not
enable PLC0415 in this repo, so the directive was unused.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Contributor

we don't keep examples in this repo anymore. can you create an accompanying PR in microsoft/olive-recipes?

Contributor Author

I am just using this PR to iterate on the files. Could you comment on whether there are errors or changes needed? I will move the files over once they are stable.

Contributor

@jambayk jambayk Apr 10, 2026

some comments:

  • have never tried adding an unused field like "comment" in the config. don't remember if the config accepts unknown fields.
  • engine-level nesting is optional. most examples we have now don't use it, for cleaner configs
  • cpu ep is the default, so the system doesn't need to be added explicitly

),
),
"execution_provider": PassConfigParam(
type_=str,
Contributor

we could create an enum of the supported eps for automatic validation, like in

class ModelDtype(StrEnumBase):
    ...

unless you think the options might keep growing and it would be hard to keep it in sync across versions
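The suggested enum could look roughly like this. Since `StrEnumBase` is Olive-internal, a plain `str`-mixin `Enum` stands in here; the class and member names are hypothetical.

```python
from enum import Enum


class MobiusEP(str, Enum):
    """Supported mobius execution providers; invalid values fail construction."""

    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"

    def __str__(self):
        return self.value
```

Using the enum as the `execution_provider` config type would reject unsupported values (e.g. `MobiusEP("tpu")` raises `ValueError`) without a hand-written validator.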

},
"passes": {
"mobius_build": { "type": "MobiusModelBuilder", "precision": "fp16" },
"int4_quantize": { "type": "GptqQuantizer", "bits": 4, "group_size": 128, "sym": true }
Contributor

the quantization pass works in a pytorch model and should be run before mobius

Contributor Author

Is there a pass I can use to quantize after the model is built? I would like to use that as an example for now

Contributor

you can use the rtn pass

{
    "type": "rtn",
    "bits": 4,
    "sym": false,
    "group_size": 32,
    "embeds": true,
    "lm_head": true
}

Contributor

oh, i misread the comment. you can use the blockwise quantizer pass:

{
    "type": "OnnxBlockWiseRtnQuantization",
    "block_size": 128,
    "is_symmetric": true,
    "accuracy_level": 4,
    "save_as_external_data": true
}

Contributor Author

Thanks. And it will process all components together?

Contributor

if the model is composite it should run the quantizer on each component and return a new composite model.

Copilot AI and others added 2 commits April 10, 2026 19:18
…files to model_attributes

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
…ify test docstring

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
# Licensed under the MIT License.
# --------------------------------------------------------------------------

# ruff: noqa: T201
Contributor

nit: maybe we could do this removal in a different PR? these make the PR seem bigger than it actually is.

Contributor Author

sorry I ran lintrunner in the wrong repo 😅 Looks like Olive's ruff version can be updated
