
feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export #2406

Draft
justinchuby wants to merge 13 commits into main from justinchu/mobius-model-builder

Conversation

@justinchuby
Contributor

@justinchuby justinchuby commented Apr 9, 2026

Summary

Adds a new MobiusModelBuilder Olive pass that wraps the mobius package's build() function to produce ONNX models directly from HuggingFace model IDs.

What this does

  • Single-component models (LLMs): returns ONNXModelHandler
  • Multi-component models (VLMs, encoder-decoders): returns CompositeModelHandler with one ONNXModelHandler per component
  • EP auto-detection: maps Olive accelerator spec → mobius EP (cpu/cuda/dml/webgpu)
  • Precision: fp32 (default), fp16, bf16
  • trust_remote_code passthrough

Files

File Description
olive/passes/onnx/mobius_model_builder.py Pass implementation
olive/olive_config.json Registered as MobiusModelBuilder
examples/gemma4/gemma4_int4_pipeline.json Example: Gemma4-E2B CUDA/fp16 → INT4
examples/gemma4/gemma4_fp32_cpu.json Example: Gemma4-E2B CPU/fp32
test/passes/onnx/test_mobius_model_builder.py 10 unit tests

E2E Test

Verified with HuggingFaceTB/SmolLM2-135M-Instruct (135M llama-type, locally cached):

olive run --config /tmp/olive_e2e_test.json

Result:

  • Pipeline executed without errors
  • model.onnx (289KB) + model.onnx.data (1.2GB) produced
  • onnxruntime.InferenceSession loaded successfully
  • Correct causal-LM I/O: input_ids/attention_mask/position_ids + 30×past_kv → logits + 30×present_kv

Test coverage

10 passed in 2.0s

Tests cover: single-component, multi-component, EP auto-detection, EP override, precision mapping, non-HF model rejection, missing mobius error, all 4 EP map entries.

Notes

  • E2B/E4B are Any-to-Any (vision+audio+text); 26B-A4B/31B are Image-Text-to-Text only

Justin Chu and others added 4 commits April 9, 2026 14:04
Adds a new Olive pass that wraps mobius's build() function to produce
ONNX models directly from HuggingFace model IDs.

- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- test_ep_map_covers_common_providers now asserts DML and WebGPU in
  addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Use official model IDs:
- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any
  (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text
  to Text only (no audio encoder)

Updated both example configs to use google/gemma-4-E2B-it and added
comment strings documenting the audio-capable vs image-only distinction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…lds)

Fix invalid RunConfig fields in both example configs:
- Remove output_name and system (not valid engine fields)
- Move target reference to engine.target
- Use log_severity_level=1

Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:
- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model, correct causal-LM I/O (input_ids -> logits + KV cache)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Copilot AI review requested due to automatic review settings April 9, 2026 21:24
Contributor

Copilot AI left a comment

Pull request overview

Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.

Changes:

  • Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
  • Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
  • Adds unit tests for single-component, multi-component, EP selection, and error paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
olive/passes/onnx/mobius_model_builder.py New pass wrapping mobius.build() and emitting Olive model handlers.
olive/olive_config.json Registers MobiusModelBuilder and declares extras for its dependencies.
examples/gemma4/gemma4_int4_pipeline.json Example pipeline: mobius export (fp16 CUDA) then INT4 quantization.
examples/gemma4/gemma4_fp32_cpu.json Example pipeline: mobius export (fp32 CPU).
test/passes/onnx/test_mobius_model_builder.py New unit tests for config, handler types, EP mapping, and missing dependency behavior.

@justinchuby justinchuby marked this pull request as draft April 9, 2026 22:00
Justin Chu and others added 4 commits April 9, 2026 19:56
- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string
  (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a
  downstream quantization pass for INT4/INT8 instead
- Remove explicit execution_provider from CUDA example config so both
  gemma4 configs consistently rely on auto-detection from the accelerator
  spec; the CPU config already did this
- olive_config.json: add mobius-genai to top-level extra_dependencies map
  so 'olive run' can surface the install hint; remove onnx_ir (transitive
  dep of mobius-genai) from the pass entry
- Move AcceleratorSpec import to TYPE_CHECKING block (RUFF TC001) —
  safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove redundant 'import onnx_ir' check; ImportError message now
  correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename unused _fake_pkg 'output_dir' param to '_output_dir' to
  suppress lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into single 'with' (RUFF SIM117)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]]
  (keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both
  the pass extra_dependencies and the top-level extra_dependencies map;
  was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json ->
  gemma4_int4_cuda.json so both example configs follow the same
  {precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the
  correct patch target (lazy import inside function body, not module-level)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…delBuilder

- After pkg.save(), verify each expected model.onnx exists and raise
  RuntimeError with a clear message if missing (single-component and
  per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are
  reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and
  multi-component), trust_remote_code warning emitted, no warning
  when False (14/14 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
@justinchuby justinchuby self-assigned this Apr 10, 2026
- Add module-scoped _stub_mobius_module fixture that injects a fake
  'mobius' stub into sys.modules when the package is not installed,
  ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line
  (PYLINT W0212 — intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside
  _run_for_config — import is intentionally deferred to surface a clear
  ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches
  on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
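The stub-module trick this commit describes can be sketched as follows: register a minimal fake `mobius` module in `sys.modules` so that `patch("mobius.build")` resolves even when the real mobius-ai package is not installed. The helper name and shape here are assumptions; in the PR this lives in a module-scoped pytest fixture.

```python
import sys
import types
from unittest.mock import patch


def install_mobius_stub():
    """Register a minimal fake 'mobius' module if the real one is absent."""
    if "mobius" not in sys.modules:
        stub = types.ModuleType("mobius")
        stub.build = lambda *args, **kwargs: None  # replaced by patch() in tests
        sys.modules["mobius"] = stub


install_mobius_stub()
with patch("mobius.build") as mock_build:
    mock_build.return_value = "fake-package"
    from mobius import build  # lazy import, mirroring the pass

    assert build() == "fake-package"
```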
Change all references from 'mobius-genai' to 'mobius-ai':
- olive_config.json: extra_dependencies key/value and top-level mapping
- mobius_model_builder.py: docstring install snippet and ImportError message
- test file: fixture docstring comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files.
The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not
enable PLC0415 in this repo, so the directive was unused.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Contributor

we don't keep examples in this repo anymore. can you create an accompanying PR in microsoft/olive-recipes?

Contributor Author

I am just using this PR to iterate on the files. Could you comment on whether there are errors or changes needed? I will move the files over once they are stable.

Contributor

@jambayk jambayk Apr 10, 2026

some comments:

  • have never tried adding an unused field like "comment" in the config. don't remember if the config accepts unknown fields.
  • engine-level nesting is optional. most examples we have now don't use it, for cleaner configs
  • cpu ep is the default, so the system doesn't need to be added explicitly

),
),
"execution_provider": PassConfigParam(
type_=str,
Contributor

we could create an enum of the supported eps for automatic validation, like in

class ModelDtype(StrEnumBase):
    ...

unless you think the options might keep growing and it would be hard to keep it in sync across versions
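The suggested enum could look roughly like this. Since `StrEnumBase` is Olive-internal, a plain `str`-mixin `Enum` stands in here; the class and member names are hypothetical.

```python
from enum import Enum


class MobiusEP(str, Enum):
    """Supported mobius execution providers; invalid values fail construction."""

    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"

    def __str__(self):
        return self.value
```

Using the enum as the `execution_provider` config type would reject unsupported values (e.g. `MobiusEP("tpu")` raises `ValueError`) without a hand-written validator.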

},
"passes": {
"mobius_build": { "type": "MobiusModelBuilder", "precision": "fp16" },
"int4_quantize": { "type": "GptqQuantizer", "bits": 4, "group_size": 128, "sym": true }
Contributor

the quantization pass works in a pytorch model and should be run before mobius

Contributor Author

Is there a pass I can use to quantize after the model is built? I would like to use that as an example for now

Contributor

you can use the rtn pass

{
    "type": "rtn",
    "bits": 4,
    "sym": false,
    "group_size": 32,
    "embeds": true,
    "lm_head": true
}

Contributor

oh, i misread the comment. you can use the blockwise quantizer pass:

{
    "type": "OnnxBlockWiseRtnQuantization",
    "block_size": 128,
    "is_symmetric": true,
    "accuracy_level": 4,
    "save_as_external_data": true
}

Contributor Author

Thanks. And it will process all components together?

Contributor

if the model is composite it should run the quantizer on each component and return a new composite model.

Copilot AI and others added 2 commits April 10, 2026 19:18
…files to model_attributes

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
…ify test docstring

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
# Licensed under the MIT License.
# --------------------------------------------------------------------------

# ruff: noqa: T201
Contributor

nit: maybe we could do this removal in a different PR? these make the PR seem bigger than it actually is.

Contributor Author

sorry I ran lintrunner in the wrong repo 😅 Looks like Olive's ruff version can be updated
