Add Qwen3 Embedding recipes (0.6B and 8B) #355
Conversation
Add Olive recipes for Qwen3-Embedding-0.6B and Qwen3-Embedding-8B models targeting CPU, CUDA, and WebGPU execution providers.
- FP32 precision with include_hidden_states for embedding extraction
- ModelBuilder pass for ONNX/GenAI export
- MTEBEvaluator configured with the STS17 benchmark task
- READMEs with MTEB leaderboard scores and STS17 evaluation results
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds new Olive recipe folders for the Qwen3 embedding model variants (0.6B, 8B) across CPU/CUDA/WebGPU, including configs, environment requirements, and documentation to support FP32 export with hidden states enabled and MTEB (STS17) evaluation wiring.
Changes:
- Added CPU/CUDA/WebGPU Olive configs using `ModelBuilder` (FP32) with `include_hidden_states=1` for Qwen3 embedding models (sketched below).
- Added per-recipe `requirements.txt`, `info.yaml`, and auto-generated `README.md` files.
- Added per-model Apache 2.0 `LICENSE` files.
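For orientation, here is a minimal sketch of the shape these recipe configs take, written as a Python dict so it can carry comments (the actual recipes are JSON). The key names follow common Olive conventions and are assumptions; the authoritative versions are the config files in this PR.

```python
# Hypothetical sketch of one FP32 recipe (the real files are JSON and live in
# this PR); key names follow common Olive conventions and may differ slightly.
recipe = {
    "input_model": {
        "type": "HfModel",                        # Hugging Face source model
        "model_path": "Qwen/Qwen3-Embedding-0.6B",
    },
    "systems": {
        "local_system": {
            "type": "LocalSystem",
            "accelerators": [
                {"device": "cpu", "execution_providers": ["CPUExecutionProvider"]}
            ],
        }
    },
    "passes": {
        "builder": {
            "type": "ModelBuilder",               # ONNX Runtime GenAI exporter
            "precision": "fp32",
            # expose hidden states so the exported model can produce embeddings
            "extra_args": {"include_hidden_states": 1},
        }
    },
}
```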
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| Qwen-Qwen3-Embedding-8B/LICENSE | Adds license file for the 8B embedding model recipe set. |
| Qwen-Qwen3-Embedding-8B/cpu/requirements.txt | Python deps for running the CPU recipe and MTEB evaluation. |
| Qwen-Qwen3-Embedding-8B/cpu/README.md | CPU recipe usage + leaderboard references. |
| Qwen-Qwen3-Embedding-8B/cpu/Qwen-Qwen3-Embedding-8B_cpu_fp32.json | Olive config: HF model → ModelBuilder FP32 + MTEBEvaluator(STS17). |
| Qwen-Qwen3-Embedding-8B/cpu/info.yaml | Registers the CPU recipe metadata (device/EP/file). |
| Qwen-Qwen3-Embedding-8B/cuda/requirements.txt | Python deps for running the CUDA recipe and MTEB evaluation. |
| Qwen-Qwen3-Embedding-8B/cuda/README.md | CUDA recipe usage + leaderboard references. |
| Qwen-Qwen3-Embedding-8B/cuda/Qwen-Qwen3-Embedding-8B_cuda_fp32.json | Olive config: HF model → ModelBuilder FP32 + MTEBEvaluator(STS17) on CUDA EP. |
| Qwen-Qwen3-Embedding-8B/cuda/info.yaml | Registers the CUDA recipe metadata (device/EP/file). |
| Qwen-Qwen3-Embedding-8B/webgpu/requirements.txt | Python deps for running the WebGPU recipe and MTEB evaluation. |
| Qwen-Qwen3-Embedding-8B/webgpu/README.md | WebGPU recipe usage + leaderboard references. |
| Qwen-Qwen3-Embedding-8B/webgpu/Qwen-Qwen3-Embedding-8B_webgpu_fp32.json | Olive config: HF model → ModelBuilder FP32 + MTEBEvaluator(STS17) on WebGPU EP. |
| Qwen-Qwen3-Embedding-8B/webgpu/info.yaml | Registers the WebGPU recipe metadata (device/EP/file). |
| Qwen-Qwen3-Embedding-0.6B/LICENSE | Adds license file for the 0.6B embedding model recipe set. |
| Qwen-Qwen3-Embedding-0.6B/cpu/requirements.txt | Python deps for running the CPU recipe and MTEB evaluation. |
| Qwen-Qwen3-Embedding-0.6B/cpu/README.md | CPU recipe usage + leaderboard references + STS17 eval results section. |
| Qwen-Qwen3-Embedding-0.6B/cpu/Qwen-Qwen3-Embedding-0.6B_cpu_fp32.json | Olive config: HF model → ModelBuilder FP32 + MTEBEvaluator(STS17). |
| Qwen-Qwen3-Embedding-0.6B/cpu/info.yaml | Registers the CPU recipe metadata (device/EP/file). |
| Qwen-Qwen3-Embedding-0.6B/cuda/requirements.txt | Python deps for running the CUDA recipe and MTEB evaluation. |
| Qwen-Qwen3-Embedding-0.6B/cuda/README.md | CUDA recipe usage + leaderboard references. |
| Qwen-Qwen3-Embedding-0.6B/cuda/Qwen-Qwen3-Embedding-0.6B_cuda_fp32.json | Olive config: HF model → ModelBuilder FP32 + MTEBEvaluator(STS17) on CUDA EP. |
| Qwen-Qwen3-Embedding-0.6B/cuda/info.yaml | Registers the CUDA recipe metadata (device/EP/file). |
| Qwen-Qwen3-Embedding-0.6B/webgpu/requirements.txt | Python deps for running the WebGPU recipe and MTEB evaluation. |
| Qwen-Qwen3-Embedding-0.6B/webgpu/README.md | WebGPU recipe usage + leaderboard references. |
| Qwen-Qwen3-Embedding-0.6B/webgpu/Qwen-Qwen3-Embedding-0.6B_webgpu_fp32.json | Olive config: HF model → ModelBuilder FP32 + MTEBEvaluator(STS17) on WebGPU EP. |
| Qwen-Qwen3-Embedding-0.6B/webgpu/info.yaml | Registers the WebGPU recipe metadata (device/EP/file). |
Base recipes build the model only. `_with_eval` variants include MTEB evaluation, which requires the target EP to be available at runtime. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
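Roughly, the `_with_eval` variants add an evaluator block on top of the base recipe; a hedged sketch, with key names assumed from Olive's evaluator schema:

```python
# Hypothetical _with_eval delta; the base recipe stops after the passes.
recipe_with_eval = {
    # ... same input_model / systems / passes as the base recipe ...
    "evaluators": {
        "mteb": {
            "type": "MTEBEvaluator",  # evaluator added in microsoft/Olive#2409
            "tasks": ["STS17"],       # semantic textual similarity benchmark
        }
    },
    # Running MTEB against the exported ONNX model is what requires the
    # target execution provider to be installed at runtime.
    "evaluator": "mteb",
}
```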
/azp run
Add int4 ModelBuilder recipes for cpu, cuda, and webgpu targets with SelectiveMixedPrecision, GPTQ, and RTN passes matching the Qwen3-8B chat model pattern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
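As a sketch, the pass chain described here would look something like the following; the pass type strings are taken from the commit message and the Qwen3-8B chat recipes, not from a verified Olive pass registry, so treat the exact names as assumptions.

```python
# Hypothetical int4 pipeline mirroring the Qwen3-8B chat recipes; the pass
# type names below come from the commit message and may not match Olive's
# registered identifiers exactly.
int4_passes = {
    "mixed_precision": {"type": "SelectiveMixedPrecision"},  # keep sensitive layers in higher precision
    "gptq": {"type": "Gptq"},                                # weight-only GPTQ quantization
    "rtn": {"type": "Rtn"},                                  # round-to-nearest fallback quantization
    "builder": {"type": "ModelBuilder", "precision": "int4"},
}
```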
Add _with_eval variants of the int4 recipes for cpu, cuda, and webgpu targets, including MTEBEvaluator with STS17 task. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
It would be good to remove the fp32 cuda and webgpu recipes and add fp16 cuda and webgpu recipes.
Expand MTEB eval tasks beyond STS17 to include retrieval benchmarks for better quantization quality assessment across all _with_eval recipes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Expand MTEB eval tasks to include NFCorpus, ArguAna, and SciFact alongside STS17 for the 0.6B embedding model _with_eval recipes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
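The corresponding evaluator change is presumably a one-line task-list edit, with task names as given in the commit message:

```python
# Before: tasks = ["STS17"]
# After: STS17 plus three retrieval benchmarks (NFCorpus, ArguAna, SciFact)
# to better surface quantization quality regressions.
tasks = ["STS17", "NFCorpus", "ArguAna", "SciFact"]
```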
Update precision from fp32 to fp16 for cuda and webgpu targets on both Qwen3-Embedding-0.6B and Qwen3-Embedding-8B recipes. Rename files accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Did you intentionally add int4 quantization only for the 8B model?
The Qwen3-Embedding-0.6B model supports tie_word_embeddings. Did you check whether the ONNX model correctly ties the embeddings?
The filenames were renamed from fp32 to fp16 but the precision field inside the JSON configs was not updated. This fixes all 8 affected files (0.6B and 8B, cuda and webgpu, plain and with_eval variants). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
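In each of the eight configs the fix is presumably a one-field change, roughly:

```python
# Before (stale after the fp32 -> fp16 file rename):
builder_before = {"type": "ModelBuilder", "precision": "fp32"}
# After (now consistent with the fp16 filenames):
builder_after = {"type": "ModelBuilder", "precision": "fp16"}
```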
Adds int4 recipes for cpu, cuda, and webgpu targets (plain and with_eval variants). Uses same SMP+GPTQ+RTN+ModelBuilder pipeline as the 8B int4 recipes. WebGPU uses group_size=32, others use 128. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Qwen3-Embedding-0.6B model has tie_word_embeddings=True but the ONNX models produced by ModelBuilder do not preserve this. Add a GraphSurgeries pass with TieWordEmbeddings surgeon after ModelBuilder, matching the pattern used by the Qwen3-0.6B chat model recipes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
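Sketched below, following the Qwen3-0.6B chat recipe pattern the commit references; the surgeon name comes from the commit message, and the surrounding keys are assumed from Olive's GraphSurgeries pass schema.

```python
# Hypothetical pass chain with the tie-embeddings fix appended: the
# TieWordEmbeddings surgeon re-shares the lm_head weight with the input
# embedding table, which ModelBuilder's export otherwise does not preserve.
passes = {
    "builder": {"type": "ModelBuilder", "precision": "fp32"},
    "tie_embeddings": {
        "type": "GraphSurgeries",
        "surgeries": [{"surgeon": "TieWordEmbeddings"}],
    },
}
```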
- Create baseline/ folder for 0.6B and 8B with PyTorch-only eval recipes
- Baseline recipes have no passes; they just evaluate the HfModel with MTEB
- Add evaluate_input_model: false to all existing _with_eval recipes, which avoids a duplicate PyTorch eval when running ONNX model evaluation (see the sketch below)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
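A sketch of both pieces, assuming Olive's top-level `evaluate_input_model` flag (which the commit message names) and its HfModel input type:

```python
# Hypothetical baseline recipe: no passes, just MTEB-evaluate the HfModel.
baseline = {
    "input_model": {"type": "HfModel", "model_path": "Qwen/Qwen3-Embedding-0.6B"},
    "passes": {},            # nothing to convert; PyTorch-only evaluation
    "evaluator": "mteb",
}

# Hypothetical one-line change to every existing _with_eval recipe: skip the
# PyTorch eval of the input model, since the baseline recipe now covers it.
with_eval_patch = {"evaluate_input_model": False}
```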
Prevents recipes in the same folder from overwriting each other's output, e.g. model_cpu_int4, model_cuda_fp16, model_pytorch, etc. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
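Presumably each recipe now sets a distinct output directory along these lines; `output_dir` is assumed here as the Olive top-level output location key.

```python
# Hypothetical per-recipe output naming so sibling recipes do not collide.
cpu_int4_recipe = {"output_dir": "model_cpu_int4"}    # one entry per config file
cuda_fp16_recipe = {"output_dir": "model_cuda_fp16"}
baseline_recipe = {"output_dir": "model_pytorch"}
```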
- Update all 6 READMEs to reference INT4 build/eval configs instead of FP32/FP16
- Add config_sentence_transformers.json copy step instructions where needed
- Remove stale FP32/FP16 evaluation results
- Update pipeline descriptions to match actual INT4 pass chains
- Update 0.6B CUDA INT4 eval results with prompt fix (NFCorpus: 0.287 -> 0.351)
Summary
Add Olive recipes for Qwen3-Embedding-0.6B and Qwen3-Embedding-8B embedding models, targeting CPU, CUDA, and WebGPU execution providers.
Models
- Qwen/Qwen3-Embedding-0.6B
- Qwen/Qwen3-Embedding-8B
Recipes (6 total)
- `Qwen-Qwen3-Embedding-0.6B/{cpu,cuda,webgpu}`
- `Qwen-Qwen3-Embedding-8B/{cpu,cuda,webgpu}`

Each recipe includes:
- Olive config with `include_hidden_states=1` for embedding extraction
- `requirements.txt`, `info.yaml`, and auto-generated `README.md`

Dependencies
- `MTEBEvaluator` from Add MTEBEvaluator for embedding model evaluation (Olive#2409, merged); see the sketch below
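For context, the `mteb` package that backs this evaluator is typically driven as below; a minimal sketch assuming a sentence-transformers-compatible encoder, with `MTEBEvaluator` presumably doing the equivalent against the exported ONNX model.

```python
import mteb
from sentence_transformers import SentenceTransformer

# Minimal STS17 run with the mteb package; Olive's MTEBEvaluator presumably
# wraps an equivalent loop around the exported model's encode function.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
tasks = mteb.get_tasks(tasks=["STS17"])
results = mteb.MTEB(tasks=tasks).run(model, output_folder="mteb_results")
for result in results:
    print(result.task_name, result.scores)
```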