
Add MTEBEvaluator for embedding model evaluation#2409

Open
natke wants to merge 4 commits into microsoft:main from natke:natke/mteb-evaluator

Conversation


@natke natke commented Apr 10, 2026

Summary

Add a built-in MTEB (Massive Text Embedding Benchmark) evaluator to Olive, following the same architecture as LMEvaluator/lmeval_ort.py.

Changes

olive/evaluator/olive_evaluator.py

  • New MTEBEvaluator class registered in the evaluator registry
  • Supports three model classes: hf (SentenceTransformer), ort (ONNX via ORT), ortgenai (GenAI with hidden_states)
  • Auto-detects model class from handler type, including GenAI detection via genai_config.json
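The auto-detection described above can be sketched as a small dispatch function. This is a hypothetical illustration, not Olive's actual code: the handler names and the `genai_config.json` probe are assumptions based on the PR description.

```python
# Hypothetical sketch of the model-class auto-detection; handler type names
# and the genai_config.json check are assumptions, not Olive's real API.
from pathlib import Path


def detect_model_class(handler_type: str, model_path: str) -> str:
    """Map an Olive model handler to an MTEB backend name."""
    if handler_type == "HfModelHandler":
        return "hf"  # evaluated via SentenceTransformer
    if handler_type == "ONNXModelHandler":
        # A genai_config.json next to the model marks an ORT GenAI export
        if (Path(model_path) / "genai_config.json").exists():
            return "ortgenai"
        return "ort"
    # Per a later commit, unknown handlers raise instead of defaulting
    raise ValueError(f"Unsupported model handler: {handler_type}")
```

Raising on unknown handlers (rather than silently defaulting to `ortgenai`) matches one of the review fixes applied in this PR.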

olive/evaluator/mteb_ort.py (new)

  • MTEBOnnxBase: abstract base implementing MTEB's EncoderProtocol with mean pooling
  • MTEBORTEvaluator: wraps plain ONNX models via ort.InferenceSession
  • MTEBORTGenAIEvaluator: wraps GenAI models using og.Generator + hidden_states output
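Both wrappers pool token embeddings with an attention-mask-weighted mean. A standalone NumPy sketch of that pooling step (not the actual `MTEBOnnxBase` implementation) looks like this:

```python
# Minimal sketch of attention-mask mean pooling as used by the MTEB wrappers;
# a self-contained NumPy version for illustration, not Olive's code.
import numpy as np


def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (b, s, 1)
    summed = (hidden_states * mask).sum(axis=1)                   # (b, h)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                # avoid div by 0
    return summed / counts
```

Masking before summing ensures padded tokens contribute nothing to the sentence embedding, and the clipped count guards against all-zero masks.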

Usage

"evaluators": {
    "mteb": {
        "type": "MTEBEvaluator",
        "tasks": ["STS17"],
        "batch_size": 32
    }
},
"evaluator": "mteb"

Testing

Tested end-to-end with the Qwen3-Embedding-0.6B CPU recipe: ModelBuilder export plus MTEB STS17 evaluation on both the input HF model and the exported GenAI model.

Add built-in MTEB (Massive Text Embedding Benchmark) evaluator to Olive,
following the same architecture as LMEvaluator/lmeval_ort.py.

- MTEBEvaluator in olive_evaluator.py: supports hf, ort, and ortgenai model
  classes with auto-detection based on model handler type
- mteb_ort.py: MTEB-compatible wrappers for exported ONNX and GenAI models
  - MTEBORTEvaluator: wraps plain ONNX models via ORT InferenceSession
  - MTEBORTGenAIEvaluator: wraps GenAI models with hidden_states output
  - Both use mean pooling over attention-masked token embeddings

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Adds a new built-in evaluator to Olive for running MTEB (Massive Text Embedding Benchmark) embedding evaluations, following the existing evaluator registry pattern used for LM evaluation.

Changes:

  • Registers a new MTEBEvaluator in olive_evaluator.py with support for HuggingFace, plain ONNX Runtime, and ORT GenAI style models.
  • Introduces olive/evaluator/mteb_ort.py with ONNX Runtime and ORT GenAI adapter classes implementing an MTEB-compatible encoder interface.
  • Converts MTEB task results into Olive MetricResult structures for reporting.
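The conversion of MTEB task results into metric structures can be pictured as a flattening step. The key format and function below are assumptions for illustration; the PR's actual `MetricResult` wiring lives in `olive_evaluator.py`.

```python
# Hypothetical illustration of flattening per-task MTEB scores into a flat
# metric dict for reporting; key naming is an assumption, not Olive's scheme.
def flatten_mteb_scores(task_results: dict) -> dict:
    """Turn {task: {metric: value}} into {"task-metric": value}."""
    flat = {}
    for task, scores in task_results.items():
        for name, value in scores.items():
            flat[f"{task}-{name}"] = float(value)
    return flat
```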

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
olive/evaluator/olive_evaluator.py | Adds MTEBEvaluator and wiring to select the correct backend and translate MTEB outputs into Olive metrics.
olive/evaluator/mteb_ort.py | Adds ONNX Runtime and ORT GenAI model wrappers to satisfy MTEB encoder requirements (tokenization, pooling, similarity).

natke and others added 3 commits April 10, 2026 15:32
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Raise ValueError for unknown model handlers instead of defaulting to ortgenai
- Pass device to SentenceTransformer for HF model evaluation
- Handle string input in encode() to avoid char-by-char iteration
- Raise RuntimeError when hidden_states unavailable instead of logits fallback
- Fix lint: blank line after docstring section (D413)
- Fix lint: replace list comprehension with list() (C416/R1721)
- Fix ruff formatting issues

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
