Add MTEBEvaluator for embedding model evaluation #2409
Open
natke wants to merge 4 commits into microsoft:main
Add built-in MTEB (Massive Text Embedding Benchmark) evaluator to Olive, following the same architecture as LMEvaluator/lmeval_ort.py.

- MTEBEvaluator in olive_evaluator.py: supports hf, ort, and ortgenai model classes with auto-detection based on model handler type
- mteb_ort.py: MTEB-compatible wrappers for exported ONNX and GenAI models
  - MTEBORTEvaluator: wraps plain ONNX models via ORT InferenceSession
  - MTEBORTGenAIEvaluator: wraps GenAI models with hidden_states output
  - Both use mean pooling over attention-masked token embeddings

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new built-in evaluator to Olive for running MTEB (Massive Text Embedding Benchmark) embedding evaluations, following the existing evaluator registry pattern used for LM evaluation.
Changes:
- Registers a new `MTEBEvaluator` in `olive_evaluator.py` with support for HuggingFace, plain ONNX Runtime, and ORT GenAI style models.
- Introduces `olive/evaluator/mteb_ort.py` with ONNX Runtime and ORT GenAI adapter classes implementing an MTEB-compatible encoder interface.
- Converts MTEB task results into Olive `MetricResult` structures for reporting.
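
For illustration only, the backend selection described above could be sketched as follows. The handler type strings and the `detect_model_class` helper are hypothetical names, not taken from the PR, and using the presence of `genai_config.json` to tell a GenAI export from a plain ONNX model is an assumption.

```python
from pathlib import Path

# Hypothetical handler type names, used only for this illustration.
HF_HANDLER = "HfModel"
ONNX_HANDLER = "ONNXModel"


def detect_model_class(handler_type: str, model_dir: str) -> str:
    """Pick an MTEB backend from the handler type and the files on disk."""
    if handler_type == HF_HANDLER:
        return "hf"
    if handler_type == ONNX_HANDLER:
        # Assumption: a GenAI export is recognized by its genai_config.json.
        if (Path(model_dir) / "genai_config.json").is_file():
            return "ortgenai"
        return "ort"
    # Fail loudly rather than silently defaulting to one backend.
    raise ValueError(f"Unsupported model handler type: {handler_type!r}")
```

Raising on an unrecognized handler keeps a misconfigured run from being evaluated with the wrong wrapper.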
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| olive/evaluator/olive_evaluator.py | Adds MTEBEvaluator and wiring to select the correct backend and translate MTEB outputs into Olive metrics. |
| olive/evaluator/mteb_ort.py | Adds ONNX Runtime and ORT GenAI model wrappers to satisfy MTEB encoder requirements (tokenization, pooling, similarity). |
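
As a rough sketch of the pooling step mentioned for `mteb_ort.py`, mean pooling over attention-masked token embeddings can be written with NumPy as below. The function name `mean_pool` and the clamping constant are illustrative assumptions, not the PR's code.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings per sequence, ignoring padded positions.

    token_embeddings: shape (batch, seq_len, hidden)
    attention_mask:   shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    # Broadcast the mask over the hidden dimension.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Clamp the token count so an all-padding row does not divide by zero.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts
```

Padded positions contribute nothing to the sum or to the count, so sentences of different lengths in the same batch get comparable embeddings.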
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Raise ValueError for unknown model handlers instead of defaulting to ortgenai
- Pass device to SentenceTransformer for HF model evaluation
- Handle string input in encode() to avoid char-by-char iteration
- Raise RuntimeError when hidden_states unavailable instead of logits fallback
- Fix lint: blank line after docstring section (D413)
- Fix lint: replace list comprehension with list() (C416/R1721)
- Fix ruff formatting issues

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
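
The string-input fix above guards against a common Python pitfall: iterating over a bare `str` yields characters, not sentences. A minimal sketch of such a guard, using a hypothetical helper name `normalize_sentences` rather than the PR's actual code:

```python
from typing import Iterable, List, Union


def normalize_sentences(sentences: Union[str, Iterable[str]]) -> List[str]:
    """Return a list of sentences, treating a bare string as one sentence."""
    if isinstance(sentences, str):
        # Without this check, list("hello") would split into single characters.
        return [sentences]
    return list(sentences)
```

An `encode()` method could call a helper like this before tokenization so that a single string is embedded once rather than once per character.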
Summary
Add a built-in MTEB (Massive Text Embedding Benchmark) evaluator to Olive, following the same architecture as `LMEvaluator`/`lmeval_ort.py`.

Changes

`olive/evaluator/olive_evaluator.py`
- `MTEBEvaluator` class registered in the evaluator registry
- `hf` (SentenceTransformer), `ort` (ONNX via ORT), `ortgenai` (GenAI with hidden_states)
- `genai_config.json`

`olive/evaluator/mteb_ort.py` (new)
- `MTEBOnnxBase`: abstract base implementing MTEB's EncoderProtocol with mean pooling
- `MTEBORTEvaluator`: wraps plain ONNX models via `ort.InferenceSession`
- `MTEBORTGenAIEvaluator`: wraps GenAI models using `og.Generator` + `hidden_states` output

Usage
Testing
Tested end-to-end with the Qwen3-Embedding-0.6B CPU recipe: ModelBuilder export plus MTEB STS17 evaluation on both the input HF model and the exported GenAI model.