embedding-benchmark

Benchmark embedding models across Redis-backed retrieval experiments.

What It Does

embedding-benchmark compares embedding models on retrieval tasks using:

Ollama models through the Ollama Python SDK
OpenAI models through RedisVL
Local Hugging Face models through RedisVL
Redis as the vector index and retrieval backend
ranx for retrieval metrics

The CLI writes:

terminal summary tables
summary.json
metrics.csv
per_dataset_metrics.csv
per_query_metrics.csv
config.resolved.yaml
run.log

Example Findings

This repo was used to benchmark 12 model and dimension configurations across 4 NanoBEIR datasets in one Redis-backed reference run.

Benchmark caveat: this is a small, single-run reference result with no error bars or latency warmup. Treat the numbers as an example of what the tool can surface, not as a universal model leaderboard.

A few useful takeaways from that run:

larger embeddings were not consistently better
openai-large@256 was one of the strongest quality and latency tradeoffs
retrieval latency increased with vector size, but stayed below 2 ms/query even at the largest dimension in this setup
the full benchmark build and indexing run took about 46 minutes
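
One plausible contributor to the latency trend is that both storage and distance computation grow linearly with vector size. A back-of-the-envelope sketch of raw per-vector storage, assuming float32 components (4 bytes each) and ignoring index overhead; the storage dtype used by the reference run is not stated here, and vector_bytes is a hypothetical helper:

```python
# Rough per-vector storage cost, assuming float32 components (an assumption;
# the benchmark's actual Redis vector dtype may differ).
BYTES_PER_FLOAT32 = 4


def vector_bytes(dims: int, bytes_per_component: int = BYTES_PER_FLOAT32) -> int:
    """Return the raw size in bytes of one embedding with `dims` components."""
    return dims * bytes_per_component


for dims in (256, 1024, 3072):
    print(f"{dims:>5} dims -> {vector_bytes(dims):>6} bytes per vector")
```

Under that assumption a 3072-dimension vector is 12 times the size of a 256-dimension one, which is the kind of gap the dimension sweep is meant to weigh against quality.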

Overall examples from quality_retrieval_by_model.csv:

These overall results are macro-averages across datasets, so each dataset gets equal weight in the average.

Model	Dims	Avg nDCG@10	Avg Hit@10	Avg retrieval ms/query
`openai-large`	3072	0.7041	0.8900	1.308
`openai-large`	256	0.6858	0.8650	0.384
`ollama-mxbai`	1024	0.6694	0.8600	0.742
`nomic-embed-text-v2-moe`	512	0.6284	0.8400	0.475
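
To make the macro-average concrete, here is a minimal sketch of how one model's overall score would be derived from its per-dataset scores. The dataset names and numbers are made up for illustration and are not taken from the reference run; macro_average is a hypothetical helper, not a function from this repo:

```python
from statistics import mean

# Hypothetical per-dataset nDCG@10 scores for one model configuration.
# These numbers are illustrative only and are not from the reference run.
per_dataset_ndcg = {
    "dataset_a": 0.80,
    "dataset_b": 0.60,
    "dataset_c": 0.70,
    "dataset_d": 0.50,
}


def macro_average(scores_by_dataset: dict[str, float]) -> float:
    """Average the per-dataset scores, giving each dataset equal weight."""
    return mean(scores_by_dataset.values())


print(round(macro_average(per_dataset_ndcg), 4))  # 0.65
```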

Per-dataset winners from quality_retrieval_by_dataset.csv:

Dataset	Best Model	Dims	nDCG@10	Hit@10	Retrieval ms/query
`NanoNQ`	`openai-large`	256	0.8206	0.92	0.405
`NanoSciFact`	`openai-large`	3072	0.7865	0.88	1.051
`NanoFiQA2018`	`openai-large`	3072	0.6628	0.84	1.446
`NanoArguAna`	`ollama-mxbai`	1024	0.6797	0.98	0.663

The best configuration was not stable across datasets, which is exactly why it is worth benchmarking on the retrieval tasks that matter to you.
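
If you want to repeat the per-dataset comparison on your own results, the selection step is a simple group-and-max. A rough sketch, assuming rows shaped like the table above; the field names, model names, and scores are illustrative assumptions rather than the tool's actual CSV schema:

```python
# Hypothetical rows shaped like a per-dataset results table. The field names
# and scores are illustrative assumptions, not the tool's actual CSV schema.
rows = [
    {"dataset": "dataset_a", "model": "model-x", "dims": 256, "ndcg@10": 0.71},
    {"dataset": "dataset_a", "model": "model-y", "dims": 1024, "ndcg@10": 0.68},
    {"dataset": "dataset_b", "model": "model-x", "dims": 256, "ndcg@10": 0.55},
    {"dataset": "dataset_b", "model": "model-y", "dims": 1024, "ndcg@10": 0.62},
]


def best_per_dataset(rows, metric="ndcg@10"):
    """Return {dataset: best row}, keeping the highest-scoring row per dataset."""
    best = {}
    for row in rows:
        current = best.get(row["dataset"])
        if current is None or row[metric] > current[metric]:
            best[row["dataset"]] = row
    return best


for dataset, row in best_per_dataset(rows).items():
    print(dataset, row["model"], row["dims"], row["ndcg@10"])
```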

Supported Custom Models

The repo is built so people can benchmark their own model choices within the provider families implemented today:

custom Ollama embedding model names via the Ollama SDK
custom OpenAI embedding model names via RedisVL
custom Hugging Face embedding model names via RedisVL

It does not yet support arbitrary providers outside those adapter paths. Hugging Face models through RedisVL are treated as fixed-dimension models unless a future adapter explicitly supports safe truncation.
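
For context on why truncation needs explicit support: keeping only the first k components changes a vector's length, so it generally has to be re-normalized before cosine or inner-product comparison, and the result is typically only useful for models trained to tolerate shortened embeddings. A minimal sketch of the operation itself, not the repo's adapter code (truncate_and_normalize is a hypothetical helper):

```python
import math


def truncate_and_normalize(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit L2 norm."""
    if not 0 < dims <= len(vector):
        raise ValueError(f"dims must be in 1..{len(vector)}, got {dims}")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0]          # toy 3-dimensional "embedding"
short = truncate_and_normalize(full, 2)
print(short)  # [0.6, 0.8]
```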

Supported Benchmark Definitions

Benchmarks are defined in YAML. Today you can configure:

custom model lists within the built-in provider families
custom dataset lists using built-in dataset kinds
custom ranx metric lists for benchmark scoring
a separate per_query_metrics list for per_query_metrics.csv

Built-in dataset kinds:

nanobeir
hf_beir

Dataset splits default to train, which matches the NanoBEIR datasets used by the starter config. For HF BEIR-style datasets with different splits, configure them explicitly:

datasets:
  - id: SciFact
    kind: hf_beir
    source: BeIR/scifact
    enabled: true
    options:
      corpus_split: train
      queries_split: test
      qrels_split: test

metrics can use valid ranx metric strings such as:

hit_rate@10
ndcg@10
mrr@10
precision@5
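
The strings listed above follow a name@k pattern. A rough sketch of how such a string could be split and sanity-checked before a run; the regex, the parse_metric helper, and the name set are illustrative assumptions, not the tool's actual validation logic or the full list of metrics ranx accepts:

```python
import re

# Subset of metric names mentioned in this README; ranx supports more.
KNOWN_NAMES = {"hit_rate", "ndcg", "mrr", "precision"}
METRIC_PATTERN = re.compile(r"^(?P<name>[a-z_]+)@(?P<k>[1-9][0-9]*)$")


def parse_metric(metric: str) -> tuple[str, int]:
    """Split 'name@k' into (name, k), raising ValueError on bad input."""
    match = METRIC_PATTERN.match(metric)
    if match is None:
        raise ValueError(f"expected 'name@k' with k >= 1, got {metric!r}")
    name, k = match["name"], int(match["k"])
    if name not in KNOWN_NAMES:
        raise ValueError(f"unknown metric name {name!r}")
    return name, k


print(parse_metric("ndcg@10"))  # ('ndcg', 10)
```

This sketch only handles the name@k form; it rejects a cutoff of zero and any name outside the small set it knows about.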

per_query_metrics is intentionally narrower for now and supports only:

hit_rate@k
ndcg@k
mrr@k
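
For readers unfamiliar with these three metrics, here is a small pure-Python sketch of what each one measures for a single query. It uses the textbook binary-relevance definitions; the tool itself relies on ranx for scoring, and the function names and sample data below are hypothetical:

```python
import math


def hit_rate_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0


def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant result in the top k, else 0.0."""
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked_ids[:k], start=1)
        if doc in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["d3", "d7", "d1", "d9"]  # hypothetical retrieval order for one query
relevant = {"d1", "d4"}            # hypothetical relevance judgments

print(hit_rate_at_k(ranked, relevant, 3))  # 1.0
print(mrr_at_k(ranked, relevant, 3))
print(ndcg_at_k(ranked, relevant, 3))
```

In this example the only relevant hit sits at rank 3, so the hit rate is 1.0 while the reciprocal rank is one third, which shows why the rank-sensitive metrics can disagree with hit rate on the same query.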

Requirements

Python 3.11+
Redis with RediSearch / vector search support
Ollama running locally if you want to benchmark Ollama models
OPENAI_API_KEY set if you want to benchmark OpenAI models

Install

From the project root:

uv sync --extra dev

This creates .venv and installs the CLI locally.

Start Redis

You need a Redis instance with vector search enabled.

If you already have one running, point the config at it.

If not, a quick local option is:

docker run --rm -p 6379:6379 redis/redis-stack-server:latest

Prepare Model Access

OpenAI

Set your API key:

export OPENAI_API_KEY=<your-openai-api-key>

Ollama

Start Ollama and pull the models you want to test. If ollama serve is running in the foreground, run the pull commands from a second terminal:

ollama serve
ollama pull nomic-embed-text-v2-moe
ollama pull mxbai-embed-large

Create A Config

Generate a local starter config:

./.venv/bin/embedding-benchmark init-config

This writes benchmark.yaml in the current directory. benchmark.yaml is treated as a local working file and is gitignored on purpose.

To write it somewhere else:

./.venv/bin/embedding-benchmark init-config --output my-benchmark.yaml

If you want a committed example to start from, copy benchmark.example.yaml.

Example dataset entry:

datasets:
  - id: NanoNQ
    kind: nanobeir
    source: zeta-alpha-ai/NanoNQ
    enabled: true

Example evaluation section:

evaluation:
  top_k: 10
  metrics:
    - hit_rate@10
    - ndcg@10
    - precision@5
  per_query_metrics:
    - hit_rate@10
    - ndcg@10
  output_dir: runs

Review Configured Models

./.venv/bin/embedding-benchmark list-models --config benchmark.yaml

The generated starter config includes:

ollama-nomic
ollama-mxbai
openai-small
openai-large
redisvl-minilm

Run A Benchmark

Run the benchmark with the config:

./.venv/bin/embedding-benchmark run --config benchmark.yaml

The generated starter config enables:

NanoNQ
NanoSciFact
NanoFiQA2018
NanoArguAna

Artifacts are written under runs/<timestamp>-benchmark/. The runs/ directory is gitignored because it contains local benchmark artifacts.
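
If you script against these artifacts, a small helper can locate the most recent run. This is a sketch under two assumptions: that run directory names begin with a sortable UTC timestamp (as in the run IDs shown in this README), and that the default runs/ output directory is used. latest_run_dir is a hypothetical helper, not part of the CLI:

```python
from __future__ import annotations

from pathlib import Path


def latest_run_dir(runs_root: str = "runs", suffix: str = "-benchmark") -> Path | None:
    """Return the newest run directory under `runs_root`, or None if there is none.

    Assumes directory names start with a sortable UTC timestamp such as
    20260519T202913Z, so lexicographic order matches chronological order.
    """
    root = Path(runs_root)
    if not root.is_dir():
        return None
    candidates = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.endswith(suffix)
    )
    return candidates[-1] if candidates else None


run_dir = latest_run_dir()
if run_dir is None:
    print("no runs found")
else:
    # File names below are artifacts this README says the CLI writes.
    for name in ("summary.json", "metrics.csv", "run.log"):
        print(name, "exists" if (run_dir / name).exists() else "missing")
```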

Sweep Dimensions

For models that support native reduced dimensions:

./.venv/bin/embedding-benchmark sweep-dims --config benchmark.yaml --model openai-small

Example models that can sweep dimensions:

openai-small
openai-large
ollama-nomic

Inspect A Run

./.venv/bin/embedding-benchmark inspect --config benchmark.yaml --run-id 20260519T202913Z-benchmark

This prints the saved summary, including Redis index names and key prefixes.

Clean Up Redis Data

Remove Redis keys for a specific run:

./.venv/bin/embedding-benchmark cleanup --config benchmark.yaml --run-id 20260519T202913Z-benchmark

Or remove by key prefix:

./.venv/bin/embedding-benchmark cleanup --config benchmark.yaml --prefix embedbench

Smoke Test Workflow

If you want the quickest manual check:

1. Start Redis.
2. Export OPENAI_API_KEY (only needed if the model you keep enabled is an OpenAI model).
3. Generate benchmark.yaml.
4. Edit benchmark.yaml so only one small model and one dataset are enabled.
5. Run:

./.venv/bin/embedding-benchmark run --config benchmark.yaml

Then confirm:

the CLI prints a results table
a new folder appears under runs/
summary.json and metrics.csv exist
if keep_indexes: true is set in the config, the Redis keys and indexes are visible in Redis Insight

Run Tests

./.venv/bin/pytest -q

Notes

OpenAI models in this project go through RedisVL, not the direct OpenAI SDK.
Ollama models use the Ollama SDK directly because RedisVL does not currently support Ollama embeddings.
Dataset loading is selected by datasets[].kind, so users can define different benchmark mixes without changing Python code.
Metric names are validated at config load time so bad ranx metric strings fail before a run starts.
Redis query results are converted into ranking scores before evaluation so ranx sees the correct ordering.
The committed benchmark.example.yaml matches the generated default config, which enables all four NanoBEIR datasets.
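
On the ranking-score note above: a vector search typically returns a distance, where smaller means closer, while ranx treats larger scores as better. A minimal sketch of one way to bridge that, assuming cosine distance and hypothetical document IDs; the repo's actual conversion may differ, and distances_to_scores is not a function from this project:

```python
def distances_to_scores(hits: list[tuple[str, float]]) -> dict[str, float]:
    """Map (doc_id, cosine_distance) pairs to {doc_id: score}, higher is better."""
    return {doc_id: 1.0 - distance for doc_id, distance in hits}


# Hypothetical search results for one query, closest first.
hits = [("doc-2", 0.12), ("doc-7", 0.30), ("doc-5", 0.85)]
scores = distances_to_scores(hits)

# Sorting by score (descending) reproduces the distance (ascending) order.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['doc-2', 'doc-7', 'doc-5']
```

Collected per query, these mappings form the nested {query_id: {doc_id: score}} dictionary shape that a ranx Run can be built from.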
