miRAGe

Minimal platform for controlled RAG experiments.

Layout

config/ — global defaults and registries.
datasets/ — smoke fixtures and external benchmark source folders.
studies/ — executable study definitions and human results.
artifacts/ — ignored machine outputs and caches.
src/mirage/ — runtime code.
tests/ — regression tests.

Experiment decision log

2026-05-11: froze the initial retrieval-only baseline on full SciFact: text-embedding-3-small, token chunks 1024/128, Qdrant HNSW, dense top-k5, no reranker, no tool policy, no LLM generation. Details: studies/rag-foundation/README.md and studies/rag-foundation/study.toml.
2026-05-11: completed the first search/backend wave. Dense top-k10/top-k20 improved recall, BM25 and hybrid RRF did not beat dense search on SciFact, and FAISS flat matched Qdrant quality, so the vector store was not the quality bottleneck. Details: studies/rag-foundation/README.md.
2026-05-12: completed the embedding sweep and promoted gemini-embedding-001 as the current retrieval baseline. It improved Recall@k from 0.7771 to 0.9509 and NDCG@k from 0.6912 to 0.8869 versus text-embedding-3-small. Details: studies/rag-foundation/README.md.
2026-05-12: completed the LLM-free macro chunking wave on the Gemini baseline. Token 512/64 improved Precision@k slightly but lost Recall@k/NDCG@k and was slower; token 2048/256 and sentence 1024/128 matched quality but added latency, so token 1024/128 remains the chunking baseline. Details: studies/rag-foundation/README.md.
2026-05-12: completed the LLM-free macro search wave on Gemini + token 1024/128. Dense top-k10 improved Recall@k from 0.9509 to 0.9750 and NDCG@k from 0.8869 to 0.8956 versus dense top-k5 with modest latency cost, so dense top-k10 is the current retrieval baseline. Details: studies/rag-foundation/README.md.
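The log entries above compare configurations by Recall@k and NDCG@k. As a reading aid, here is a minimal sketch of how these two metrics are commonly computed for a single query with binary relevance judgments. The function names `recall_at_k` and `ndcg_at_k` are hypothetical and are not taken from src/mirage/; the project's own evaluation code may differ in details such as graded relevance or averaging across queries. The sketch assumes `ranked_ids` contains no duplicates.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    # Each relevant hit at 1-based position `rank` contributes 1 / log2(rank + 1).
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # The ideal ranking places all relevant documents first.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Toy query: two relevant documents, retrieved at positions 2 and 4.
    ranked = ["d3", "d1", "d7", "d2", "d9"]
    relevant = {"d1", "d2"}
    print(recall_at_k(ranked, relevant, 5))
    print(round(ndcg_at_k(ranked, relevant, 5), 4))
```

Raising k can only keep or increase Recall@k for a fixed ranking, which is consistent with the direction of the top-k5 to top-k10 change reported above; NDCG@k additionally rewards placing relevant documents near the top.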

Current retrieval baseline

gemini-embedding-001 + full SciFact + prep-basic-clean-v1 + token chunks 1024/128 + Qdrant HNSW cosine + dense top-k10 + no reranker + no tool policy + no LLM generation.
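To illustrate the "token chunks 1024/128" part of the baseline, the sketch below splits a pre-tokenized document into overlapping windows. It assumes that 1024 is the chunk size in tokens and 128 is the overlap between consecutive chunks; that reading is an assumption, as is the helper name `chunk_tokens`, which does not come from src/mirage/. The tokenizer itself is out of scope here, so the input is simply a list of tokens.

```python
def chunk_tokens(tokens, chunk_size=1024, overlap=128):
    """Split a token list into windows of `chunk_size` sharing `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the document.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Hypothetical 2000-token document represented by integer token ids.
    document = list(range(2000))
    for chunk in chunk_tokens(document):
        print(chunk[0], chunk[-1], len(chunk))
```

Under this reading, the 512/64 and 2048/256 variants mentioned in the decision log would correspond to different `chunk_size` and `overlap` arguments.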

Next LLM-free macro cases

The next step after establishing the current SciFact retrieval baseline is to check that the winning configuration generalizes to other datasets, before any LLM generation or citation work.

Quickstart

cp .env.example .env
uv sync --dev
docker compose up -d
just resolve

