Minimal platform for controlled RAG experiments.
- `config/`: global defaults and registries.
- `datasets/`: smoke fixtures and external benchmark source folders.
- `studies/`: executable study definitions and human results.
- `artifacts/`: ignored machine outputs and caches.
- `src/mirage/`: runtime code.
- `tests/`: regression tests.
- 2026-05-11: froze the initial retrieval-only baseline on full SciFact: `text-embedding-3-small`, token chunks `1024/128`, Qdrant HNSW, dense top-k5, no reranker, no tool policy, no LLM generation. Details: `studies/rag-foundation/README.md` and `studies/rag-foundation/study.toml`.
- 2026-05-11: completed the first search/backend wave. Dense top-k10/top-k20 improved recall, BM25 and hybrid RRF did not beat dense search on SciFact, and FAISS flat matched Qdrant quality, so the vector store was not the quality bottleneck. Details: `studies/rag-foundation/README.md`.
- 2026-05-12: completed the embedding sweep and promoted `gemini-embedding-001` as the current retrieval baseline. It improved Recall@k from 0.7771 to 0.9509 and NDCG@k from 0.6912 to 0.8869 versus `text-embedding-3-small`. Details: `studies/rag-foundation/README.md`.
- 2026-05-12: completed the LLM-free macro chunking wave on the Gemini baseline. Token 512/64 improved Precision@k slightly but lost Recall@k/NDCG@k and was slower; token 2048/256 and sentence 1024/128 matched quality but added latency, so token 1024/128 remains the chunking baseline. Details: `studies/rag-foundation/README.md`.
- 2026-05-12: completed the LLM-free macro search wave on Gemini + token 1024/128. Dense top-k10 improved Recall@k from 0.9509 to 0.9750 and NDCG@k from 0.8869 to 0.8956 versus dense top-k5 with modest latency cost, so dense top-k10 is the current retrieval baseline. Details: `studies/rag-foundation/README.md`.
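The log entries above compare configurations by Recall@k and NDCG@k. As a rough illustration of what those metrics measure, here is a minimal sketch for a single query, assuming binary relevance labels and unique document IDs in the ranking. The function names `recall_at_k` and `ndcg_at_k` are hypothetical and are not taken from `src/mirage/`; the study's actual evaluator may use graded relevance or a different averaging convention, so this sketch is not expected to reproduce the reported numbers.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant_ids)
    # A hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    # The ideal ranking places every relevant document first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: two relevant documents, one retrieved at rank 2.
ranked = ["d3", "d1", "d7"]
relevant = {"d1", "d9"}
recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
```

Corpus-level scores are typically the mean of these per-query values. Raising k can only keep Recall@k the same or increase it, which is consistent with the top-k10 versus top-k5 comparison in the log.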
gemini-embedding-001 + full SciFact + prep-basic-clean-v1 + token chunks 1024/128 + Qdrant HNSW cosine + dense top-k10 + no reranker + no tool policy + no LLM generation.
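To make the chunking part of this configuration concrete, the sketch below splits a token sequence into overlapping windows. It assumes that `1024/128` means a chunk size of 1024 tokens with a 128-token overlap between consecutive chunks; the README does not spell this out, and `chunk_tokens` is a hypothetical helper rather than the function used in `src/mirage/`. The tokenizer is left out on purpose: the input is any already-tokenized list.

```python
def chunk_tokens(tokens, size=1024, overlap=128):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, to avoid a trailing
        # chunk that only repeats already-covered tokens.
        if start + size >= len(tokens):
            break
    return chunks


# Hypothetical document of 2000 tokens (integers stand in for token IDs).
tokens = list(range(2000))
chunks = chunk_tokens(tokens)
```

Under these assumptions a 2000-token document yields three chunks starting at tokens 0, 896, and 1792, with the last chunk shorter than 1024 tokens.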
The next step after the current SciFact retrieval baseline is to test whether the winning config generalizes to other datasets, before starting any LLM generation or citation work.
cp .env.example .env
uv sync --dev
docker compose up -d
just resolve