Benchmarking OME Arrow through Parquet, Vortex, LanceDB, and more.
- Create and sync a uv environment (includes parquet, lancedb, vortex-data):
uv venv
uv sync
- Launch Jupyter and open notebooks/compare_parquet_vortex_lance.ipynb:
uv run jupyter lab

The notebook defaults to ~100,000 rows x ~4,000 columns of float64 data plus ~50 columns of string data. If you hit memory pressure, lower N_ROWS/N_COLS in the config cell (especially before converting to pandas for the CSV benchmark).
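To get a feel for how far to lower N_ROWS/N_COLS, the raw size of the float64 block can be estimated before running anything. The sketch below is not taken from the notebook: `float64_block_gib` is a hypothetical helper, and the values are the approximate defaults quoted above. It ignores the string columns and any extra copies made during the pandas conversion, so treat the result as a lower bound.

```python
# Rough lower-bound estimate of the float64 block's in-memory size.
# float64_block_gib is a hypothetical helper, not part of the notebooks.
N_ROWS = 100_000  # approximate notebook default
N_COLS = 4_000    # approximate notebook default (float64 columns only)
BYTES_PER_FLOAT64 = 8


def float64_block_gib(n_rows: int, n_cols: int) -> float:
    """Return the raw size in GiB of an n_rows x n_cols float64 block."""
    return n_rows * n_cols * BYTES_PER_FLOAT64 / 2**30


# Compare the default shape with a reduced one.
print(f"default: {float64_block_gib(N_ROWS, N_COLS):.2f} GiB")
print(f"reduced: {float64_block_gib(10_000, 1_000):.2f} GiB")
```

At the default shape the float64 data alone comes to roughly 3 GiB, so a machine with limited RAM may need a smaller shape once format conversions start holding multiple copies.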
An OME-Arrow variant lives at notebooks/compare_parquet_vortex_lance_ome.ipynb (or .py via jupytext). It adds a single OME image column (random 100x100 images) alongside the existing columns.
An OME-Arrow-only + OME-Zarr benchmark lives at notebooks/compare_ome_arrow_only.ipynb (or .py). It focuses on a single OME image column and compares it against a directory-per-image OME-Zarr layout.