We should rethink how we measure inference time. Currently we report inference time per batch on GPUs, which conflates batch size and hardware choice. We might instead want to report inference time per species per compute core, or a similar normalized metric. This is open for debate in the community.
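As a starting point for that discussion, here is a minimal sketch of how a normalized metric could be computed: time a batched inference call, then divide by batch size and core count. The function `benchmark_per_sample` and the dummy model are hypothetical placeholders, not part of the existing codebase.

```python
import time

def benchmark_per_sample(infer_fn, batch, n_cores=1, warmup=3, runs=10):
    """Return seconds per sample per core for infer_fn on a batch.

    infer_fn: callable taking a batch (hypothetical placeholder for the model).
    n_cores: number of compute cores used, for normalization.
    """
    # Warm up to avoid measuring one-time setup costs (caches, JIT, etc.).
    for _ in range(warmup):
        infer_fn(batch)
    start = time.perf_counter()
    for _ in range(runs):
        infer_fn(batch)
    total = time.perf_counter() - start
    per_batch = total / runs
    per_sample = per_batch / len(batch)
    # The proposed normalized metric: time per sample per compute core.
    return per_sample / n_cores

# Hypothetical usage with a dummy "model":
dummy_model = lambda xs: [x * 2 for x in xs]
metric = benchmark_per_sample(dummy_model, list(range(32)), n_cores=4)
```

Whether "per species" means per-class or per-sample here, and how to count cores on heterogeneous hardware (GPU SMs vs. CPU cores), are exactly the points to settle.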