This directory contains scripts for evaluating CloverLM checkpoints during training using the lm-eval harness with Accelerate.
For a ready-to-use evaluation setup with uv and locked dependencies, see the
lm_eval README on HuggingFace.
That setup requires only three commands to get running and is the easiest way to
reproduce the published results.
The scripts below are used internally for continuous evaluation of checkpoints as they are produced during training.
`setup_dev_eval_acc_flash.sh` creates (or recreates) the `dev_eval_acc_flash` micromamba environment with all
packages needed for the accelerate + lm_eval pipeline using Flash Attention 2:
`convert_dcp_to_pt.py` → `convert_checkpoint.py` → `accelerate launch` lm_eval → `upload_wandb.py`
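The ordering and fail-fast behavior of that pipeline can be sketched as follows. The real scripts take project-specific arguments that are not documented here, so each stage below is a stand-in `echo`; only the stage order and the abort-on-failure control flow are meant to be representative.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the four-stage eval pipeline as a fail-fast chain.
# Stage commands are stand-ins; real invocations take project-specific args.
set -e

step=2000   # example checkpoint step number
stages=(
  "convert_dcp_to_pt.py: DCP checkpoint -> step_${step}.pt"
  "convert_checkpoint.py: step_${step}.pt -> HF model dir"
  "accelerate launch lm_eval: zero-shot tasks on HF model dir"
  "upload_wandb.py: push results to W&B"
)

for s in "${stages[@]}"; do
  echo "[step ${step}] ${s}"   # a failing real stage would abort here via set -e
done
```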
Prerequisites: micromamba at `~/.local/bin/micromamba`, CUDA toolkit available.
```
bash setup_dev_eval_acc_flash.sh
```

`auto_eval_4b_acc.sh` is a polling loop that watches a DCP checkpoint directory and, for each new step:
- Converts the DCP checkpoint to a single `.pt` file
- Converts the `.pt` file to a HuggingFace model directory
- Runs zero-shot evaluation via `accelerate launch` with lm_eval
- Uploads results to Weights & Biases
Configure the checkpoint directory, tasks, GPU count, and W&B project by editing
the variables at the top of the script. The script runs indefinitely, polling
every `POLL_INTERVAL` seconds for new checkpoints.
```
bash auto_eval_4b_acc.sh
```
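The watch-and-process pattern described above can be sketched with plain coreutils. The variable names, the `step_*` directory layout, and the marker-file scheme here are illustrative assumptions, not the script's actual implementation; `process_step` stands in for the convert → eval → upload chain.

```shell
#!/usr/bin/env bash
# Minimal sketch of the polling pattern: scan a checkpoint directory and
# evaluate each step directory exactly once, tracked via marker files.
# CKPT_DIR, POLL_INTERVAL, and the marker scheme are illustrative.
CKPT_DIR=${CKPT_DIR:-$(mktemp -d)}
POLL_INTERVAL=${POLL_INTERVAL:-60}
DONE_DIR="${CKPT_DIR}/.evaluated"
mkdir -p "${CKPT_DIR}" "${DONE_DIR}"

process_step() {   # stand-in for convert -> eval -> upload
  echo "evaluating $1"
}

poll_once() {      # one scan; the real script would wrap this in
                   # `while true; do poll_once; sleep "${POLL_INTERVAL}"; done`
  for d in "${CKPT_DIR}"/step_*; do
    [ -d "$d" ] || continue                      # no matches yet
    marker="${DONE_DIR}/$(basename "$d")"
    [ -e "$marker" ] && continue                 # already evaluated
    process_step "$d" && touch "$marker"
  done
}

# demo: create one fake checkpoint and run a single poll pass
mkdir -p "${CKPT_DIR}/step_1000"
poll_once
```

Marking a step done only after `process_step` succeeds means a failed evaluation is retried on the next poll pass.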