This page describes the standard LenVM workflows. Use the root scripts/ entrypoints whenever possible; they encode the expected environment activation, server flags, paths, and output locations.
Run commands from the repository root. The demo scripts are the main reproducible workflow currently provided: they run a compact end-to-end path that reproduces most of the paper's results. The exact scripts for every paper result are not included yet and will be provided later.
generate data -> train LenVM -> serve/evaluate -> visualize
LenVM data generation samples multiple completions per prompt, groups them by meta_info.lenvm_idx, and writes JSONL datasets. Training consumes those grouped datasets through the local LlamaFactory fork. Evaluation and guided decoding use the local SGLang fork to serve either the LenVM value model or a base model with LenVM-guided sampling.
This is the recommended reproducible path with the files currently included in the repository. Before launching the Slurm demo, copy scripts/remote_slurm_job/submit_job.sh.example to scripts/remote_slurm_job/submit_job.sh, copy .env.example to .env, then configure remote/local paths, environment variables, cache locations, and Slurm resources. The wrapper supports two modes: from a local machine it can sync the repository and submit over SSH; on the remote server it can submit directly with sbatch by setting IS_REMOTE=1. The default path submits scripts/remote_slurm_job/train.slurm through sbatch. If you are already on a GPU node or a dedicated GPU machine, you can also run train.slurm with bash; in that case, set REMOTE_REPO, CACHE_ROOT, and SLURM_GPUS_ON_NODE yourself.
# Configure paths, environment variables, cache locations, Slurm resources, and IS_REMOTE first.
cp scripts/remote_slurm_job/submit_job.sh.example scripts/remote_slurm_job/submit_job.sh
cp .env.example .env
bash scripts/remote_slurm_job/submit_job.shOn two H100 GPUs, the full demo takes about 1 hour and writes outputs to results/. The generated results include LIFEBench, tradeoff evaluation, length prediction, length-token analysis, and visualization of length-change trends.
The same stages can also be run manually:
bash scripts/environment/env_config_uv.sh
bash scripts/data_generation/demo.sh
bash scripts/training/demo.sh
bash scripts/inference/demo_length_prediction.sh
bash scripts/inference/demo_length_token.sh
bash scripts/inference/demo_lifebench.sh
bash scripts/inference/demo_tradeoff.sh
bash scripts/visualization/demo_visual.shThe demo covers data generation, LenVM training, LIFEBench, tradeoff evaluation, length prediction, length-token analysis, and visualization of length-change trends. It is intended to reproduce most of the paper's results, while full paper-result scripts will be added later.
If you want to skip data generation and training, download the existing artifacts:
bash scripts/environment/env_config_uv.sh
bash scripts/download_data_and_model.shThen run the evaluation or visualization script you need after updating its parameters to match the downloaded data/model locations:
bash scripts/inference/demo_length_prediction.sh
bash scripts/inference/demo_length_token.sh
bash scripts/inference/demo_lifebench.sh
bash scripts/inference/demo_tradeoff.sh
bash scripts/visualization/demo_visual.shExpected major inputs and outputs:
- Data:
data_generation/LenVM-Data/ - Model artifacts:
models/or checkpoint paths expected by scripts - Evaluation and visualization results:
results/
Downloaded data/model artifacts are a shortcut around expensive stages, not a guarantee that every demo script can run unchanged. Check model paths, checkpoint paths, ports, tensor/pipeline parallel settings, and memory fractions before launching GPU workloads.
Run the demo data generation workflow:
bash scripts/data_generation/demo.shThe script activates .venv-infer, launches an SGLang server, and calls:
python -m data_generation.data_generator.mainExpected outputs are JSONL files under data_generation/LenVM-Data/<model-name>/..., including grouped files used by training.
For larger runs, dataset-specific commands, OpenCodeReasoning-2 preparation, and downsampling utilities, see ../data_generation/data_generation.md.
Run the demo training workflow:
bash scripts/training/demo.shThe script activates .venv-train, downloads dataset_info.json for the demo dataset if needed, and runs:
FORCE_TORCHRUN=1 llamafactory-cli train ./scripts/training/configs/demo.yamlThe demo config uses:
stage: lenvm- dataset directory:
./data_generation/LenVM-Data - output directory:
./saves/a-qwen2.5-3b-instruct-b-qwen2.5-3b-instruct/demo_math_instruction - evaluation datasets:
demo_math_test,demo_instruction_test
bash scripts/inference/demo_length_prediction.shThis serves the LenVM checkpoint as an embedding-style value model and writes results under results/length_prediction/.
bash scripts/inference/demo_length_token.shThis analyzes token-level value behavior and can produce wordcloud artifacts under results/length_token_analysis/.
bash scripts/inference/demo_lifebench.shThis launches a base SGLang model with in-process LenVM guidance and runs baseline plus LenVM-guided LIFEBench experiments. Benchmark outputs are written under results/LIFEBench/.
bash scripts/inference/demo_tradeoff.shThis runs baseline sampling, LenVM-guided sampling across value scales, budget evaluation, and plotting. Outputs are written under results/tradeoff/.
bash scripts/visualization/demo_visual.shThe visualization script launches the LenVM value-model server, runs the text visualization pipeline, and writes markdown plus hover-HTML artifacts under results/visualization/.
| Stage | Main script | Typical outputs |
|---|---|---|
| Remote Slurm demo | scripts/remote_slurm_job/submit_job.sh |
full demo outputs under results/ |
| Setup | scripts/environment/env_config_uv.sh |
.venv-train/, .venv-infer/, .venv-eval/ |
| Download | scripts/download_data_and_model.sh |
data_generation/LenVM-Data/, models/ |
| Data generation | scripts/data_generation/demo.sh |
grouped JSONL files under data_generation/LenVM-Data/ |
| Training | scripts/training/demo.sh |
checkpoints under saves/ |
| Inference/eval | scripts/inference/*.sh |
metrics and JSONL/CSV artifacts under results/ |
| Visualization | scripts/visualization/demo_visual.sh |
markdown and HTML files under results/visualization/ |
LenVM value-model serving uses SGLang embedding-style launches with architecture override, for example:
python -m sglang.launch_server \
--json-model-override-args '{"architectures":["Qwen2ForLengthValueModel"]}' \
--is-embeddingLenVM-guided decoding serves a base model with in-process guidance using flags such as:
--enable-lvm-guided-sampling
--lvm-guided-inproc
--lvm-guided-inproc-model-path <lenvm-checkpoint>
--lvm-guided-fn sglang.srt.lvm.lvm_guided_sampling:lvm_combined_guidanceUse the scripts as the authoritative examples for complete launch commands.