
VMEvalKit 🎥🧠

Unified inference and evaluation framework for 29+ video generation models.

Features

  • 🚀 29+ Models: Unified interface for commercial APIs (Luma, Veo, Sora, Runway) + open-source (LTX-Video, HunyuanVideo, DynamiCrafter, SVD, etc.)
  • ⚖️ Evaluation Pipeline: Human scoring (Gradio) + automated scoring (GPT-4o, InternVL, Qwen3-VL)
  • ☁️ Cloud Integration: S3 + HuggingFace Hub support

Data Format

Organize your question data outside the VMEvalKit repository using the following structure:

questions/
└── {domain}_task/                    # task folder (e.g., chess_task, matching_object_task)
    ├── {domain}_0000/                # individual question folder
    │   ├── first_frame.png           # required: input image for video generation
    │   ├── prompt.txt                # required: text prompt describing the video
    │   ├── final_frame.png           # optional: expected final frame for evaluation
    │   └── ground_truth.mp4          # optional: reference video for evaluation
    ├── {domain}_0001/
    │   └── ...
    └── {domain}_0002/
        └── ...

Example with domain chess:

questions/
└── chess_task/
    ├── chess_0000/
    │   ├── first_frame.png
    │   ├── prompt.txt
    │   ├── final_frame.png
    │   └── ground_truth.mp4
    ├── chess_0001/
    │   └── ...
    └── chess_0002/
        └── ...

Naming Convention:

  • Task folder: {domain}_task (e.g., chess_task, matching_object_task)
  • Question folders: {domain}_{i:04d} where i is zero-padded (e.g., chess_0000, chess_0064). Padding expands beyond four digits automatically when needed, so there is no dataset size limit. A validation sketch follows below.
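
To sanity-check a dataset against this convention before running inference, here is a minimal validation sketch (the helper name and error messages are illustrative, not part of VMEvalKit):

from pathlib import Path

REQUIRED = {"first_frame.png", "prompt.txt"}        # required in every question folder
OPTIONAL = {"final_frame.png", "ground_truth.mp4"}  # only used during evaluation

def validate_questions_dir(root: str) -> None:
    """Check each {domain}_task / {domain}_{i:04d} folder for the required files."""
    for task_dir in sorted(Path(root).iterdir()):
        if not (task_dir.is_dir() and task_dir.name.endswith("_task")):
            continue
        domain = task_dir.name[:-len("_task")]
        for q_dir in sorted(task_dir.iterdir()):
            if not q_dir.is_dir():
                continue
            index = q_dir.name.removeprefix(domain + "_")
            if index == q_dir.name or not index.isdigit():
                raise ValueError(f"unexpected question folder name: {q_dir.name}")
            missing = REQUIRED - {p.name for p in q_dir.iterdir()}
            if missing:
                raise ValueError(f"{q_dir}: missing required files {missing}")

validate_questions_dir("questions")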

Quick Start

# 1. Install
git clone https://github.com/Video-Reason/VMEvalKit.git
cd VMEvalKit

python -m venv venv
source venv/bin/activate

pip install -e .

# 2. Setup models
bash setup/install_model.sh --model svd --validate

# 3. Organize your question data (see format above), e.g.:
# mkdir -p ~/my_research/questions

# 4. Run inference
python examples/generate_videos.py --questions-dir setup/test_assets/ --output-dir ./outputs --model svd
python examples/generate_videos.py --questions-dir setup/test_assets/ --output-dir ./outputs --model LTX-2
# 5. Run evaluation
# Create eval_config.json first:
echo '{"method": "human", "inference_dir": "./outputs", "eval_output_dir": "./evaluations"}' > eval_config.json
python examples/score_videos.py --eval-config eval_config.json
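
The same config can also be written from Python instead of echo; the fields mirror the command above (paths are placeholders for your own layout):

import json

eval_config = {
    "method": "human",                   # human scoring via Gradio; automated scorers also exist
    "inference_dir": "./outputs",        # where generate_videos.py wrote its results
    "eval_output_dir": "./evaluations",  # where scores will be saved
}

with open("eval_config.json", "w") as f:
    json.dump(eval_config, f, indent=2)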

API Keys

Set in .env file:

cp env.template .env
# Edit .env with your API keys:
# LUMA_API_KEY=...
# OPENAI_API_KEY=...  
# GEMINI_API_KEY=...
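
In Python, the keys are then read from the environment. A minimal sketch, assuming the python-dotenv package (VMEvalKit's own loading may differ):

import os
from dotenv import load_dotenv  # pip install python-dotenv (assumption, not a stated dependency)

load_dotenv()  # copies .env entries from the working directory into os.environ

luma_key = os.environ.get("LUMA_API_KEY")
if luma_key is None:
    raise RuntimeError("LUMA_API_KEY is not set; check your .env file")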

Adding Models

# Inherit from ModelWrapper
from vmevalkit.models.base import ModelWrapper

class MyModelWrapper(ModelWrapper):
    def generate(self, image_path, text_prompt, **kwargs):
        # Your inference logic goes here.
        return {"success": True, "video_path": "..."}  # plus any extra metadata

Register in vmevalkit/runner/MODEL_CATALOG.py:

"my-model": {
    "wrapper_module": "vmevalkit.models.my_model_inference",
    "wrapper_class": "MyModelWrapper", 
    "family": "MyCompany"
}
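
For intuition, an entry like this can be resolved with a standard dynamic import. The sketch below is illustrative, not VMEvalKit's actual runner code:

import importlib

# Hypothetical resolution of a MODEL_CATALOG entry into a wrapper instance.
entry = {
    "wrapper_module": "vmevalkit.models.my_model_inference",
    "wrapper_class": "MyModelWrapper",
    "family": "MyCompany",
}

module = importlib.import_module(entry["wrapper_module"])
wrapper_cls = getattr(module, entry["wrapper_class"])
wrapper = wrapper_cls()  # assumes a no-argument constructor
result = wrapper.generate("first_frame.png", "Move the white knight to f3")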

License

Apache 2.0

About

VMEvalKit is a framework for evaluating reasoning in foundation video models.
