
VMEvalKit 🎥🧠

Unified inference and evaluation framework for 29+ video generation models.

Features

  • 🚀 29+ Models: Unified interface for commercial APIs (Luma, Veo, Sora, Runway) + open-source (LTX-Video, HunyuanVideo, DynamiCrafter, SVD, etc.)
  • ⚖️ Evaluation Pipeline: Human scoring (Gradio) + automated scoring (GPT-4o, InternVL, Qwen3-VL)
  • ☁️ Cloud Integration: S3 + HuggingFace Hub support

Data Format

Organize your question data outside the VMEvalKit repository using the following structure:

questions/
└── {domain}_task/                    # task folder (e.g., chess_task, matching_object_task)
    ├── {domain}_0000/                # individual question folder
    │   ├── first_frame.png           # required: input image for video generation
    │   ├── prompt.txt                # required: text prompt describing the video
    │   ├── final_frame.png           # optional: expected final frame for evaluation
    │   └── ground_truth.mp4          # optional: reference video for evaluation
    ├── {domain}_0001/
    │   └── ...
    └── {domain}_0002/
        └── ...

Example with domain chess:

questions/
└── chess_task/
    ├── chess_0000/
    │   ├── first_frame.png
    │   ├── prompt.txt
    │   ├── final_frame.png
    │   └── ground_truth.mp4
    ├── chess_0001/
    │   └── ...
    └── chess_0002/
        └── ...

Naming Convention:

  • Task folder: {domain}_task (e.g., chess_task, matching_object_task)
  • Question folders: {domain}_{i:04d} where i is zero-padded (e.g., chess_0000, chess_0064). Padding expands beyond four digits automatically when needed, so there is no dataset size limit. A validation sketch follows below.
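
To sanity-check a dataset against this convention before running inference, here is a minimal validation sketch (the helper name and error messages are illustrative, not part of VMEvalKit):

from pathlib import Path

REQUIRED = {"first_frame.png", "prompt.txt"}        # required in every question folder
OPTIONAL = {"final_frame.png", "ground_truth.mp4"}  # only used during evaluation

def validate_questions_dir(root: str) -> None:
    """Check each {domain}_task / {domain}_{i:04d} folder for the required files."""
    for task_dir in sorted(Path(root).iterdir()):
        if not (task_dir.is_dir() and task_dir.name.endswith("_task")):
            continue
        domain = task_dir.name[:-len("_task")]
        for q_dir in sorted(task_dir.iterdir()):
            if not q_dir.is_dir():
                continue
            index = q_dir.name.removeprefix(domain + "_")
            if index == q_dir.name or not index.isdigit():
                raise ValueError(f"unexpected question folder name: {q_dir.name}")
            missing = REQUIRED - {p.name for p in q_dir.iterdir()}
            if missing:
                raise ValueError(f"{q_dir}: missing required files {missing}")

validate_questions_dir("questions")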

Quick Start

# 1. Install
git clone https://github.com/Video-Reason/VMEvalKit.git
cd VMEvalKit

python -m venv venv
source venv/bin/activate

pip install -e .

# 2. Setup models
bash setup/install_model.sh --model svd --validate

# 3. Organize your question data (see format above), e.g.:
# mkdir -p ~/my_research/questions

# 4. Run inference
python examples/generate_videos.py --questions-dir setup/test_assets/ --output-dir ./outputs --model svd
python examples/generate_videos.py --questions-dir setup/test_assets/ --output-dir ./outputs --model LTX-2
# 5. Run evaluation
# Create eval_config.json first:
echo '{"method": "human", "inference_dir": "./outputs", "eval_output_dir": "./evaluations"}' > eval_config.json
python examples/score_videos.py --eval-config eval_config.json
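
The same config can also be written from Python instead of echo; the fields mirror the command above (paths are placeholders for your own layout):

import json

eval_config = {
    "method": "human",                   # human scoring via Gradio; automated scorers also exist
    "inference_dir": "./outputs",        # where generate_videos.py wrote its results
    "eval_output_dir": "./evaluations",  # where scores will be saved
}

with open("eval_config.json", "w") as f:
    json.dump(eval_config, f, indent=2)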

API Keys

Set in .env file:

cp env.template .env
# Edit .env with your API keys:
# LUMA_API_KEY=...
# OPENAI_API_KEY=...  
# GEMINI_API_KEY=...
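
In Python, the keys are then read from the environment. A minimal sketch, assuming the python-dotenv package (VMEvalKit's own loading may differ):

import os
from dotenv import load_dotenv  # pip install python-dotenv (assumption, not a stated dependency)

load_dotenv()  # copies .env entries from the working directory into os.environ

luma_key = os.environ.get("LUMA_API_KEY")
if luma_key is None:
    raise RuntimeError("LUMA_API_KEY is not set; check your .env file")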

Adding Models

# Inherit from ModelWrapper
from vmevalkit.models.base import ModelWrapper

class MyModelWrapper(ModelWrapper):
    def generate(self, image_path, text_prompt, **kwargs):
        # Your inference logic goes here.
        return {"success": True, "video_path": "..."}  # plus any extra metadata

Register in vmevalkit/runner/MODEL_CATALOG.py:

"my-model": {
    "wrapper_module": "vmevalkit.models.my_model_inference",
    "wrapper_class": "MyModelWrapper", 
    "family": "MyCompany"
}
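
For intuition, an entry like this can be resolved with a standard dynamic import. The sketch below is illustrative, not VMEvalKit's actual runner code:

import importlib

# Hypothetical resolution of a MODEL_CATALOG entry into a wrapper instance.
entry = {
    "wrapper_module": "vmevalkit.models.my_model_inference",
    "wrapper_class": "MyModelWrapper",
    "family": "MyCompany",
}

module = importlib.import_module(entry["wrapper_module"])
wrapper_cls = getattr(module, entry["wrapper_class"])
wrapper = wrapper_cls()  # assumes a no-argument constructor
result = wrapper.generate("first_frame.png", "Move the white knight to f3")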

License

Apache 2.0

About

VMEvalKit is a framework for evaluating reasoning in foundation video models.
