Problems about Inference on Video-MME

Brilliant work on OneVision-Encoder! 🎉
I'm trying to reproduce the LLaVA-NeXT-Video evaluation results following the instructions in the README.
For video benchmarks (e.g., Video-MME), I ran:

TASKS="videomme" bash scripts/eval/eval_ov_encoder.sh

However, I noticed this line in the script:

MODEL_PATH="${MODEL_PATH:-trained_model/must_contain_llava_in_name}"
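
As far as I can tell, this is the shell's default-value expansion, so the placeholder should only apply when MODEL_PATH is not already set in the environment. A minimal sketch of that behavior, assuming nothing else in the script touches the variable (the function below is my own stand-in for the script line, and the override path is hypothetical):

```shell
# Stand-in for the default line in scripts/eval/eval_ov_encoder.sh.
resolve_model_path() {
  # ${VAR:-default} expands to the default only when VAR is unset or empty.
  MODEL_PATH="${MODEL_PATH:-trained_model/must_contain_llava_in_name}"
  printf '%s\n' "$MODEL_PATH"
}

# Without an override, the placeholder default is used.
unset MODEL_PATH
default_path="$(resolve_model_path)"

# With an override, the caller's value wins.
# "checkpoints/my-llava-ov-encoder" is a hypothetical local directory.
override_path="$(MODEL_PATH="checkpoints/my-llava-ov-encoder" resolve_model_path)"

echo "default:  $default_path"
echo "override: $override_path"
```

If that reading is correct, the script presumably expects MODEL_PATH to be set alongside TASKS when it is invoked, pointing at a checkpoint whose name contains "llava". What I am missing is which checkpoint that should be.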

I've searched the repository and the Hugging Face organizations (lmms-lab-encoder and lmms-lab), but I couldn't find a released checkpoint whose name contains "llava".
❓ Could you clarify whether I am misunderstanding the evaluation workflow, or whether the LLaVA-integrated checkpoint has not yet been publicly released?
Any guidance would be greatly appreciated! Thanks again for the amazing work. 🙏
