Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

Haoyu Wu$^{1*}$, Diankun Wu$^{2*}$, Tianyu He$^{1†}$, Junliang Guo$^{1}$, Yang Ye$^{1}$, Yueqi Duan$^{2}$, Jiang Bian$^{1}$

$^1$ Microsoft Research $^2$ Tsinghua University

($^*$ Equal Contribution. † Project Lead)

🎯 Overview

Geometry Forcing (GF) Overview. (a) Our proposed GF paradigm enhances video diffusion models by aligning with geometric features from VGGT. (b) Compared to DFoT, our method generates more temporally and geometrically consistent videos. (c) While baseline features fail to reconstruct meaningful 3D geometry, GF-learned features enable accurate 3D reconstruction.

🚀 News

[2025/10/8] We release the evaluation code for reprojection error and revisit error.
[2025/9/24] We release the code and checkpoint.
[2025/9/22] Geometry Forcing is accepted to NeurIPS 2025 NextVid Workshop as an Oral!
[2025/7/10] We release the paper and the project.

💪 Get Started

Setup Environments

conda create -n geometryforcing python=3.10 -y
conda activate geometryforcing
pip install -r requirements.txt

Connect to Weights & Biases:

We use Weights & Biases for logging. Sign up if you don't have an account, then set wandb.entity in config.yaml to your user or organization name.

Download Checkpoints and Data

Download the pretrained checkpoint from Hugging Face:

bash scripts/hf_download_checkpoints.sh

Download the pretrained checkpoint from ModelScope:

bash scripts/ms_download_checkpoints.sh

Download and process the RealEstate10K dataset into data/real-estate-10k.

We use exactly the same RealEstate10K layout as DFoT. Please download the dataset from the DFoT Hugging Face dataset repository. After processing, the directory structure should match the one described in the DFoT wiki:

data/
├── {dataset_name}/
│   ├── training/
│   │   ├── video_xxx.mp4
│   │   ├── ...
│   ├── validation/
│   │   ├── video_xxx.mp4
│   │   ├── ...
│   ├── test/
│   │   ├── video_xxx.mp4
│   │   ├── ...
│   ├── metadata/
│   │   ├── training.pt
│   │   ├── validation.pt
│   │   ├── test.pt
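Before launching evaluation or training, it can help to confirm that the processed dataset matches the tree above. The following is a minimal sketch, not part of the repository: the helper name `check_dataset_layout` is hypothetical, and it only assumes what the tree shows (three split folders containing `.mp4` files, plus one `.pt` file per split under `metadata/`). It does not inspect the contents of any file.

```python
from pathlib import Path

# Split names taken from the directory tree above.
SPLITS = ("training", "validation", "test")


def check_dataset_layout(root):
    """Return a list of human-readable problems found under `root`.

    An empty list means the directory matches the expected layout.
    This is a hypothetical helper, not a function from the repository.
    """
    root = Path(root)
    problems = []
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
        elif not any(split_dir.glob("*.mp4")):
            problems.append(f"no .mp4 files in: {split_dir}")
        metadata_file = root / "metadata" / f"{split}.pt"
        if not metadata_file.is_file():
            problems.append(f"missing metadata file: {metadata_file}")
    return problems


if __name__ == "__main__":
    # Path taken from the download step above; adjust it if yours differs.
    issues = check_dataset_layout("data/real-estate-10k")
    print("layout OK" if not issues else "\n".join(issues))
```

Running the script from the repository root prints either a confirmation or one line per missing folder or file, which makes a partial download easier to spot before starting a long job.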

Generating Videos with Pretrained Models

Single Image to Long Video (256 Frames):

bash scripts/eval_geometry_forcing.sh

Single Image to Rotation Video (16 Frames):

bash scripts/eval_geometry_forcing_rotation.sh

Training Geometry Forcing

To train Geometry Forcing, run the following command:

bash scripts/train_geometry_forcing.sh

Evaluation for Reprojection Error and Revisit Error

To evaluate the reprojection error and revisit error, please follow the instructions in README_EVAL.md.

📜 Citation

If you find our work useful for your research, please consider citing our paper:

@article{wu2025geometryforcing,
  title={Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling},
  author={Wu, Haoyu and Wu, Diankun and He, Tianyu and Guo, Junliang and Ye, Yang and Duan, Yueqi and Bian, Jiang},
  journal={arXiv preprint arXiv:2507.07982},
  year={2025}
}
