zenseact/idsplat

IDSplat

Instance-Decomposed Gaussian Splatting for Autonomous Driving

CVPR Findings 2026

Official code release for IDSplat, built on top of Nerfstudio.

IDSplat decomposes autonomous driving scenes into a static background and per-instance dynamic Gaussians, enabling high-quality novel view synthesis for camera and lidar without requiring annotations.


Installation

We recommend using uv for dependency management.

The training scripts use uv run, which automatically creates and syncs the virtual environment on first use, so no separate install step is required.

For GPUs other than the NVIDIA RTX 50 series, edit pyproject.toml to switch to the CUDA 12.4 PyTorch index (swap the commented and uncommented [tool.uv] blocks) before syncing.

The gsplat dependency is installed from a custom fork that adds rolling-shutter camera and lidar rendering support.


Datasets

IDSplat supports the Waymo Open Dataset (WOD) v2 and PandaSet.

Data paths are always passed explicitly via --data and --masks_dir in the training scripts, so datasets can live anywhere on the filesystem.

Waymo Open Dataset

Download the Waymo Open Dataset v2 (parquet format) from the official website.

The expected directory structure is:

<wod_root>/
  training/
    camera_box/
    camera_image/
    lidar/
    lidar_box/
    ...
  validation/
    ...
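
Before launching a long run, it can help to confirm that the dataset root matches this layout. The snippet below is a minimal sketch, not part of the repository: `LISTED_COMPONENTS` and `missing_components` are hypothetical names, and the check covers only the four component folders named above. The `...` entries mean the real dataset contains more folders, and the sketch assumes the validation split uses the same component names as training.

```python
from pathlib import Path
from typing import List

# Component folders named in the layout above. The "..." there means the real
# dataset has more, so this list is deliberately incomplete.
LISTED_COMPONENTS = ("camera_box", "camera_image", "lidar", "lidar_box")


def missing_components(wod_root: str, split: str = "training") -> List[str]:
    """Return the listed component folders that are absent under <wod_root>/<split>."""
    split_dir = Path(wod_root) / split
    return [name for name in LISTED_COMPONENTS if not (split_dir / name).is_dir()]


if __name__ == "__main__":
    # "/path/to/wod" is a placeholder, as in the training commands.
    for name in missing_components("/path/to/wod"):
        print(f"missing: training/{name}")
```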

On first run, images are extracted from the parquet files and cached to disk. The cache location is controlled by --output_folder (default: /data/wod/images).

Three evaluation splits are provided in scripts/arrays/:

| Split | File | Cameras | Parquet dir |
| --- | --- | --- | --- |
| Dynamic NOTR | wod_id_to_seq_notr_dynamic_50frames.txt | FRONT, FRONT_LEFT, FRONT_RIGHT | training |
| PVG | wod_id_to_seq_pvg.txt | FRONT, FRONT_LEFT, FRONT_RIGHT | training |
| StreetGS | wod_id_to_seq_adgs_split.txt | FRONT | validation |

Each sequence file has the format:

# scene_id, seg_name, start_timestep, end_timestep, scene_type
16,seg102319,0,50,dynamic
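
As a minimal sketch of how such a file could be read (this is not the repository's own loader; `SequenceEntry`, `parse_sequence_line`, and `find_sequence` are hypothetical names), the following assumes five comma-separated fields in the order given by the header comment. It also assumes that the segment name is `seg` followed by the number passed as --sequence, which matches the example line but is not stated explicitly.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SequenceEntry:
    scene_id: int
    seg_name: str
    start_timestep: int
    end_timestep: int
    scene_type: str


def parse_sequence_line(line: str) -> Optional[SequenceEntry]:
    """Parse one line; return None for blank lines and '#' comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = [field.strip() for field in stripped.split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 comma-separated fields, got {len(fields)}: {line!r}")
    scene_id, seg_name, start, end, scene_type = fields
    return SequenceEntry(int(scene_id), seg_name, int(start), int(end), scene_type)


def find_sequence(lines: Iterable[str], sequence: str) -> Optional[SequenceEntry]:
    """Return the first entry whose seg_name is 'seg' + sequence (assumed naming)."""
    for line in lines:
        entry = parse_sequence_line(line)
        if entry is not None and entry.seg_name == f"seg{sequence}":
            return entry
    return None
```

Because `find_sequence` accepts any iterable of strings, it can be given an open file handle for one of the files in scripts/arrays/ directly.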

PandaSet

Download PandaSet from Hugging Face or the official website.

The expected directory structure is:

<pandaset_root>/
  001/
    camera/
      back_camera/
      front_camera/
      front_left_camera/
      ...
    lidar/
    meta/
    ...
  011/
  ...

The default config expects PandaSet at data/pandaset relative to the repo root.

Sequence IDs used for evaluation are listed in scripts/arrays/pandaset_id_to_seq.txt.


Instance Segmentation Masks

IDSplat uses per-frame tracked instance segmentation masks generated by Grounded-SAM-2.

Expected format

Masks are organized per sequence and camera:

<masks_dir>/
  <sequence_id>/
    <camera_id>/
      json_data/
        mask_000_<camera_id>.json
        mask_001_<camera_id>.json
        ...
      mask_data/
        mask_000.npy
        mask_001.npy
        ...
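
The naming scheme above can be sketched as a small path helper. `mask_paths` is a hypothetical name, not a function from the repository. The three-digit zero padding and zero-based frame index are inferred from the `mask_000` and `mask_001` examples, and how frame indices of 1000 or more are named is not specified here.

```python
from pathlib import PurePosixPath
from typing import Tuple


def mask_paths(
    masks_dir: str, sequence_id: str, camera_id: str, frame: int
) -> Tuple[PurePosixPath, PurePosixPath]:
    """Return (json_path, npy_path) for one frame under the layout above."""
    camera_dir = PurePosixPath(masks_dir) / sequence_id / camera_id
    stem = f"mask_{frame:03d}"  # assumed: three-digit, zero-padded frame index
    json_path = camera_dir / "json_data" / f"{stem}_{camera_id}.json"
    npy_path = camera_dir / "mask_data" / f"{stem}.npy"
    return json_path, npy_path


if __name__ == "__main__":
    # "001" and "front_camera" are illustrative values only.
    for path in mask_paths("/path/to/masks", "001", "front_camera", 0):
        print(path)
```

Note that only the JSON file name carries the camera ID; the .npy file name does not.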

Each .json file has the structure:

{
  "mask_name": "mask_000.npy",
  "labels": {
    "1": {"instance_id": 1, "class_name": "car"},
    "2": {"instance_id": 2, "class_name": "truck"}
  }
}

The corresponding .npy file is a 2D integer array of shape (H, W) where each pixel value is the instance ID (0 = background). Instance IDs must be consistent across frames (i.e., the same physical object must have the same ID in every frame it appears).
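
Masks produced by a different pipeline can be sanity-checked against this format before training. The following is a rough sketch under stated assumptions, not the repository's validation code: `check_frame`, `check_classes_stable`, and `load_frame` are hypothetical names. It assumes that each key in `labels` equals its `instance_id` (as in the example above) and that every non-zero ID in the mask should have a label entry. Checking that an ID keeps the same class name across frames is only a heuristic; it cannot prove that an ID follows the same physical object.

```python
import json
from pathlib import Path
from typing import Dict, List, Set, Tuple


def check_frame(metadata: dict, mask_ids: Set[int]) -> List[str]:
    """Return human-readable problems for one frame (empty list = consistent)."""
    problems = []
    labels = metadata.get("labels", {})
    for key, label in labels.items():
        if int(key) != label["instance_id"]:
            problems.append(f"label key {key} != instance_id {label['instance_id']}")
    labelled = {label["instance_id"] for label in labels.values()}
    # 0 is background and is not expected to have a label entry.
    unlabelled = sorted(mask_ids - labelled - {0})
    if unlabelled:
        problems.append(f"mask IDs without a label: {unlabelled}")
    return problems


def check_classes_stable(frames: List[dict]) -> List[str]:
    """Heuristic cross-frame check: an instance ID should keep one class name."""
    seen: Dict[int, str] = {}
    problems = []
    for index, metadata in enumerate(frames):
        for label in metadata.get("labels", {}).values():
            instance_id, class_name = label["instance_id"], label["class_name"]
            previous = seen.setdefault(instance_id, class_name)
            if previous != class_name:
                problems.append(
                    f"frame {index}: instance {instance_id} changed class "
                    f"{previous!r} -> {class_name!r}"
                )
    return problems


def load_frame(json_path: Path, mask_dir: Path) -> Tuple[dict, Set[int]]:
    """Load one frame's metadata and the set of IDs present in its mask."""
    import numpy as np  # third-party; imported lazily so the checks above stay stdlib-only

    metadata = json.loads(json_path.read_text())
    mask = np.load(mask_dir / metadata["mask_name"])
    if mask.ndim != 2:
        raise ValueError(f"expected a 2D (H, W) mask, got shape {mask.shape}")
    return metadata, {int(value) for value in np.unique(mask)}
```

A typical use would be to call `load_frame` for every frame of one camera, pass each result to `check_frame`, and then pass the collected metadata dictionaries to `check_classes_stable`.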

Generating masks

The results in the paper were obtained with tracked instance masks generated by Grounded-SAM-2 with tracking enabled. Similar models (e.g. SAM-3) should conceptually work as well but have not been tested.


Training

Waymo Open Dataset — Dynamic NOTR

bash scripts/train_waymo_notr.sh \
    --sequence 102319 \
    --data /path/to/wod \
    --masks_dir /path/to/masks

The start/end frames are looked up automatically from scripts/arrays/wod_id_to_seq_notr_dynamic_50frames.txt. Pass --start_frame and --end_frame explicitly to override.

Waymo Open Dataset — PVG

bash scripts/train_waymo_pvg.sh \
    --sequence 102353 \
    --data /path/to/wod \
    --masks_dir /path/to/masks

Frame ranges are looked up from scripts/arrays/wod_id_to_seq_pvg.txt.

Waymo Open Dataset — StreetGS

bash scripts/train_waymo_streetgs.sh \
    --sequence 104481 \
    --data /path/to/wod \
    --masks_dir /path/to/masks

Frame ranges are looked up from scripts/arrays/wod_id_to_seq_adgs_split.txt.

PandaSet

bash scripts/train_pandaset.sh \
    --sequence 001 \
    --data /path/to/pandaset \
    --masks_dir /path/to/masks

Sequence IDs are listed in scripts/arrays/pandaset_id_to_seq.txt.

Training options

All training scripts forward any additional arguments to the underlying training command. For example, to use the browser viewer instead of W&B:

bash scripts/train_pandaset.sh \
    --sequence 001 --data /path/to/pandaset --masks_dir /path/to/masks \
    --vis viewer

Evaluation

bash scripts/eval.sh --config outputs/<experiment>/idsplat/<timestamp>/config.yml

Viewer

To open the interactive viewer for a trained model:

uv run ns-viewer --load-config outputs/<experiment>/idsplat/<timestamp>/config.yml

Citation

If you use IDSplat in your research, please cite:

@article{lindstrom2025idsplat,
  title={IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes},
  author={Lindstr{\"o}m, Carl and Rafidashti, Mahan and Fatemi, Maryam and Hammarstrand, Lars and Oswald, Martin R and Svensson, Lennart},
  journal={arXiv preprint arXiv:2511.19235},
  year={2025}
}
