A Closer Look at Cross-Domain Few-Shot Object Detection:
Fine-Tuning Matters and Parallel Decoder Helps
Xuanlong Yu¹, Youyang Sha¹, Longfei Liu¹, Xi Shen†¹, Di Yang†²
¹Intellindust AI Lab, ²Suzhou Institute for Advanced Research, USTC
- We introduce a Hybrid Ensemble Decoder (HED) to improve query diversity and model robustness.
- We propose a simple fine-tuning recipe for cross-domain few-shot object detection, with plateau-aware scheduling and progressive optimization.
- This repository includes:
  - Training/evaluation code and configs for the CD-FSOD, ODinW-13, and RF100-VL benchmarks.
  - A challenge subproject, NTIRE 2026 CDFSOD Challenge, including the pseudo-label annotation strategy and challenge-oriented pipeline details.
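Plateau-aware scheduling is not detailed in this README. As a rough illustration only, the sketch below implements a generic plateau-triggered step decay (similar in spirit to PyTorch's ReduceLROnPlateau): the learning rate is cut by a constant factor once the monitored loss stops improving for a few epochs. The function `plateau_aware_lrs` and all of its parameters are hypothetical, and the schedule used in the paper may differ.

```python
from typing import List, Sequence


def plateau_aware_lrs(losses: Sequence[float], base_lr: float,
                      factor: float = 0.1, patience: int = 2,
                      min_delta: float = 1e-3,
                      min_lr: float = 1e-6) -> List[float]:
    """Return the learning rate in effect after each observed epoch loss.

    Hypothetical helper: the LR is multiplied by `factor` (floored at
    `min_lr`) once the loss has failed to improve by more than `min_delta`
    for `patience` consecutive epochs.
    """
    lr = base_lr
    best = float("inf")
    stale = 0  # epochs since the last meaningful improvement
    lrs: List[float] = []
    for loss in losses:
        if loss < best - min_delta:
            best, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:
            lr = max(lr * factor, min_lr)
            stale = 0  # restart the patience window after each decay
        lrs.append(lr)
    return lrs
```

For example, a loss curve that flattens twice yields two decays, one at each plateau. Resetting the counter after every decay gives the model `patience` epochs to benefit from the lower rate before it is cut again.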
```bash
# Create conda environment
conda create -n ft-fsod python=3.10 -y
conda activate ft-fsod

# Install PyTorch first (pick the command matching your CUDA version)
# Example for CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install OpenMMLab dependencies
pip install -U openmim
mim install "mmengine"
mim install "mmcv==2.1.0"
mim install mmdet

# Common dependencies
pip install -r requirements.txt
```

Deterministic hack (MMEngine): to reduce few-shot training instability, we set deterministic=True in the configs, which requires patching MMEngine as follows:
1) mmengine/runner/runner.py (set_randomness): add a warn_only: bool = True parameter and pass warn_only=warn_only to set_random_seed.
2) mmengine/runner/utils.py (set_random_seed): add a warn_only: bool = True parameter and change the call to torch.use_deterministic_algorithms(True, warn_only=warn_only).
Refs: runner.py#L698, runner.py#L719, utils.py#L48, utils.py#L90.
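The effect of the second patch can be sketched as a standalone function. This is a simplified stand-in, not MMEngine's exact source (the real `set_random_seed` also seeds NumPy and supports per-rank seeds, among other things); it only shows where `warn_only` is threaded through. With `warn_only=True`, PyTorch emits a warning instead of raising a `RuntimeError` when an op without a deterministic implementation is hit, so training keeps running.

```python
import random

import torch


def set_random_seed(seed: int,
                    deterministic: bool = False,
                    warn_only: bool = True) -> int:
    """Simplified sketch of the patched helper (not MMEngine's exact code)."""
    random.seed(seed)
    torch.manual_seed(seed)
    # Safe without a GPU: PyTorch silently ignores this if CUDA is unavailable.
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # The patched line: forward warn_only so that non-deterministic ops
        # warn instead of raising an error.
        torch.use_deterministic_algorithms(True, warn_only=warn_only)
    return seed
```

In MMEngine configs, the flag is typically set through the randomness dict, e.g. randomness = dict(seed=..., deterministic=True), which the runner passes on to set_randomness.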
- Download bert-base-uncased and nltk_data following these instructions.
- Adjust the dataset paths and pre-trained weight paths in src_path.py.
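Before launching the scripts, a quick check that every configured path exists can save a failed run. The sketch below is hypothetical: it does not read src_path.py, and the example keys (dataset_root, pretrained_weights) are placeholders rather than the variable names actually used in that file.

```python
from pathlib import Path
from typing import Dict, List


def missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, path in paths.items()
            if not Path(path).expanduser().exists()]


if __name__ == "__main__":
    # Hypothetical placeholders: copy the real values from src_path.py.
    configured = {
        "dataset_root": "./data",
        "pretrained_weights": "./weights",
    }
    missing = missing_paths(configured)
    if missing:
        print("Missing paths:", ", ".join(missing))
    else:
        print("All configured paths exist.")
```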
- CD-FSOD (1/5/10-shot train+eval):
  ```bash
  bash run_mmgdinob_traineval_cdfsod.sh
  ```
- ODinW-13 (1/3/5/10-shot, multi-seed):
  ```bash
  bash run_mmgdinol_traineval_odwin.sh
  ```
- RF100-VL (10-shot):
  ```bash
  bash run_mmgdinol_traineval_rf100vl.sh
  ```
- CD-Mixed OOD evaluation (1/5/10-shot):
  ```bash
  bash run_mmgdinob_eval_cdmixed.sh
  ```
Because few-shot fine-tuning is unstable (even with a fixed random seed), your results may differ slightly from those reported in the paper.
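Given this run-to-run variance, it can help to repeat a setting over several seeds and compare mean ± standard deviation rather than single numbers. A minimal sketch, assuming you have already collected one mAP value per seed into a dictionary (this input format is an assumption, not the output format of this repository's analysis scripts):

```python
from statistics import mean, stdev
from typing import Mapping, Tuple


def summarize_seeds(map_by_seed: Mapping[int, float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-seed mAP values.

    The standard deviation is reported as 0.0 when only one run is available.
    """
    values = list(map_by_seed.values())
    if not values:
        raise ValueError("need at least one run")
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread
```

Usage: m, s = summarize_seeds(results), then report f"{m:.1f} ± {s:.1f}".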
Challenge materials and instructions are in the same repository: NTIRE 2026 CDFSOD Challenge.
It covers our challenge-oriented pipeline: a pseudo-label annotation strategy that uses FSOD-mAP as the metric for selecting annotations, model fine-tuning, and test-time augmentation (TTA).
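This README does not detail how annotation candidates are produced. One plausible shape for metric-driven selection is a sweep over confidence thresholds that keeps the pseudo-label set scoring highest under the chosen metric. The sketch below assumes that setup: select_pseudo_labels, the detection dict format, and the evaluate callback (which would compute FSOD-mAP in practice) are all hypothetical; the actual strategy is documented in the challenge subproject.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical detection format: a dict with at least a "score" key
# (box, category id, image id, etc. would sit alongside it).
Detection = Dict[str, Any]
Result = Tuple[float, List[Detection], float]


def select_pseudo_labels(
    detections: List[Detection],
    thresholds: Sequence[float],
    evaluate: Callable[[List[Detection]], float],
) -> Result:
    """Sweep confidence thresholds and return (threshold, kept, metric) for
    the pseudo-label set with the highest metric; ties keep the first one."""
    if not thresholds:
        raise ValueError("thresholds must not be empty")
    best: Optional[Result] = None
    for thr in thresholds:
        kept = [d for d in detections if d["score"] >= thr]
        metric = evaluate(kept)  # e.g. FSOD-mAP in practice
        if best is None or metric > best[2]:
            best = (thr, kept, metric)
    assert best is not None
    return best
```

Passing the metric in as a callback keeps the selection loop independent of how FSOD-mAP is computed.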
CD-FSOD:
```bash
python analyze_results_cdfsod.py <experiment_directory_path>
# e.g.
python analyze_results_cdfsod.py exp_cdfosd_results
```

ODinW-13:
```bash
python analyze_results_odinw.py <experiment_directory_path>
# e.g.
python analyze_results_odinw.py exp_odinwfsod_results
```

RF100-VL:
```bash
python analyze_results_rf100.py <experiment_directory_path>
# e.g.
python analyze_results_rf100.py exp_rf100vlfsod_results
```

This project is built on top of the OpenMMLab ecosystem and the RF100-VL, CD-FSOD, and ODinW-13 benchmarks.
If you find this project useful in your research, please consider citing:
```bibtex
@inproceedings{yu2026acloser,
  title={A Closer Look at Cross-Domain Few-Shot Object Detection: Fine-Tuning Matters and Parallel Decoder Helps},
  author={Yu, Xuanlong and Sha, Youyang and Liu, Longfei and Shen, Xi and Yang, Di},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
```

This project is released under the Apache 2.0 License.
