This repository implements a dynamic few-shot object detection framework built on top of a simple YOLO-based baseline. The baseline path is intentionally bare: detect objects, crop them, embed them with a frozen backbone, compare to prototype memory, and return a label or reject as unknown.
The current framework extends that baseline into an open-world, updateable DFOD system with:
- support-quality weighting
- multimodal class memory
- distributional prototypes with diagonal variance
- multiple similarity metrics with optional fusion
- calibration-aware unknown rejection
- strict pseudo-label updates for dynamic memory growth
- lightweight feature adapters
- a full static-vs-dynamic evaluation pipeline
This README is written for a fresh clone that contains only code. No `support/`, `test_images/`, or `calibration_images/` directories, and no `memory.npz`, are assumed to exist.
At a high level, the pipeline is:
image -> detection -> crop -> embedding -> memory lookup -> rejection / prediction -> optional memory update
The main extension over the baseline is that memory is no longer a single unweighted class centroid. It is a versioned, updateable bundle of weighted support embeddings that can be clustered into multiple modes, scored with several metrics, calibrated on held-out data, and safely expanded over time.
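As a concrete illustration of the idea (not the repo's actual code), a weighted class prototype can be built from support embeddings along these lines, assuming unit-norm embeddings and non-negative quality weights:

```python
import numpy as np

def weighted_prototype(embeddings: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Collapse weighted support embeddings into one L2-normalised prototype."""
    w = weights / weights.sum()                   # normalise weights to sum to 1
    proto = (w[:, None] * embeddings).sum(axis=0) # weighted average of supports
    return proto / np.linalg.norm(proto)          # unit norm for cosine scoring

# three unit-norm support embeddings; the third is down-weighted (low quality)
emb = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
proto = weighted_prototype(emb, np.array([1.0, 1.0, 0.2]))
```

The real bundle additionally keeps multiple modes per class and diagonal variances; this sketch shows only the single-prototype case.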
```
Baseline_Perception/
    io.py                   Image loading and crop references
    detector.py             YOLO detection + auxiliary objectness
    cropper.py              Box-to-crop conversion
    embedder.py             Frozen backbone + lightweight adapter
    quality.py              Crop-quality scoring and reranking
    validation.py           Contract checks for detections and embeddings
    Main_perception.py      End-to-end perception pipeline

Baseline_Reasoning/
    memory.py               Weighted multimodal memory construction
    similarity.py           Cosine / Euclidean / Manhattan / Mahalanobis scoring
    rejection.py            Unknown rejection on a unified score space
    calibration.py          Rejection/update calibration
    prototype_update.py     Strict pseudo-label memory updates
    serialization.py        Versioned memory bundle I/O
    formatting.py           User-facing output formatting
    Main_reasoning.py       End-to-end reasoning pipeline

dfod_config.py              Namespaced configuration + backward-compatible aliases
dfod_runtime.py             Shared runtime for inference, calibration, and evaluation
build_memory_from_support.py
run_baseline.py
calibrate_dfod.py
prepare_test_images.py
evaluate_dfod.py

rebuild_memory.sh           Wrapper: build memory using repo-local .venv
run_inference.sh            Wrapper: single-image inference using repo-local .venv
run_calibration.sh          Wrapper: calibration using repo-local .venv
prepare_test_images.sh      Wrapper: evaluation image generation using repo-local .venv
run_evaluation.sh           Wrapper: static-vs-dynamic evaluation using repo-local .venv
```
The wrapper scripts assume a repo-local virtual environment at .venv.
- Python 3.12

```bash
cd /path/to/dfod
python3.12 -m venv .venv
./.venv/bin/python -m pip install --upgrade pip
./.venv/bin/python -m pip install -r requirements.txt
./.venv/bin/python smoke_setup.py
```

This checks importability of:

- `numpy`
- `PIL`
- `torch`
- `torchvision`
- `ultralytics`
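If you want to see what such a check boils down to, here is a minimal sketch; smoke_setup.py itself may check more than importability:

```python
# Minimal sketch of an import smoke check (illustrative, not the repo script).
import importlib

def check_imports(modules):
    """Map each module name to True/False importability."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results

# swap in "numpy", "PIL", "torch", "torchvision", "ultralytics" for a real check
status = check_imports(["json", "definitely_not_a_real_module_xyz"])
```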
Notes:
- On first real detector run, Ultralytics may download YOLO weights.
- The wrapper scripts always use `.venv/bin/python`, which avoids common Conda/venv mismatches.
- The provided shell wrappers currently run on `cpu` for portability.
This repo does not ship support images, query images, or evaluation images.
You must provide at least:
- A `support/` folder for memory construction.
- One or more query images for inference.
- Optionally, an ImageNet-style folder root if you want to use the automated calibration/evaluation pipeline.
Create a support/ directory at the repo root:
```
support/
    dog/
        001.jpg
        002.jpg
        ...
    cup/
        001.jpg
        002.jpg
        ...
    keyboard/
        001.jpg
        002.jpg
        ...
```
Rules:
- One subdirectory per class.
- Folder name is the class label used by memory and outputs.
- Supported image extensions are `.jpg`, `.jpeg`, and `.png`.
- At least 3 support images per class is the practical minimum; 6+ per class is recommended if you want multimodal memory to activate reliably.
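Before building memory, it can help to sanity-check the layout. A small illustrative helper (not part of the repo) that counts usable images per class:

```python
import os

def count_support_images(support_dir, exts=(".jpg", ".jpeg", ".png")):
    """Return {class_name: image_count} for a support/<class>/<img> layout."""
    counts = {}
    for cls in sorted(os.listdir(support_dir)):
        cls_dir = os.path.join(support_dir, cls)
        if not os.path.isdir(cls_dir):
            continue  # ignore stray files at the support root
        counts[cls] = sum(
            1 for f in os.listdir(cls_dir) if f.lower().endswith(exts)
        )
    return counts
```

Any class reported below 3 is under the practical minimum described above.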
For each support image, the framework:
- Runs YOLO detection.
- Optionally attaches auxiliary objectness from Faster R-CNN.
- Computes a crop-quality score.
- Selects the best crop from that image.
- Embeds the crop.
- Computes support weights from crop quality and within-class centrality.
- Builds a versioned memory bundle.
- Optionally trains and saves a lightweight adapter.
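The quality-and-centrality weighting in the steps above can be sketched as follows; this illustrates the idea, not the exact formula in memory.py:

```python
import numpy as np

def support_weights(embeddings: np.ndarray, quality: np.ndarray) -> np.ndarray:
    """Weight each support embedding by crop quality times within-class centrality."""
    mean = embeddings.mean(axis=0)
    mean = mean / np.linalg.norm(mean)                   # unit-norm class mean
    centrality = np.clip(embeddings @ mean, 0.0, None)   # cosine sim, non-negative
    w = quality * centrality
    return w / w.sum()                                   # weights sum to 1

# toy class with one off-cluster crop; crops assumed unit-norm, equal quality
emb = np.array([[1.0, 0.0],
                [0.9, np.sqrt(1.0 - 0.81)],
                [0.0, 1.0]])
w = support_weights(emb, np.ones(3))  # the off-cluster third crop gets the lowest weight
```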
After you create support/, build memory.npz:
```bash
bash rebuild_memory.sh
```

Outputs created:

- `memory.npz`
- `adapter.pt` if adapter training is enabled
Run inference on one image:
```bash
bash run_inference.sh /absolute/path/to/image.jpg
```

Or on an image inside the repo:

```bash
bash run_inference.sh support/dog/001.jpg
```

What this does:
- loads `memory.npz`
- runs perception on the query image
- classifies each detected object
- applies unknown rejection
- optionally accepts high-confidence pseudo-updates
Outputs created:
- `dfod_output.json`
- `memory_updated.npz` if pseudo-updates are accepted
run_baseline.py automatically uses calibration.npz if that file exists in the repo root.
These are the main entrypoints most users need:
```bash
bash rebuild_memory.sh
bash run_inference.sh path/to/image.jpg
bash run_calibration.sh ...
bash prepare_test_images.sh ...
bash run_evaluation.sh ...
```
You can also call the Python scripts directly with ./.venv/bin/python ..., but the shell wrappers are safer because they always use the repo-local environment.
The evaluation stack compares:
- static: inference without memory updates
- dynamic: inference with online pseudo-label memory updates
It is designed for open-world DFOD, so evaluation includes:
- known-class recognition
- unknown-class rejection
- support growth over time
- pseudo-update safety
The provided automated workflow assumes you have an ImageNet-style validation root containing class directories. The helper script copies held-out images into:
- `calibration_images/`
- `test_images/`
The class mappings currently live in evaluation_utils.py.
Current supported known support labels for the automated ImageNet workflow:
`airplane`, `backpack`, `banana`, `bear`, `bicycle`, `bird`, `bottle`, `broccoli`, `cat`, `chair`, `clock`, `couch`, `cup`, `dog`, `elephant`, `keyboard`, `laptop`, `orange`, `pizza`, `tv`
Current automated unknown evaluation labels:
`bench`, `boat`, `bus`, `cell_phone`, `teddy_bear`, `truck`, `vase`, `zebra`
If your chosen support classes are different, update the mapping dictionaries in evaluation_utils.py.
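As a hypothetical sketch of the shape those dictionaries take (the real keys, values, and the second dictionary's name live in evaluation_utils.py; the values below are placeholders, not actual ImageNet folder names):

```python
# Hypothetical sketch of the mapping dictionaries in evaluation_utils.py.
# Values are placeholders -- substitute the real ImageNet folder names.
KNOWN_SUPPORT_IMAGENET_MAP = {
    "dog": "<imagenet-folder-for-dog>",
    "cup": "<imagenet-folder-for-cup>",
}
UNKNOWN_EVAL_IMAGENET_MAP = {  # illustrative name for the unknown-label map
    "zebra": "<imagenet-folder-for-zebra>",
}
```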
Use support images and ImageNet-style held-out images to create calibration_images/:
```bash
bash prepare_test_images.sh \
    --imagenet-root /path/to/imagenet/val \
    --support-dir support \
    --output-dir calibration_images \
    --known-per-class 4 \
    --unknown-per-class 4 \
    --known-offset 10 \
    --unknown-offset 4
```

This writes:

- `calibration_images/manifest.json`
- `calibration_images/manifest.csv`
- `calibration_images/summary.json`
Use offsets that do not overlap with your support images.
```bash
bash run_calibration.sh \
    --manifest calibration_images/manifest.json \
    --memory-path memory.npz \
    --output-path calibration.npz
```

Outputs created:

- `calibration.npz`
- `calibration.json`
- `calibration_records.json`
- `calibration_records.csv`
The calibration artifact stores:
- rejection score threshold
- rejection margin threshold
- update score threshold
- update margin threshold
- held-out known/unknown performance estimates
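Conceptually, the stored thresholds drive a decision rule along these lines (illustrative sketch with made-up field names; the actual logic lives in rejection.py and prototype_update.py):

```python
def decide(score, margin, cal):
    """Apply calibrated thresholds to a unified score and a top1-top2 margin.

    cal: dict with 'reject_score', 'reject_margin', 'update_score', and
    'update_margin' thresholds (names are illustrative).
    Returns (prediction_kind, may_update).
    """
    if score < cal["reject_score"] or margin < cal["reject_margin"]:
        return "unknown", False  # reject: not confident enough to label
    # pseudo-update thresholds are stricter than rejection thresholds
    may_update = score >= cal["update_score"] and margin >= cal["update_margin"]
    return "known", may_update

# toy thresholds, roughly: accept above 0.5, pseudo-update only above 0.7
cal = {"reject_score": 0.5, "reject_margin": 0.05,
       "update_score": 0.7, "update_margin": 0.15}
```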
Create a disjoint evaluation stream:
```bash
bash prepare_test_images.sh \
    --imagenet-root /path/to/imagenet/val \
    --support-dir support \
    --output-dir test_images \
    --known-per-class 4 \
    --unknown-per-class 4 \
    --known-offset 20 \
    --unknown-offset 8
```

This produces a manifest-ordered stream that interleaves known and unknown images round by round.
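The round-by-round interleaving can be pictured with a small sketch (illustrative only; the real stream is built by prepare_test_images.py and recorded in the manifest):

```python
def interleave_rounds(known, unknown):
    """Alternate known and unknown images round by round, taking one of
    each per round while the corresponding list still has entries."""
    stream = []
    for r in range(max(len(known), len(unknown))):
        if r < len(known):
            stream.append({"image_path": known[r], "eval_split": "known", "round_index": r})
        if r < len(unknown):
            stream.append({"image_path": unknown[r], "eval_split": "unknown", "round_index": r})
    return stream
```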
```bash
bash run_evaluation.sh \
    --manifest test_images/manifest.json \
    --memory-path memory.npz \
    --calibration-path calibration.npz \
    --output-dir evaluation_runs/main
```

Outputs created in `evaluation_runs/main/`:

- `static_summary.json`
- `dynamic_summary.json`
- `comparison.json`
- `static_records.json`
- `dynamic_records.json`
- `static_records.csv`
- `dynamic_records.csv`
- `dynamic_final_memory.npz`
- copied adapter weights for each branch
The main summaries track:
- `known_top1_accuracy`
- `known_any_match_accuracy`
- `unknown_rejection_rate`
- `unknown_false_accept_rate`
- `overall_image_score`
- `accepted_updates_total`
- `accepted_updates_unknown`
- support growth before and after the stream
- per-round known accuracy and unknown rejection
These metrics are specifically useful for DFOD because they measure both recognition quality and the safety of dynamic memory expansion.
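As a rough sketch of how such metrics fall out of per-image records (field names here are illustrative, not the exact record schema):

```python
def summarize(records):
    """Compute headline DFOD metrics from per-image records.

    Each record is assumed to carry: 'eval_split' ("known" or "unknown"),
    'correct' (top-1 label matched), and 'rejected' (predicted unknown).
    """
    known = [r for r in records if r["eval_split"] == "known"]
    unknown = [r for r in records if r["eval_split"] == "unknown"]
    return {
        "known_top1_accuracy": sum(r["correct"] for r in known) / max(len(known), 1),
        "unknown_rejection_rate": sum(r["rejected"] for r in unknown) / max(len(unknown), 1),
        "unknown_false_accept_rate": sum(not r["rejected"] for r in unknown) / max(len(unknown), 1),
    }
```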
You do not have to use the ImageNet helper.
If you already have your own test stream, write a manifest JSON with entries like:
```json
[
  {
    "image_path": "/abs/path/to/image1.jpg",
    "eval_split": "known",
    "expected_label": "dog",
    "source_label": "dog",
    "round_index": 0
  },
  {
    "image_path": "/abs/path/to/image2.jpg",
    "eval_split": "unknown",
    "expected_label": "unknown",
    "source_label": "zebra",
    "round_index": 0
  }
]
```

Then run:

```bash
bash run_evaluation.sh --manifest /path/to/manifest.json --memory-path memory.npz --calibration-path calibration.npz
```

If you want to integrate the framework into another project, the main Python entrypoint is dfod_runtime.py.
Example:
```python
from dfod_runtime import run_dfod_inference

result = run_dfod_inference(
    image_path="path/to/query.jpg",
    memory_path="memory.npz",
    device="cpu",
)
print(result["detections"])
print(result["accepted_updates"])
```

Perception-only and reasoning-only pipelines are also available via Main_perception.py and Main_reasoning.py.
Typical generated files:
- `memory.npz`: support-derived memory bundle
- `memory_updated.npz`: updated memory after accepted pseudo-labels
- `adapter.pt`: learned adapter weights
- `calibration.npz`: rejection/update calibration artifact
- `dfod_output.json`: single-image inference output
- `boxes.json`, `embeddings.npy`: smoke artifacts
- `test_images/`, `calibration_images/`: generated evaluation sets
- `evaluation_runs/`: evaluation reports and final dynamic memory
The repo’s .gitignore is set up to ignore most generated datasets and artifacts.
- Memory serialization is versioned and backward-compatible with legacy v1 memory files.
- Support building sorts class names and image paths deterministically.
- Multimodal clustering is deterministic under the configured seed.
- Evaluation streams are manifest-driven rather than relying on directory iteration order.
- Rejection and update thresholds can be stored in `calibration.npz` and reused across runs.

```bash
./.venv/bin/python -m unittest discover -s tests -v
```

The test suite covers:
- memory serialization and v1-to-v2 upgrade
- multimodal memory behavior
- similarity and rejection logic
- prototype update rules
- calibration utilities
- evaluation stream helpers
You are probably using the wrong Python interpreter. Use the wrappers or explicitly run:

```bash
./.venv/bin/python run_baseline.py
```

The repo does not include support images. Create a repo-root `support/` directory first, then run:

```bash
bash rebuild_memory.sh
```

Your support folder labels must be present in KNOWN_SUPPORT_IMAGENET_MAP if you want to use the ImageNet helper unchanged. Otherwise update evaluation_utils.py.
Check:
- the image actually contains a visible object
- the object is not too small
- the detector confidence threshold is not too strict
- your environment can successfully import `torch`, `torchvision`, and `ultralytics`
Rebuild memory, regenerate a clean calibration split, recalibrate thresholds, and rerun evaluation. Dynamic DFOD is sensitive to rejection/update calibration quality.
```bash
cd /path/to/dfod
python3.12 -m venv .venv
./.venv/bin/python -m pip install -r requirements.txt
./.venv/bin/python smoke_setup.py

# create support/<class_name>/*.jpg first
bash rebuild_memory.sh

# single-image inference
bash run_inference.sh /absolute/path/to/query.jpg

# optional calibration + evaluation
bash prepare_test_images.sh --imagenet-root /path/to/imagenet/val --output-dir calibration_images --known-offset 10 --unknown-offset 4
bash run_calibration.sh --manifest calibration_images/manifest.json --memory-path memory.npz --output-path calibration.npz
bash prepare_test_images.sh --imagenet-root /path/to/imagenet/val --output-dir test_images --known-offset 20 --unknown-offset 8
bash run_evaluation.sh --manifest test_images/manifest.json --memory-path memory.npz --calibration-path calibration.npz --output-dir evaluation_runs/main
```

This repo is best thought of as a research framework, not just a demo script. To use it correctly on a new machine:
- Create `.venv` and install requirements.
- Add your own `support/` images.
- Build `memory.npz`.
- Run single-image inference.
- If you want meaningful DFOD benchmarking, generate a held-out calibration split, calibrate thresholds, then run static-vs-dynamic evaluation.
That is the reproducible path from a code-only clone to a fully working DFOD experiment.