This repository implements a dynamic few-shot object detection framework built on top of a simple YOLO-based baseline. The baseline path is intentionally bare: detect objects, crop them, embed them with a frozen backbone, compare to prototype memory, and return a label or reject as unknown.
The current framework extends that baseline into an open-world, updateable DFOD system with:
- support-quality weighting
- multimodal class memory
- distributional prototypes with diagonal variance
- multiple similarity metrics with optional fusion
- calibration-aware unknown rejection
- strict pseudo-label updates for dynamic memory growth
- lightweight feature adapters
- a full static-vs-dynamic evaluation pipeline
This README is written for a fresh clone that contains only code. No `support/`, `test_images/`, or `calibration_images/` directories, and no `memory.npz`, are assumed to exist.
At a high level, the pipeline is:
image -> detection -> crop -> embedding -> memory lookup -> rejection / prediction -> optional memory update
The main extension over the baseline is that memory is no longer a single unweighted class centroid. It is a versioned, updateable bundle of weighted support embeddings that can be clustered into multiple modes, scored with several metrics, calibrated on held-out data, and safely expanded over time.
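As a concrete illustration of the idea (not the repo's actual code), a weighted class prototype can be built from support embeddings along these lines, assuming unit-norm embeddings and non-negative quality weights:

```python
import numpy as np

def weighted_prototype(embeddings: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Collapse weighted support embeddings into one L2-normalised prototype."""
    w = weights / weights.sum()                   # normalise weights to sum to 1
    proto = (w[:, None] * embeddings).sum(axis=0) # weighted average of supports
    return proto / np.linalg.norm(proto)          # unit norm for cosine scoring

# three unit-norm support embeddings; the third is down-weighted (low quality)
emb = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
proto = weighted_prototype(emb, np.array([1.0, 1.0, 0.2]))
```

The real bundle additionally keeps multiple modes per class and diagonal variances; this sketch shows only the single-prototype case.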
```
Baseline_Perception/
    io.py                   Image loading and crop references
    detector.py             YOLO detection + auxiliary objectness
    cropper.py              Box-to-crop conversion
    embedder.py             Frozen backbone + lightweight adapter
    quality.py              Crop-quality scoring and reranking
    validation.py           Contract checks for detections and embeddings
    Main_perception.py      End-to-end perception pipeline

Baseline_Reasoning/
    memory.py               Weighted multimodal memory construction
    similarity.py           Cosine / Euclidean / Manhattan / Mahalanobis scoring
    rejection.py            Unknown rejection on a unified score space
    calibration.py          Rejection/update calibration
    prototype_update.py     Strict pseudo-label memory updates
    serialization.py        Versioned memory bundle I/O
    formatting.py           User-facing output formatting
    Main_reasoning.py       End-to-end reasoning pipeline

dfod_config.py              Namespaced configuration + backward-compatible aliases
dfod_runtime.py             Shared runtime for inference, calibration, and evaluation
build_memory_from_support.py
run_baseline.py
calibrate_dfod.py
prepare_test_images.py
evaluate_dfod.py

rebuild_memory.sh           Wrapper: build memory using repo-local .venv
run_inference.sh            Wrapper: single-image inference using repo-local .venv
run_calibration.sh          Wrapper: calibration using repo-local .venv
prepare_test_images.sh      Wrapper: evaluation image generation using repo-local .venv
run_evaluation.sh           Wrapper: static-vs-dynamic evaluation using repo-local .venv
```
The wrapper scripts assume a repo-local virtual environment at .venv.
- Python 3.12

```bash
cd /path/to/dfod
python3.12 -m venv .venv
./.venv/bin/python -m pip install --upgrade pip
./.venv/bin/python -m pip install -r requirements.txt
./.venv/bin/python smoke_setup.py
```

This checks importability of:

- `numpy`
- `PIL`
- `torch`
- `torchvision`
- `ultralytics`
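If you want to see what such a check boils down to, here is a minimal sketch; smoke_setup.py itself may check more than importability:

```python
# Minimal sketch of an import smoke check (illustrative, not the repo script).
import importlib

def check_imports(modules):
    """Map each module name to True/False importability."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results

# swap in "numpy", "PIL", "torch", "torchvision", "ultralytics" for a real check
status = check_imports(["json", "definitely_not_a_real_module_xyz"])
```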
Notes:
- On first real detector run, Ultralytics may download YOLO weights.
- The wrapper scripts always use `.venv/bin/python`, which avoids common Conda/venv mismatches.
- The provided shell wrappers currently run on `cpu` for portability.
This repo does not ship support images, query images, or evaluation images.
You must provide at least:
- A `support/` folder for memory construction.
- One or more query images for inference.
- Optionally, an ImageNet-style folder root if you want to use the automated calibration/evaluation pipeline.
Create a support/ directory at the repo root:
```
support/
    dog/
        001.jpg
        002.jpg
        ...
    cup/
        001.jpg
        002.jpg
        ...
    keyboard/
        001.jpg
        002.jpg
        ...
```
Rules:
- One subdirectory per class.
- Folder name is the class label used by memory and outputs.
- Supported image extensions are `.jpg`, `.jpeg`, and `.png`.
- At least 3 support images per class is the practical minimum; 6+ per class is recommended if you want multimodal memory to activate reliably.
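Before building memory, it can help to sanity-check the layout. A small illustrative helper (not part of the repo) that counts usable images per class:

```python
import os

def count_support_images(support_dir, exts=(".jpg", ".jpeg", ".png")):
    """Return {class_name: image_count} for a support/<class>/<img> layout."""
    counts = {}
    for cls in sorted(os.listdir(support_dir)):
        cls_dir = os.path.join(support_dir, cls)
        if not os.path.isdir(cls_dir):
            continue  # ignore stray files at the support root
        counts[cls] = sum(
            1 for f in os.listdir(cls_dir) if f.lower().endswith(exts)
        )
    return counts
```

Any class reported below 3 is under the practical minimum described above.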
For each support image, the framework:
- Runs YOLO detection.
- Optionally attaches auxiliary objectness from Faster R-CNN.
- Computes a crop-quality score.
- Selects the best crop from that image.
- Embeds the crop.
- Computes support weights from crop quality and within-class centrality.
- Builds a versioned memory bundle.
- Optionally trains and saves a lightweight adapter.
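The quality-and-centrality weighting in the steps above can be sketched as follows; this illustrates the idea, not the exact formula in memory.py:

```python
import numpy as np

def support_weights(embeddings: np.ndarray, quality: np.ndarray) -> np.ndarray:
    """Weight each support embedding by crop quality times within-class centrality."""
    mean = embeddings.mean(axis=0)
    mean = mean / np.linalg.norm(mean)                   # unit-norm class mean
    centrality = np.clip(embeddings @ mean, 0.0, None)   # cosine sim, non-negative
    w = quality * centrality
    return w / w.sum()                                   # weights sum to 1

# toy class with one off-cluster crop; crops assumed unit-norm, equal quality
emb = np.array([[1.0, 0.0],
                [0.9, np.sqrt(1.0 - 0.81)],
                [0.0, 1.0]])
w = support_weights(emb, np.ones(3))  # the off-cluster third crop gets the lowest weight
```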
After you create support/, build memory.npz:
```bash
bash rebuild_memory.sh
```

Outputs created:

- `memory.npz`
- `adapter.pt` if adapter training is enabled
Run inference on one image:
```bash
bash run_inference.sh /absolute/path/to/image.jpg
```

Or on an image inside the repo:

```bash
bash run_inference.sh support/dog/001.jpg
```

What this does:
- loads `memory.npz`
- runs perception on the query image
- classifies each detected object
- applies unknown rejection
- optionally accepts high-confidence pseudo-updates
Outputs created:
- `dfod_output.json`
- `memory_updated.npz` if pseudo-updates are accepted
run_baseline.py automatically uses calibration.npz if that file exists in the repo root.
These are the main entrypoints most users need:
```bash
bash rebuild_memory.sh
bash run_inference.sh path/to/image.jpg
bash run_calibration.sh ...
bash prepare_test_images.sh ...
bash run_evaluation.sh ...
```
You can also call the Python scripts directly with ./.venv/bin/python ..., but the shell wrappers are safer because they always use the repo-local environment.
The evaluation stack compares:
- static: inference without memory updates
- dynamic: inference with online pseudo-label memory updates
It is designed for open-world DFOD, so evaluation includes:
- known-class recognition
- unknown-class rejection
- support growth over time
- pseudo-update safety
The provided automated workflow assumes you have an ImageNet-style validation root containing class directories. The helper script copies held-out images into:
- `calibration_images/`
- `test_images/`
The class mappings currently live in evaluation_utils.py.
Current supported known support labels for the automated ImageNet workflow:
`airplane`, `backpack`, `banana`, `bear`, `bicycle`, `bird`, `bottle`, `broccoli`, `cat`, `chair`, `clock`, `couch`, `cup`, `dog`, `elephant`, `keyboard`, `laptop`, `orange`, `pizza`, `tv`
Current automated unknown evaluation labels:
`bench`, `boat`, `bus`, `cell_phone`, `teddy_bear`, `truck`, `vase`, `zebra`
If your chosen support classes are different, update the mapping dictionaries in evaluation_utils.py.
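As a hypothetical sketch of the shape those dictionaries take (the real keys, values, and the second dictionary's name live in evaluation_utils.py; the values below are placeholders, not actual ImageNet folder names):

```python
# Hypothetical sketch of the mapping dictionaries in evaluation_utils.py.
# Values are placeholders -- substitute the real ImageNet folder names.
KNOWN_SUPPORT_IMAGENET_MAP = {
    "dog": "<imagenet-folder-for-dog>",
    "cup": "<imagenet-folder-for-cup>",
}
UNKNOWN_EVAL_IMAGENET_MAP = {  # illustrative name for the unknown-label map
    "zebra": "<imagenet-folder-for-zebra>",
}
```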
Use support images and ImageNet-style held-out images to create calibration_images/:
```bash
bash prepare_test_images.sh \
    --imagenet-root /path/to/imagenet/val \
    --support-dir support \
    --output-dir calibration_images \
    --known-per-class 4 \
    --unknown-per-class 4 \
    --known-offset 10 \
    --unknown-offset 4
```

This writes:

- `calibration_images/manifest.json`
- `calibration_images/manifest.csv`
- `calibration_images/summary.json`
Use offsets that do not overlap with your support images.
```bash
bash run_calibration.sh \
    --manifest calibration_images/manifest.json \
    --memory-path memory.npz \
    --output-path calibration.npz
```

Outputs created:

- `calibration.npz`
- `calibration.json`
- `calibration_records.json`
- `calibration_records.csv`
The calibration artifact stores:
- rejection score threshold
- rejection margin threshold
- update score threshold
- update margin threshold
- held-out known/unknown performance estimates
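Conceptually, the stored thresholds drive a decision rule along these lines (illustrative sketch with made-up field names; the actual logic lives in rejection.py and prototype_update.py):

```python
def decide(score, margin, cal):
    """Apply calibrated thresholds to a unified score and a top1-top2 margin.

    cal: dict with 'reject_score', 'reject_margin', 'update_score', and
    'update_margin' thresholds (names are illustrative).
    Returns (prediction_kind, may_update).
    """
    if score < cal["reject_score"] or margin < cal["reject_margin"]:
        return "unknown", False  # reject: not confident enough to label
    # pseudo-update thresholds are stricter than rejection thresholds
    may_update = score >= cal["update_score"] and margin >= cal["update_margin"]
    return "known", may_update

# toy thresholds, roughly: accept above 0.5, pseudo-update only above 0.7
cal = {"reject_score": 0.5, "reject_margin": 0.05,
       "update_score": 0.7, "update_margin": 0.15}
```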
Create a disjoint evaluation stream:
```bash
bash prepare_test_images.sh \
    --imagenet-root /path/to/imagenet/val \
    --support-dir support \
    --output-dir test_images \
    --known-per-class 4 \
    --unknown-per-class 4 \
    --known-offset 20 \
    --unknown-offset 8
```

This produces a manifest-ordered stream that interleaves known and unknown images round by round.
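The round-by-round interleaving can be pictured with a small sketch (illustrative only; the real stream is built by prepare_test_images.py and recorded in the manifest):

```python
def interleave_rounds(known, unknown):
    """Alternate known and unknown images round by round, taking one of
    each per round while the corresponding list still has entries."""
    stream = []
    for r in range(max(len(known), len(unknown))):
        if r < len(known):
            stream.append({"image_path": known[r], "eval_split": "known", "round_index": r})
        if r < len(unknown):
            stream.append({"image_path": unknown[r], "eval_split": "unknown", "round_index": r})
    return stream
```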
```bash
bash run_evaluation.sh \
    --manifest test_images/manifest.json \
    --memory-path memory.npz \
    --calibration-path calibration.npz \
    --output-dir evaluation_runs/main
```

Outputs created in `evaluation_runs/main/`:

- `static_summary.json`
- `dynamic_summary.json`
- `comparison.json`
- `static_records.json`
- `dynamic_records.json`
- `static_records.csv`
- `dynamic_records.csv`
- `dynamic_final_memory.npz`
- copied adapter weights for each branch
The main summaries track:
- `known_top1_accuracy`
- `known_any_match_accuracy`
- `unknown_rejection_rate`
- `unknown_false_accept_rate`
- `overall_image_score`
- `accepted_updates_total`
- `accepted_updates_unknown`
- support growth before and after the stream
- per-round known accuracy and unknown rejection
These metrics are specifically useful for DFOD because they measure both recognition quality and the safety of dynamic memory expansion.
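As a rough sketch of how such metrics fall out of per-image records (field names here are illustrative, not the exact record schema):

```python
def summarize(records):
    """Compute headline DFOD metrics from per-image records.

    Each record is assumed to carry: 'eval_split' ("known" or "unknown"),
    'correct' (top-1 label matched), and 'rejected' (predicted unknown).
    """
    known = [r for r in records if r["eval_split"] == "known"]
    unknown = [r for r in records if r["eval_split"] == "unknown"]
    return {
        "known_top1_accuracy": sum(r["correct"] for r in known) / max(len(known), 1),
        "unknown_rejection_rate": sum(r["rejected"] for r in unknown) / max(len(unknown), 1),
        "unknown_false_accept_rate": sum(not r["rejected"] for r in unknown) / max(len(unknown), 1),
    }
```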
You do not have to use the ImageNet helper.
If you already have your own test stream, write a manifest JSON with entries like:
```json
[
  {
    "image_path": "/abs/path/to/image1.jpg",
    "eval_split": "known",
    "expected_label": "dog",
    "source_label": "dog",
    "round_index": 0
  },
  {
    "image_path": "/abs/path/to/image2.jpg",
    "eval_split": "unknown",
    "expected_label": "unknown",
    "source_label": "zebra",
    "round_index": 0
  }
]
```

Then run:

```bash
bash run_evaluation.sh --manifest /path/to/manifest.json --memory-path memory.npz --calibration-path calibration.npz
```

If you want to integrate the framework into another project, the main Python entrypoint is dfod_runtime.py.
Example:
```python
from dfod_runtime import run_dfod_inference

result = run_dfod_inference(
    image_path="path/to/query.jpg",
    memory_path="memory.npz",
    device="cpu",
)
print(result["detections"])
print(result["accepted_updates"])
```

Perception-only and reasoning-only pipelines are also available via Main_perception.py and Main_reasoning.py.
Typical generated files:
- `memory.npz`: support-derived memory bundle
- `memory_updated.npz`: updated memory after accepted pseudo-labels
- `adapter.pt`: learned adapter weights
- `calibration.npz`: rejection/update calibration artifact
- `dfod_output.json`: single-image inference output
- `boxes.json`, `embeddings.npy`: smoke artifacts
- `test_images/`, `calibration_images/`: generated evaluation sets
- `evaluation_runs/`: evaluation reports and final dynamic memory
The repo’s .gitignore is set up to ignore most generated datasets and artifacts.
- Memory serialization is versioned and backward-compatible with legacy v1 memory files.
- Support building sorts class names and image paths deterministically.
- Multimodal clustering is deterministic under the configured seed.
- Evaluation streams are manifest-driven rather than relying on directory iteration order.
- Rejection and update thresholds can be stored in `calibration.npz` and reused across runs.

```bash
./.venv/bin/python -m unittest discover -s tests -v
```

The test suite covers:
- memory serialization and v1-to-v2 upgrade
- multimodal memory behavior
- similarity and rejection logic
- prototype update rules
- calibration utilities
- evaluation stream helpers
You are probably using the wrong Python interpreter. Use the wrappers or explicitly run:

```bash
./.venv/bin/python run_baseline.py
```

The repo does not include support images. Create a repo-root `support/` directory first, then run:

```bash
bash rebuild_memory.sh
```

Your support folder labels must be present in KNOWN_SUPPORT_IMAGENET_MAP if you want to use the ImageNet helper unchanged. Otherwise update evaluation_utils.py.
Check:
- the image actually contains a visible object
- the object is not too small
- the detector confidence threshold is not too strict
- your environment can successfully import `torch`, `torchvision`, and `ultralytics`
Rebuild memory, regenerate a clean calibration split, recalibrate thresholds, and rerun evaluation. Dynamic DFOD is sensitive to rejection/update calibration quality.
```bash
cd /path/to/dfod
python3.12 -m venv .venv
./.venv/bin/python -m pip install -r requirements.txt
./.venv/bin/python smoke_setup.py

# create support/<class_name>/*.jpg first
bash rebuild_memory.sh

# single-image inference
bash run_inference.sh /absolute/path/to/query.jpg

# optional calibration + evaluation
bash prepare_test_images.sh --imagenet-root /path/to/imagenet/val --output-dir calibration_images --known-offset 10 --unknown-offset 4
bash run_calibration.sh --manifest calibration_images/manifest.json --memory-path memory.npz --output-path calibration.npz
bash prepare_test_images.sh --imagenet-root /path/to/imagenet/val --output-dir test_images --known-offset 20 --unknown-offset 8
bash run_evaluation.sh --manifest test_images/manifest.json --memory-path memory.npz --calibration-path calibration.npz --output-dir evaluation_runs/main
```

This repo is best thought of as a research framework, not just a demo script. To use it correctly on a new machine:
- Create `.venv` and install requirements.
- Add your own `support/` images.
- Build `memory.npz`.
- Run single-image inference.
- If you want meaningful DFOD benchmarking, generate a held-out calibration split, calibrate thresholds, then run static-vs-dynamic evaluation.
That is the reproducible path from a code-only clone to a fully working DFOD experiment.