UTMIST/dfod

Dynamic Few-Shot Object Detection

This repository implements a dynamic few-shot object detection framework built on top of a simple YOLO-based baseline. The baseline path is intentionally bare: detect objects, crop them, embed them with a frozen backbone, compare to prototype memory, and return a label or reject as unknown.

The current framework extends that baseline into an open-world, updateable DFOD system with:

  • support-quality weighting
  • multimodal class memory
  • distributional prototypes with diagonal variance
  • multiple similarity metrics with optional fusion
  • calibration-aware unknown rejection
  • strict pseudo-label updates for dynamic memory growth
  • lightweight feature adapters
  • a full static-vs-dynamic evaluation pipeline
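Two of the scoring options above, cosine similarity and Mahalanobis distance against a diagonal-variance prototype, can be sketched as follows. This is an illustrative sketch, not the code in Baseline_Reasoning/similarity.py; the function names here are hypothetical.

```python
import numpy as np

def cosine_score(query, proto):
    """Cosine similarity between a query embedding and a class prototype."""
    q = query / np.linalg.norm(query)
    p = proto / np.linalg.norm(proto)
    return float(q @ p)

def diag_mahalanobis(query, proto_mean, proto_var, eps=1e-6):
    """Mahalanobis distance under a distributional prototype with
    diagonal variance (eps guards against zero-variance dimensions)."""
    diff = query - proto_mean
    return float(np.sqrt(np.sum(diff * diff / (proto_var + eps))))

rng = np.random.default_rng(0)
query = rng.normal(size=128)
proto_mean = rng.normal(size=128)
proto_var = rng.uniform(0.5, 1.5, size=128)
print(cosine_score(query, proto_mean), diag_mahalanobis(query, proto_mean, proto_var))
```

Fusing several such metrics into one score space is what makes calibration-aware rejection possible: thresholds are learned on a single unified scale rather than per metric.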

This README is written for a fresh clone that contains only code. No support/, memory.npz, test_images/, or calibration_images/ directories are assumed to exist.

What The Repo Does

At a high level, the pipeline is:

image -> detection -> crop -> embedding -> memory lookup -> rejection / prediction -> optional memory update

The main extension over the baseline is that memory is no longer a single unweighted class centroid. It is a versioned, updateable bundle of weighted support embeddings that can be clustered into multiple modes, scored with several metrics, calibrated on held-out data, and safely expanded over time.
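One plausible shape for such a versioned bundle is a .npz archive of parallel arrays. The field names below are assumptions for illustration only; the real format is defined in Baseline_Reasoning/serialization.py.

```python
import numpy as np

# Hypothetical field names; the actual bundle schema lives in
# Baseline_Reasoning/serialization.py.
embeddings = np.random.default_rng(0).normal(size=(6, 128)).astype(np.float32)
weights = np.array([1.0, 0.8, 0.9, 1.0, 0.7, 0.95], dtype=np.float32)
labels = np.array(["dog", "dog", "dog", "cup", "cup", "cup"])

np.savez(
    "memory_sketch.npz",
    version=np.array(2),      # bundle schema version
    embeddings=embeddings,    # one row per support embedding
    weights=weights,          # support-quality weights
    labels=labels,            # class label per embedding
)

bundle = np.load("memory_sketch.npz")
print(sorted(bundle.files))
```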

Repository Layout

Baseline_Perception/
  io.py                  Image loading and crop references
  detector.py            YOLO detection + auxiliary objectness
  cropper.py             Box-to-crop conversion
  embedder.py            Frozen backbone + lightweight adapter
  quality.py             Crop-quality scoring and reranking
  validation.py          Contract checks for detections and embeddings
  Main_perception.py     End-to-end perception pipeline

Baseline_Reasoning/
  memory.py              Weighted multimodal memory construction
  similarity.py          Cosine / Euclidean / Manhattan / Mahalanobis scoring
  rejection.py           Unknown rejection on a unified score space
  calibration.py         Rejection/update calibration
  prototype_update.py    Strict pseudo-label memory updates
  serialization.py       Versioned memory bundle I/O
  formatting.py          User-facing output formatting
  Main_reasoning.py      End-to-end reasoning pipeline

dfod_config.py           Namespaced configuration + backward-compatible aliases
dfod_runtime.py          Shared runtime for inference, calibration, and evaluation
build_memory_from_support.py
run_baseline.py
calibrate_dfod.py
prepare_test_images.py
evaluate_dfod.py

rebuild_memory.sh        Wrapper: build memory using repo-local .venv
run_inference.sh         Wrapper: single-image inference using repo-local .venv
run_calibration.sh       Wrapper: calibration using repo-local .venv
prepare_test_images.sh   Wrapper: evaluation image generation using repo-local .venv
run_evaluation.sh        Wrapper: static-vs-dynamic evaluation using repo-local .venv

Environment Setup

The wrapper scripts assume a repo-local virtual environment at .venv.

Recommended Python Version

  • Python 3.12

Install

cd /path/to/dfod
python3.12 -m venv .venv
./.venv/bin/python -m pip install --upgrade pip
./.venv/bin/python -m pip install -r requirements.txt

Verify The Environment

./.venv/bin/python smoke_setup.py

This checks importability of:

  • numpy
  • PIL
  • torch
  • torchvision
  • ultralytics

Notes:

  • On first real detector run, Ultralytics may download YOLO weights.
  • The wrapper scripts always use .venv/bin/python, which avoids common Conda/venv mismatches.
  • The provided shell wrappers currently run on cpu for portability.

Data Requirements

This repo does not ship support images, query images, or evaluation images.

You must provide at least:

  1. A support/ folder for memory construction.
  2. One or more query images for inference.
  3. Optionally, an ImageNet-style folder root if you want to use the automated calibration/evaluation pipeline.

Support Folder Layout

Create a support/ directory at the repo root:

support/
  dog/
    001.jpg
    002.jpg
    ...
  cup/
    001.jpg
    002.jpg
    ...
  keyboard/
    001.jpg
    002.jpg
    ...

Rules:

  • One subdirectory per class.
  • Folder name is the class label used by memory and outputs.
  • Supported image extensions are .jpg, .jpeg, and .png.
  • Three support images per class is the practical minimum.
  • Six or more per class are recommended if you want multimodal memory to activate reliably.

What Happens During Support Build

For each support image, the framework:

  1. Runs YOLO detection.
  2. Optionally attaches auxiliary objectness from Faster R-CNN.
  3. Computes a crop-quality score.
  4. Selects the best crop from that image.
  5. Embeds the crop.
  6. Computes support weights from crop quality and within-class centrality.
  7. Builds a versioned memory bundle.
  8. Optionally trains and saves a lightweight adapter.

Quick Start

1. Build Memory From Support

After you create support/, build memory.npz:

bash rebuild_memory.sh

Outputs created:

  • memory.npz
  • adapter.pt if adapter training is enabled

2. Run Single-Image Inference

Run inference on one image:

bash run_inference.sh /absolute/path/to/image.jpg

Or on an image inside the repo:

bash run_inference.sh support/dog/001.jpg

What this does:

  • loads memory.npz
  • runs perception on the query image
  • classifies each detected object
  • applies unknown rejection
  • optionally accepts high-confidence pseudo-updates

Outputs created:

  • dfod_output.json
  • memory_updated.npz if pseudo-updates are accepted

run_baseline.py automatically uses calibration.npz if that file exists in the repo root.
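After a run, you can sanity-check dfod_output.json without assuming its exact schema. The helper below is hypothetical and only inspects top-level structure.

```python
import json
from pathlib import Path

def summarize_output(path="dfod_output.json"):
    """Peek at an inference output file; returns its top-level keys if it
    is a JSON object, its length if it is a list, or None if missing."""
    p = Path(path)
    if not p.exists():
        return None
    result = json.loads(p.read_text())
    return sorted(result) if isinstance(result, dict) else len(result)

if __name__ == "__main__":
    print(summarize_output())
```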

Core Runtime Scripts

These are the main entrypoints most users need:

  • bash rebuild_memory.sh
  • bash run_inference.sh path/to/image.jpg
  • bash run_calibration.sh ...
  • bash prepare_test_images.sh ...
  • bash run_evaluation.sh ...

You can also call the Python scripts directly with ./.venv/bin/python ..., but the shell wrappers are safer because they always use the repo-local environment.

Evaluation Pipeline

The evaluation stack compares:

  • static: inference without memory updates
  • dynamic: inference with online pseudo-label memory updates

It is designed for open-world DFOD, so evaluation includes:

  • known-class recognition
  • unknown-class rejection
  • support growth over time
  • pseudo-update safety

ImageNet-Based Evaluation Workflow

The provided automated workflow assumes you have an ImageNet-style validation root containing class directories. The helper script copies held-out images into:

  • calibration_images/
  • test_images/

The class mappings currently live in evaluation_utils.py.

Current supported known support labels for the automated ImageNet workflow:

  • airplane
  • backpack
  • banana
  • bear
  • bicycle
  • bird
  • bottle
  • broccoli
  • cat
  • chair
  • clock
  • couch
  • cup
  • dog
  • elephant
  • keyboard
  • laptop
  • orange
  • pizza
  • tv

Current automated unknown evaluation labels:

  • bench
  • boat
  • bus
  • cell_phone
  • teddy_bear
  • truck
  • vase
  • zebra

If your chosen support classes are different, update the mapping dictionaries in evaluation_utils.py.

Step 1: Prepare A Calibration Split

Use support images and ImageNet-style held-out images to create calibration_images/:

bash prepare_test_images.sh \
  --imagenet-root /path/to/imagenet/val \
  --support-dir support \
  --output-dir calibration_images \
  --known-per-class 4 \
  --unknown-per-class 4 \
  --known-offset 10 \
  --unknown-offset 4

This writes:

  • calibration_images/manifest.json
  • calibration_images/manifest.csv
  • calibration_images/summary.json

Use offsets that do not overlap with your support images.

Step 2: Calibrate Rejection And Update Thresholds

bash run_calibration.sh \
  --manifest calibration_images/manifest.json \
  --memory-path memory.npz \
  --output-path calibration.npz

Outputs created:

  • calibration.npz
  • calibration.json
  • calibration_records.json
  • calibration_records.csv

The calibration artifact stores:

  • rejection score threshold
  • rejection margin threshold
  • update score threshold
  • update margin threshold
  • held-out known/unknown performance estimates
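Since the exact key names inside calibration.npz are defined by calibrate_dfod.py, the safest way to inspect the artifact is to list what it contains rather than assume a schema:

```python
import numpy as np
from pathlib import Path

def show_calibration(path="calibration.npz"):
    """Load a calibration artifact and return its stored arrays by key."""
    if not Path(path).exists():
        return {}
    data = np.load(path, allow_pickle=True)
    return {k: data[k] for k in data.files}

if __name__ == "__main__":
    for key, value in show_calibration().items():
        print(key, "=", value)
```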

Step 3: Prepare A Test Stream

Create a disjoint evaluation stream:

bash prepare_test_images.sh \
  --imagenet-root /path/to/imagenet/val \
  --support-dir support \
  --output-dir test_images \
  --known-per-class 4 \
  --unknown-per-class 4 \
  --known-offset 20 \
  --unknown-offset 8

This produces a manifest-ordered stream that interleaves known and unknown images round by round.
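The round-by-round ordering can be sketched as a simple interleave of the known and unknown image lists. This is an illustration of the stream shape, not the helper script's actual code.

```python
from itertools import zip_longest

def interleave_rounds(known, unknown):
    """Interleave known and unknown images round by round: each round
    yields one known image then one unknown image, until both run out."""
    rounds = zip_longest(known, unknown)
    return [img for pair in rounds for img in pair if img is not None]

known = ["dog_0.jpg", "dog_1.jpg", "cup_0.jpg"]
unknown = ["zebra_0.jpg", "bus_0.jpg"]
print(interleave_rounds(known, unknown))
# -> ['dog_0.jpg', 'zebra_0.jpg', 'dog_1.jpg', 'bus_0.jpg', 'cup_0.jpg']
```

Interleaving matters for dynamic evaluation: memory updates triggered by early rounds influence predictions in later rounds, so the stream order is part of the experiment.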

Step 4: Run Static-Vs-Dynamic Evaluation

bash run_evaluation.sh \
  --manifest test_images/manifest.json \
  --memory-path memory.npz \
  --calibration-path calibration.npz \
  --output-dir evaluation_runs/main

Outputs created in evaluation_runs/main/:

  • static_summary.json
  • dynamic_summary.json
  • comparison.json
  • static_records.json
  • dynamic_records.json
  • static_records.csv
  • dynamic_records.csv
  • dynamic_final_memory.npz
  • copied adapter weights for each branch

Evaluation Metrics

The main summaries track:

  • known_top1_accuracy
  • known_any_match_accuracy
  • unknown_rejection_rate
  • unknown_false_accept_rate
  • overall_image_score
  • accepted_updates_total
  • accepted_updates_unknown
  • support growth before and after the stream
  • per-round known accuracy and unknown rejection

These metrics are specifically useful for DFOD because they measure both recognition quality and the safety of dynamic memory expansion.
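The two headline metrics can be computed from per-image records like so. The record field names below are illustrative assumptions, not the exact schema of static_records.json.

```python
def summarize_records(records):
    """Compute known_top1_accuracy and unknown_rejection_rate from a list
    of per-image records (field names here are illustrative)."""
    known = [r for r in records if r["eval_split"] == "known"]
    unknown = [r for r in records if r["eval_split"] == "unknown"]
    top1 = sum(r["predicted_label"] == r["expected_label"] for r in known)
    rejected = sum(r["predicted_label"] == "unknown" for r in unknown)
    return {
        "known_top1_accuracy": top1 / len(known) if known else 0.0,
        "unknown_rejection_rate": rejected / len(unknown) if unknown else 0.0,
    }

records = [
    {"eval_split": "known", "expected_label": "dog", "predicted_label": "dog"},
    {"eval_split": "known", "expected_label": "cup", "predicted_label": "dog"},
    {"eval_split": "unknown", "expected_label": "unknown", "predicted_label": "unknown"},
]
print(summarize_records(records))
# -> {'known_top1_accuracy': 0.5, 'unknown_rejection_rate': 1.0}
```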

Using Custom Evaluation Data

You do not have to use the ImageNet helper.

If you already have your own test stream, write a manifest JSON with entries like:

[
  {
    "image_path": "/abs/path/to/image1.jpg",
    "eval_split": "known",
    "expected_label": "dog",
    "source_label": "dog",
    "round_index": 0
  },
  {
    "image_path": "/abs/path/to/image2.jpg",
    "eval_split": "unknown",
    "expected_label": "unknown",
    "source_label": "zebra",
    "round_index": 0
  }
]
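A manifest in this shape can also be generated and validated programmatically, using exactly the fields shown above:

```python
import json

# Build a manifest entry per image, matching the fields in the example above.
entries = [
    {
        "image_path": "/abs/path/to/image1.jpg",
        "eval_split": "known",
        "expected_label": "dog",
        "source_label": "dog",
        "round_index": 0,
    },
    {
        "image_path": "/abs/path/to/image2.jpg",
        "eval_split": "unknown",
        "expected_label": "unknown",
        "source_label": "zebra",
        "round_index": 0,
    },
]

# Every entry must carry all five fields.
REQUIRED = {"image_path", "eval_split", "expected_label", "source_label", "round_index"}
assert all(REQUIRED <= set(e) for e in entries)

with open("manifest.json", "w") as f:
    json.dump(entries, f, indent=2)
```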

Then run:

bash run_evaluation.sh --manifest /path/to/manifest.json --memory-path memory.npz --calibration-path calibration.npz

Programmatic Usage

If you want to integrate the framework into another project, the main Python entrypoint is dfod_runtime.py.

Example:

from dfod_runtime import run_dfod_inference

result = run_dfod_inference(
    image_path="path/to/query.jpg",
    memory_path="memory.npz",
    device="cpu",
)

print(result["detections"])
print(result["accepted_updates"])

The perception-only and reasoning-only pipelines are exposed by Baseline_Perception/Main_perception.py and Baseline_Reasoning/Main_reasoning.py, respectively.

Generated Artifacts

Typical generated files:

  • memory.npz: support-derived memory bundle
  • memory_updated.npz: updated memory after accepted pseudo-labels
  • adapter.pt: learned adapter weights
  • calibration.npz: rejection/update calibration artifact
  • dfod_output.json: single-image inference output
  • boxes.json, embeddings.npy: smoke artifacts
  • test_images/, calibration_images/: generated evaluation sets
  • evaluation_runs/: evaluation reports and final dynamic memory

The repo’s .gitignore is set up to ignore most generated datasets and artifacts.

Reproducibility Notes

  • Memory serialization is versioned and backward-compatible with legacy v1 memory files.
  • Support building sorts class names and image paths deterministically.
  • Multimodal clustering is deterministic under the configured seed.
  • Evaluation streams are manifest-driven rather than relying on directory iteration order.
  • Rejection and update thresholds can be stored in calibration.npz and reused across runs.

Running Tests

./.venv/bin/python -m unittest discover -s tests -v

The test suite covers:

  • memory serialization and v1-to-v2 upgrade
  • multimodal memory behavior
  • similarity and rejection logic
  • prototype update rules
  • calibration utilities
  • evaluation stream helpers

Troubleshooting

ultralytics says it is missing even after install

You are probably using the wrong Python interpreter. Use the wrappers or explicitly run:

./.venv/bin/python run_baseline.py

run_baseline.py fails because support/... does not exist

The repo does not include support images. Create a repo-root support/ directory first, then run:

bash rebuild_memory.sh

prepare_test_images.py cannot find your classes

Your support folder labels must be present in KNOWN_SUPPORT_IMAGENET_MAP if you want to use the ImageNet helper unchanged. Otherwise update evaluation_utils.py.

No detections are returned

Check:

  • the image actually contains a visible object
  • the object is not too small
  • the detector confidence threshold is not too strict
  • your environment can successfully import torch, torchvision, and ultralytics

Dynamic updates look unsafe

Rebuild memory, regenerate a clean calibration split, recalibrate thresholds, and rerun evaluation. Dynamic DFOD is sensitive to rejection/update calibration quality.

Minimal End-To-End Example

cd /path/to/dfod

python3.12 -m venv .venv
./.venv/bin/python -m pip install -r requirements.txt
./.venv/bin/python smoke_setup.py

# create support/<class_name>/*.jpg first
bash rebuild_memory.sh

# single-image inference
bash run_inference.sh /absolute/path/to/query.jpg

# optional calibration + evaluation
bash prepare_test_images.sh --imagenet-root /path/to/imagenet/val --output-dir calibration_images --known-offset 10 --unknown-offset 4
bash run_calibration.sh --manifest calibration_images/manifest.json --memory-path memory.npz --output-path calibration.npz
bash prepare_test_images.sh --imagenet-root /path/to/imagenet/val --output-dir test_images --known-offset 20 --unknown-offset 8
bash run_evaluation.sh --manifest test_images/manifest.json --memory-path memory.npz --calibration-path calibration.npz --output-dir evaluation_runs/main

Summary

This repo is best thought of as a research framework, not just a demo script. To use it correctly on a new machine:

  1. Create .venv and install requirements.
  2. Add your own support/ images.
  3. Build memory.npz.
  4. Run single-image inference.
  5. If you want meaningful DFOD benchmarking, generate a held-out calibration split, calibrate thresholds, then run static-vs-dynamic evaluation.

That is the reproducible path from a code-only clone to a fully working DFOD experiment.

About

Dynamic Few-Shot Object Detection - ML Project, led by Abhinn Kaushik
