@benjamintli benjamintli commented Jan 25, 2026

Summary

This PR adds support for object detection models to Optimum ExecuTorch. It introduces ExecuTorchModelForObjectDetection, ObjectDetectionExportableModule, an object-detection task, and a test for DETR-type models.

Notes

  • ObjectDetectionExportableModule traces an object detection model and stores num_channels, image_size, get_label_ids, and get_label_names.
    • num_channels is not defined consistently across the configs of the existing object detection models, so ObjectDetectionExportableModule tries a few config options and falls back to 3 (RGB) if none of them resolves, which seems like a sensible default. The priority is: num_channels passed via kwargs -> num_channels from the config -> 3.
    • image_size is also typically absent from configs, since some models accept dynamic sizes. There is no sensible default, because users pick a size at inference/deployment time. The image size is therefore passed in via the CLI and is only used for object-detection models.
    • get_label_ids and get_label_names are two flat lists used to reconstruct id2label. They are stored separately because ExecuTorch doesn't seem to support dicts as values in constant_methods (they get flattened).
  • ExecuTorchModelForObjectDetection has three attributes: num_channels, image_size, and id2label, a dict mapping class ids to labels.
  • Added timm as a dev dependency for DETR.
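
To make the notes above concrete, here is a minimal sketch of the num_channels fallback order and of rebuilding id2label from the two flat lists. It is not the code in this PR: the helper names resolve_num_channels and build_id2label and the CANDIDATE_CONFIG_KEYS tuple are hypothetical, and the actual set of config options the module probes is not listed in the description.

```python
from types import SimpleNamespace

DEFAULT_NUM_CHANNELS = 3  # RGB fallback described in the notes

# Hypothetical: the PR only says "a few different config options" are tried,
# so this sketch probes a single attribute name.
CANDIDATE_CONFIG_KEYS = ("num_channels",)


def resolve_num_channels(config, **kwargs):
    """Resolve num_channels with priority: kwargs -> config -> 3."""
    if kwargs.get("num_channels") is not None:
        return int(kwargs["num_channels"])
    for key in CANDIDATE_CONFIG_KEYS:
        value = getattr(config, key, None)
        if value is not None:
            return int(value)
    return DEFAULT_NUM_CHANNELS


def build_id2label(label_ids, label_names):
    """Zip the two flat lists back into an id -> label dict."""
    if len(label_ids) != len(label_names):
        raise ValueError("label_ids and label_names must have the same length")
    return {int(i): str(name) for i, name in zip(label_ids, label_names)}


if __name__ == "__main__":
    # Stand-in config object; a real model config would be used instead.
    config = SimpleNamespace(num_channels=None)
    print(resolve_num_channels(config))                  # falls back to the default
    print(resolve_num_channels(config, num_channels=1))  # kwarg wins
    print(build_id2label([17, 75], ["cat", "remote"]))
```

Keeping ids and names as parallel lists means each stored value is a flat list, which sidesteps the dict-flattening behavior mentioned above.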

Testing

Loaded ExecuTorchModelForObjectDetection
=== Running inference ===
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[cpuinfo_utils.cpp:71] Reading file /sys/devices/soc0/image_version
[cpuinfo_utils.cpp:87] Failed to open midr file /sys/devices/soc0/image_version
logits shape: torch.Size([1, 100, 92])
pred_boxes shape: torch.Size([1, 100, 4])

=== Detections (confidence > 0.7) ===
  remote: 1.00 @ box [38.75445556640625, 70.63130187988281, 177.2999267578125, 117.51690673828125]
  remote: 0.99 @ box [334.3410949707031, 73.98548126220703, 369.325439453125, 188.1864776611328]
  couch: 1.00 @ box [-0.009555816650390625, 1.426863670349121, 639.6910400390625, 474.4981994628906]
  cat: 1.00 @ box [11.712303161621094, 51.7900390625, 314.5272216796875, 469.29779052734375]
  cat: 1.00 @ box [345.2006530761719, 23.41107940673828, 640.120849609375, 370.40679931640625]

=== Drawing bounding boxes ===
Saved result to detections_output.jpg

Done!
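
The test script that produced this output is not included here, so the following is only a sketch of how detections like the ones above can be derived from logits and pred_boxes. It assumes DETR-style outputs: the last class index is "no object" (consistent with 92 logits per query) and boxes are normalized (center_x, center_y, width, height). The function name filter_detections is hypothetical, and plain Python lists stand in for tensors to keep the example dependency-free.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def filter_detections(logits, pred_boxes, id2label, image_size, threshold=0.7):
    """Keep queries whose best real-class probability exceeds the threshold.

    logits: per-query score lists; the last entry is assumed to be "no object".
    pred_boxes: per-query (cx, cy, w, h), normalized to [0, 1].
    image_size: (height, width) in pixels.
    Returns (label, score, [x_min, y_min, x_max, y_max]) tuples.
    """
    height, width = image_size
    detections = []
    for scores, (cx, cy, w, h) in zip(logits, pred_boxes):
        probs = softmax(scores)[:-1]  # drop the "no object" class
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] <= threshold:
            continue
        # Convert normalized center format to pixel corner format.
        box = [
            (cx - w / 2) * width,
            (cy - h / 2) * height,
            (cx + w / 2) * width,
            (cy + h / 2) * height,
        ]
        detections.append((id2label.get(best, str(best)), probs[best], box))
    return detections


if __name__ == "__main__":
    # Toy inputs: two queries, two real classes plus "no object".
    toy_logits = [[0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
    toy_boxes = [(0.5, 0.5, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
    for label, score, box in filter_detections(
        toy_logits, toy_boxes, {0: "cat", 1: "remote"}, image_size=(100, 200)
    ):
        print(f"{label}: {score:.2f} @ box {box}")
```

With real model outputs, the batch dimension would be indexed first (for example, the first image of a [1, 100, 92] logits tensor) before passing the per-query rows in.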

This is my first PR in this repo, so I'm open to any feedback!

@benjamintli benjamintli marked this pull request as ready for review January 26, 2026 04:20