Native bounding box detection with dynamic text prompting #21

Open
Priyadarshini75 wants to merge 2 commits into dataplayer12:main from Priyadarshini75:feature/native-bbox-detection

@Priyadarshini75

Summary

Adds native bounding box detection directly from SAM3 model outputs and enables
dynamic text prompting at runtime, eliminating the need for hardcoded token IDs.

Changes

Native Bounding Box Detection

  • Extract pred_boxes and pred_logits from the TensorRT engine outputs
  • Implement CUDA post-processing kernel for bounding box coordinate decoding
  • Render green bounding boxes with text labels via OpenCV in the demo app
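
As a rough reference for what the post-processing step computes, here is a minimal CPU sketch in Python. It assumes the engine emits DETR-style boxes, that is, normalized (cx, cy, w, h) in `pred_boxes` and one raw logit per query in `pred_logits`; the PR does not state the layout, so treat this as an assumption. The function name `decode_boxes` and the `score_thresh` parameter are hypothetical and are not taken from the CUDA kernel in this PR.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_boxes(pred_boxes, pred_logits, img_w, img_h, score_thresh=0.5):
    """Map normalized (cx, cy, w, h) boxes to pixel (x1, y1, x2, y2).

    ASSUMPTION: boxes are normalized center/size and logits are per query.
    Returns a list of (x1, y1, x2, y2, score) for queries above the threshold.
    """
    detections = []
    for (cx, cy, w, h), logit in zip(pred_boxes, pred_logits):
        score = sigmoid(logit)
        if score < score_thresh:
            continue
        # Center/size to corners, scaled to the original image size.
        x1 = (cx - w / 2.0) * img_w
        y1 = (cy - h / 2.0) * img_h
        x2 = (cx + w / 2.0) * img_w
        y2 = (cy + h / 2.0) * img_h
        # Clamp to the image so drawing code never receives off-canvas corners.
        x1 = min(max(x1, 0.0), float(img_w))
        y1 = min(max(y1, 0.0), float(img_h))
        x2 = min(max(x2, 0.0), float(img_w))
        y2 = min(max(y2, 0.0), float(img_h))
        detections.append((x1, y1, x2, y2, score))
    return detections


if __name__ == "__main__":
    boxes = [(0.5, 0.5, 0.5, 0.5), (0.1, 0.1, 0.4, 0.4)]
    logits = [2.0, -3.0]
    for det in decode_boxes(boxes, logits, img_w=640, img_h=480):
        print([round(v, 3) for v in det])
```

If the engine already emits corner-format boxes, only the scaling and clamping steps would apply. Scaling by the original image width and height rather than the network input size is what lets the boxes be drawn directly on the source frame.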

Dynamic Text Prompting

  • Add export_tokenizer.py to export HuggingFace tokenizer files for C++ use
  • Add tokenize_prompt.py for standalone prompt tokenization
  • Accept the text prompt as a CLI argument instead of compile-time constants
  • Dynamically tokenize and feed prompts into the engine at runtime
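
Feeding a runtime prompt into a fixed-shape engine input usually means wrapping the token IDs in start and end markers, then padding to the context length. The sketch below illustrates that step only; it assumes the engine expects a fixed-length ID tensor plus an attention mask, which the PR does not spell out. The function name `build_text_inputs`, the context length, and the special-token IDs in the example are hypothetical placeholders; the real values come from the exported tokenizer files.

```python
def build_text_inputs(token_ids, context_len, bos_id, eos_id, pad_id):
    """Wrap token IDs with BOS/EOS and pad to a fixed context length.

    ASSUMPTION: the engine takes fixed-length IDs and a 0/1 attention mask.
    Returns (input_ids, attention_mask), both of length context_len.
    """
    if context_len < 2:
        raise ValueError("context_len must leave room for BOS and EOS")
    # Truncate long prompts so BOS and EOS always fit.
    body = list(token_ids)[: context_len - 2]
    ids = [bos_id] + body + [eos_id]
    mask = [1] * len(ids)
    pad = context_len - len(ids)
    return ids + [pad_id] * pad, mask + [0] * pad


if __name__ == "__main__":
    # Hypothetical IDs for a one-word prompt; real IDs come from the tokenizer.
    input_ids, attention_mask = build_text_inputs(
        [1000], context_len=8, bos_id=1, eos_id=2, pad_id=0
    )
    print(input_ids)
    print(attention_mask)
```

Keeping this step separate from tokenization means the C++ side only needs the vocabulary files and the special-token IDs, not a compile-time list of IDs per prompt.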

Dynamic Image Size Processing

  • Support variable input image resolutions instead of fixed dimensions
  • Improve benchmarking workflow with proper timing isolation
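
One common reading of "timing isolation" is that warm-up runs, image loading, and drawing are kept outside the timed region. The Python sketch below shows that pattern under that assumption; it is not the benchmarking code from this PR, and `benchmark` and `infer` are hypothetical names. For GPU inference, `infer` would also need to wait for the device to finish (for example by synchronizing the stream) before returning, otherwise only the launch cost is measured.

```python
import time


def benchmark(infer, inputs, warmup=3):
    """Time only the inference call, after untimed warm-up runs.

    `infer` is a hypothetical callable that runs one blocking inference.
    Returns per-item latencies in seconds.
    """
    # Warm-up: lets lazy initialization and caches settle; not recorded.
    for item in inputs[:warmup]:
        infer(item)

    timings = []
    for item in inputs:
        start = time.perf_counter()
        infer(item)  # nothing else (I/O, drawing) sits inside the timed region
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload; a real run would call the engine here.
    latencies = benchmark(lambda x: sum(range(x)), [10_000] * 5)
    mean_ms = 1000.0 * sum(latencies) / len(latencies)
    print(f"mean latency: {mean_ms:.3f} ms over {len(latencies)} runs")
```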

Updated Documentation

  • Add usage instructions for bounding box visualization and benchmarking
  • Update the README with tokenizer setup steps

Usage

# Export tokenizer (one-time setup)
python3 python/export_tokenizer.py

# Run with any text prompt
./sam3_pcs_app /workspace/test_images /workspace/sam3_fp16.plan "helmet"

# Benchmark mode (no visualization overhead)
./sam3_pcs_app /workspace/test_images /workspace/sam3_fp16.plan "helmet" 1
