🎯 A production-quality object detection & tracking pipeline for workspace monitoring – demonstrating real-world ML engineering with self-supervised evaluation metrics.
Smart workspace monitoring enables:
- Productivity analytics – Track object interactions over time
- Ergonomics research – Monitor desk setup and posture indicators
- Automated inventory – Detect and track items on workspaces
This project demonstrates end-to-end ML pipeline engineering: from raw video to tracked objects with quality metrics – all without requiring ground truth annotations.
| Feature | Description |
|---|---|
| Object Detection | Pre-trained Mask R-CNN with configurable confidence thresholds |
| Multi-Object Tracking | SORT algorithm with 8D Kalman filtering (sketched below) |
| Self-Supervised Evaluation | Quality metrics without ground truth |
| Modular Architecture | Clean separation of detection, tracking, I/O, and evaluation |
| Multiple Outputs | COCO JSON annotations, visualization frames, evaluation reports |
| CLI + Python API | Flexible usage for scripts or integration |
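The 8D Kalman filtering referenced in the table is a constant-velocity model over the bounding box. Below is a minimal, illustrative sketch of the prediction step; the assumed state layout (box centre, scale, aspect ratio, and their velocities) and the toy noise values are not taken from the repository, whose actual implementation lives in `src/objectSpace/tracking/kalman.py`.

```python
import numpy as np

# Assumed 8D state: [x, y, s, r, vx, vy, vs, vr]
# (box centre, scale, aspect ratio, and their velocities) -- illustrative only.
DIM = 8
F = np.eye(DIM)
F[:4, 4:] = np.eye(4)  # constant-velocity transition with dt = 1 frame

def predict(x, P, Q=None):
    """One Kalman prediction step: propagate the state and its covariance."""
    Q = np.eye(DIM) * 1e-2 if Q is None else Q
    return F @ x, F @ P @ F.T + Q

# A box at (320, 240) with area 5000 px^2 and aspect ratio 0.75, currently at rest.
x0 = np.array([320.0, 240.0, 5000.0, 0.75, 0.0, 0.0, 0.0, 0.0])
x1, P1 = predict(x0, np.eye(DIM))
```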
The built-in evaluation framework measures tracking quality without ground truth:
| Video | Overall | Continuity | Stability | Tracks | ID Switches |
|---|---|---|---|---|---|
| video1 (complex) | 36.8 | 66.5 | 25.4 | 23 | 6 |
| video2 (medium) | 44.4 | 67.8 | 43.3 | 11 | 3 |
| video4 (simple) | 78.4 | 95.9 | 100.0 | 8 | 0 |
| Average | 53.2 | 76.7 | 56.3 | - | - |
- ✅ 100% stability on simple scenes (≤8 concurrent tracks)
- ⚠️ Stability degrades with scene complexity (IoU-based matching limitation)
- 🔧 Identified bottleneck: ID association in crowded scenes – recommends Deep SORT
The repository is organized as follows:

```
objectSpace/
├── src/objectSpace/
│   ├── detection/           # Mask R-CNN object detection
│   │   ├── base.py          # Abstract detector interface
│   │   └── mask_rcnn.py     # Mask R-CNN implementation
│   ├── tracking/            # SORT with Kalman filtering
│   │   ├── kalman.py        # Kalman filter implementation
│   │   ├── association.py   # IoU & Hungarian matching
│   │   └── sort_tracker.py  # SORT algorithm
│   ├── evaluation/          # Self-supervised quality metrics
│   │   ├── metrics.py       # Metric dataclasses
│   │   ├── analyzer.py      # TrackingAnalyzer
│   │   ├── reporter.py      # Report generation
│   │   └── integration.py   # Pipeline integration
│   ├── io/                  # Video I/O and COCO export
│   │   ├── video.py         # Video reading
│   │   └── export.py        # COCO JSON export
│   ├── pipeline.py          # Main orchestration
│   └── config.py            # Typed configuration
├── tests/                   # Unit & integration tests
│   ├── evaluation/          # Evaluation module tests
│   ├── test_detection.py
│   └── test_tracking.py
├── examples/                # Demo notebooks
│   └── demo.ipynb           # Interactive demo
├── configs/                 # YAML configurations
│   ├── default.yaml
│   └── tuned.yaml
└── assets/                  # Demo media
    └── demo.gif
```
Install from source:

```bash
git clone https://github.com/SAMithila/objectSpace.git
cd objectSpace
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```

Quick start with the Python API:

```python
from objectSpace import DetectionTrackingPipeline

pipeline = DetectionTrackingPipeline()
results = pipeline.process_video("video.mp4", output_dir="output/")

# Get tracking results + quality metrics
results, evaluation = pipeline.process_video_with_evaluation("video.mp4")
print(f"Overall Score: {evaluation.overall_score:.1f}/100")
print(f"ID Switches: {evaluation.id_switches.total_switches}")# Process single video
python process_one_video.py task3.1_video1
# Evaluate existing results
python run_evaluation.py
# Compare all videos
python compare_videos.py
```

The evaluation module computes tracking quality without ground truth annotations:
| Metric | What It Measures |
|---|---|
| Continuity Score | Track completeness (gaps, fragmentation) |
| Stability Score | ID consistency (fewer switches = better) |
| Speed Score | Processing FPS vs target |
| Overall Score | Weighted combination of the above (sketched below) |
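As a rough illustration of that weighted combination, the sketch below uses hypothetical weights of 0.4 / 0.4 / 0.2 for continuity, stability, and speed; they happen to reproduce the overall scores in the results table above, but `evaluation/analyzer.py` remains the authoritative source.

```python
# Hypothetical weights -- check src/objectSpace/evaluation/analyzer.py for the real ones.
WEIGHTS = {"continuity": 0.4, "stability": 0.4, "speed": 0.2}

def overall_score(continuity: float, stability: float, speed: float) -> float:
    """Combine per-metric scores (each on a 0-100 scale) into a single number."""
    parts = {"continuity": continuity, "stability": stability, "speed": speed}
    return sum(WEIGHTS[name] * value for name, value in parts.items())

print(f"{overall_score(66.5, 25.4, 0.0):.1f}")  # 36.8, matching video1 above
```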
```python
from objectSpace.pipeline import evaluate_annotations

# Evaluate existing tracking results
result = evaluate_annotations("output/video_annotations.json")
print(f"Fragmented tracks: {result.fragmentation.fragmented_tracks}")
print(f"ID switches: {result.id_switches.total_switches}")
print(f"Avg coverage: {result.fragmentation.avg_coverage_ratio:.1%}")python compare_videos.pyOutput:
EVALUATION COMPARISON
================================================================================
Video Overall Cont. Stab. Speed Tracks
--------------------------------------------------------------------------------
task3.1_video1 36.8 66.5 25.4 0.0 23
task3.1_video2 44.4 67.8 43.3 0.0 11
task3.1_video4 78.4 95.9 100.0 0.0 8
--------------------------------------------------------------------------------
AVERAGE 53.2 76.7 56.3 0.0 42
Default settings in configs/default.yaml:
| Parameter | Default | Description |
|---|---|---|
| `detector.device` | `auto` | CPU/CUDA selection |
| `detector.default_confidence` | `0.3` | Detection threshold |
| `tracker.max_age` | `8` | Frames to keep lost tracks |
| `tracker.iou_threshold` | `0.3` | Minimum IoU for matching (see the sketch below) |
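To illustrate what `tracker.iou_threshold` gates, here is a minimal sketch of IoU-based Hungarian matching between existing tracks and new detections. The function names are illustrative rather than the actual API of `tracking/association.py`, and the sketch assumes `numpy` and `scipy` are available.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two [x, y, w, h] boxes (COCO-style, as in the exported annotations)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match(tracks, detections, iou_threshold=0.3):
    """Hungarian assignment on IoU; pairs below the threshold stay unmatched."""
    if not tracks or not detections:
        return []
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_threshold]
```

Lowering the threshold (as in `configs/tuned.yaml` below) lets boxes that moved further between frames stay matched, at the cost of occasionally accepting looser matches.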
Based on evaluation results, configs/tuned.yaml improves performance:
```yaml
tracker:
  max_age: 15         # Handles longer occlusions
  iou_threshold: 0.2  # Fewer false ID switches
```
```bash
# Run tests
pytest tests/ -v

# Run specific test module
pytest tests/evaluation/ -v

# Run with coverage
pytest tests/ --cov=objectSpace --cov-report=term-missing
```

Tracking results are exported as COCO-style JSON with a `track_id` on each annotation:

```json
{
"annotations": [
{
"id": 1,
"image_id": 0,
"category_id": 1,
"bbox": [100, 100, 50, 80],
"track_id": 0
}
]
}*_evaluation.jsonβ Machine-readable metrics*_evaluation.mdβ Human-readable reportEVALUATION_SUMMARY.mdβ Cross-video comparison
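As a small usage example (not part of the package), the exported annotations can be grouped by `track_id` to inspect each object's trajectory; the file path below matches the earlier example and may differ for your video.

```python
import json
from collections import defaultdict

# Path is illustrative -- use whichever annotations file the pipeline wrote.
with open("output/video_annotations.json") as f:
    coco = json.load(f)

# Group COCO annotations by track to recover per-object trajectories.
tracks = defaultdict(list)
for ann in coco["annotations"]:
    tracks[ann["track_id"]].append((ann["image_id"], ann["bbox"]))

for track_id, boxes in sorted(tracks.items()):
    print(f"track {track_id}: {len(boxes)} frames")
```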
This project demonstrates:
- Modular Design – Separate concerns for detection, tracking, evaluation
- Type Safety – Full type hints with dataclasses
- Configuration Management – YAML configs with typed validation
- Self-Supervised ML – Quality metrics without labeled data
- Production Patterns – Logging, error handling, CLI interface
- CI/CD – GitHub Actions for automated testing
To swap in a different detector or add custom evaluation metrics, subclass the provided base classes:

```python
from objectSpace.detection import BaseDetector

class YOLODetector(BaseDetector):
    def detect(self, frame):
        # Your implementation
        pass
```

```python
from objectSpace.evaluation import TrackingAnalyzer

class CustomAnalyzer(TrackingAnalyzer):
    def compute_custom_metric(self, annotations):
        # Your metric logic
        pass
```

MIT License – see LICENSE for details.
- SORT – Bewley et al.
- Mask R-CNN – He et al.
- torchvision – Pre-trained models
