Description
Summary
Currently, snapshots capture only a single message per topic at the moment of fault confirmation (JSON in SQLite). This is limiting for debugging because we don't have visibility into what happened before the fault occurred.
Add optional rosbag2 integration to capture a time-window of topic data (e.g., 5 seconds before fault + 1 second after), enabling "black box" style recording for post-mortem analysis.
Proposed solution (optional)
Tiered configuration approach
Level 1 (current, unchanged): JSON snapshots - single message per topic, stored in SQLite.
Level 2 (new, opt-in): Simple rosbag - enable with rosbag.enabled: true, uses sensible defaults.
Level 3 (new, advanced): Custom rosbag config - full control over duration, topics, format, storage.
Configuration schema
snapshots:
  enabled: true
  # === Existing JSON config (unchanged) ===
  default_topics: ["/odom", "/cmd_vel"]
  config_file: "snapshots.yaml"
  # === New rosbag config ===
  rosbag:
    enabled: false            # opt-in
    # Time window
    duration_sec: 5.0         # seconds before fault (default 5s)
    duration_after_sec: 1.0   # seconds after CONFIRMED
    # Topics: "config" (reuse JSON config) | "all" | [explicit list]
    topics: "config"
    include_topics: []        # add to resolved list
    exclude_topics: []        # remove from list
    # Performance tuning
    lazy_start: false         # true = start buffer only on PREFAILED
    # Storage
    format: "sqlite3"         # "sqlite3" | "mcap"
    storage_path: ""          # empty = temp dir
    max_bag_size_mb: 50
    max_total_storage_mb: 500
    auto_cleanup: true        # delete bag when fault CLEARED
Architecture
NORMAL     → Ring buffer running (lazy_start: false)
    ↓
PREFAILED  → Continue buffering
    ↓
CONFIRMED  → 1. JSON snapshot (existing)
             2. Flush ring buffer to .mcap/.db3
             3. Record duration_after_sec more
             4. Close bag, store path in DB
    ↓
CLEARED    → auto_cleanup: true → delete bag file
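The buffering and flush steps above can be sketched in a few lines of Python. This is only an illustration of the time-window logic, not the rosbag2 API: the WindowRecorder class, its method names, and the in-memory list standing in for the bag file are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WindowRecorder:
    """Hypothetical time-window recorder; a plain list stands in for the bag."""
    duration_sec: float = 5.0         # window kept before the fault
    duration_after_sec: float = 1.0   # window recorded after CONFIRMED
    buffer: deque = field(default_factory=deque)
    bag: list = field(default_factory=list)
    confirmed_at: Optional[float] = None
    closed: bool = False

    def on_message(self, stamp: float, topic: str, payload: bytes) -> None:
        if self.closed:
            return
        if self.confirmed_at is None:
            # NORMAL / PREFAILED: buffer, then evict anything older than the window.
            self.buffer.append((stamp, topic, payload))
            while self.buffer and self.buffer[0][0] < stamp - self.duration_sec:
                self.buffer.popleft()
        elif stamp <= self.confirmed_at + self.duration_after_sec:
            # CONFIRMED: keep recording for duration_after_sec.
            self.bag.append((stamp, topic, payload))
        else:
            # Post-fault window elapsed: close the bag.
            self.closed = True

    def on_confirmed(self, stamp: float) -> None:
        # Flush only the messages inside the pre-fault window.
        cutoff = stamp - self.duration_sec
        self.bag.extend(m for m in self.buffer if m[0] >= cutoff)
        self.buffer.clear()
        self.confirmed_at = stamp
```

With the defaults, a fault confirmed at t = 9.5 s yields a bag covering roughly 4.5 s to 10.5 s, which matches the 6-second total implied by the proposed defaults.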
REST API extension
GET /api/v1/faults/{code}/snapshots
Response includes:
{
  "topics": { ... },
  "rosbag": {
    "available": true,
    "duration_sec": 6.0,
    "size_bytes": 2456789,
    "download_url": "/api/v1/faults/{code}/snapshots/bag"
  }
}
GET /api/v1/faults/{code}/snapshots/bag
→ Returns bag file download
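As a rough sketch of how the server might assemble the "rosbag" block of that response, assuming the bag path is looked up from the DB elsewhere: the rosbag_info helper is hypothetical, and returning only "available": false when no bag exists is an assumption the proposal does not specify.

```python
import os
from typing import Optional


def rosbag_info(code: str, bag_path: Optional[str], duration_sec: float) -> dict:
    """Build the "rosbag" part of the snapshots response (hypothetical helper)."""
    # Assumption: a missing or deleted bag is reported as unavailable.
    if not bag_path or not os.path.isfile(bag_path):
        return {"available": False}
    return {
        "available": True,
        "duration_sec": duration_sec,
        "size_bytes": os.path.getsize(bag_path),
        "download_url": f"/api/v1/faults/{code}/snapshots/bag",
    }
```

Checking the file on each request keeps the response honest after auto_cleanup has removed the bag.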
Additional context (optional)
Key design decisions
- lazy_start defaults to false: basic configs move from PREFAILED to CONFIRMED instantly, so a lazily started buffer would miss the pre-fault data.
- duration_sec defaults to 5.0 as a balance between usefulness and RAM usage.
- format defaults to sqlite3 because it is easier to inspect during development; MCAP is available as an optimization.
- topics defaults to "config", which reuses the existing JSON topic config and needs zero extra setup.
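To make the topic options concrete, here is one way the topics, include_topics, and exclude_topics settings could resolve into a final list. The resolve_topics function is hypothetical, and the precedence (exclusions applied last, so they win over inclusions) is an assumption the proposal does not spell out.

```python
def resolve_topics(topics, config_topics, available_topics,
                   include_topics=(), exclude_topics=()):
    """Resolve the rosbag topic list (hypothetical helper).

    topics: "config", "all", or an explicit list of topic names.
    config_topics: topics from the existing JSON snapshot config.
    available_topics: topics currently discoverable on the system.
    """
    if topics == "config":
        base = list(config_topics)
    elif topics == "all":
        base = list(available_topics)
    else:
        base = list(topics)

    excluded = set(exclude_topics)
    resolved = []
    # Assumption: include_topics are appended first, then exclusions are applied.
    for name in base + list(include_topics):
        if name not in excluded and name not in resolved:
            resolved.append(name)
    return resolved
```

For example, `resolve_topics("all", [], discovered, exclude_topics=["/camera/image_raw"])` would record everything except the named camera topic, where `discovered` and the camera topic name are placeholders.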
Risk mitigations
- RAM explosion with "all" topics: document a warning and recommend exclude_topics for cameras.
- Storage explosion: capped by max_bag_size_mb and max_total_storage_mb, with auto_cleanup deleting bags once their faults are CLEARED.
- Always-on overhead: offer lazy_start: true for resource-constrained systems.
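The max_total_storage_mb cap could be enforced with a simple pruning pass like the one below. This is a sketch under two assumptions the proposal leaves open: the oldest bags are evicted first, and 1 MB is counted as 1024 * 1024 bytes. The bags_to_delete function and its tuple layout are hypothetical.

```python
def bags_to_delete(bags, max_total_storage_mb):
    """Return paths to delete, oldest first, until the total fits the cap.

    bags: iterable of (path, size_bytes, created_at) tuples.
    """
    limit = max_total_storage_mb * 1024 * 1024  # assumption: binary megabytes
    bags = list(bags)
    total = sum(size for _, size, _ in bags)
    doomed = []
    # Assumption: evict in order of creation time, oldest first.
    for path, size, _ in sorted(bags, key=lambda b: b[2]):
        if total <= limit:
            break
        doomed.append(path)
        total -= size
    return doomed
```

Running this after each bag is closed would keep disk usage bounded even when many faults stay uncleared.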
Dependencies
- rosbag2_cpp for ring buffer and writing
- rosbag2_storage_mcap (optional) for MCAP format
Related
- Current snapshot implementation: Add Snapshot Capture #81
- rosbag2: https://github.com/ros2/rosbag2
- MCAP format: https://mcap.dev/