notebooks: MCAP robotics DataFrame demo #37

Open: everettVT wants to merge 2 commits into main from everettVT/mcap-robotics-demo

@everettVT (Contributor)

Summary

A walkthrough notebook for daft.read_mcap against a real robotics recording — a 1.19 GB MCAP from the DapengFeng/MCAP dataset on HuggingFace (FAST-LIVO/hku1: Livox LiDAR + IMU + stereo compressed cameras, 29,702 messages over 127.7s).

Covers the things a robotics user would actually ask of an MCAP DataFrame:

  • Schema inference (topic / log_time / publish_time / sequence / data)
  • Per-topic groupby for "what's in this recording"
  • Topic filter pushdown: `topics=["/livox/imu"]` avoids materializing camera payloads
  • Time-window pushdown via `start_time` / `end_time`
  • `topic_start_time_resolver`: the new per-file, per-topic keyframe alignment from Daft #5886 (v0.7.2)
  • A 50ms-bucket join between LiDAR sweeps and IMU samples, the time-aligned query that's painful in raw MCAP
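The pushdowns above are about reading less per message. As a rough mental model only (plain Python over toy rows, not how Daft implements it), topic and time-window filters are checked against the lightweight columns before `data` is ever read, and the "what's in this recording" groupby needs nothing but `topic`. The row layout mirrors the inferred schema; the toy rows, the made-up topic `/hypothetical/camera`, nanosecond `log_time` values, and the half-open `[start_time, end_time)` window are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Sequence


@dataclass(frozen=True)
class McapRow:
    """Toy stand-in for one row of the inferred MCAP schema."""
    topic: str
    log_time: int                   # nanoseconds, as in the MCAP spec
    publish_time: int
    sequence: int
    load_data: Callable[[], bytes]  # deferred read standing in for `data`


def scan(
    rows: Iterable[McapRow],
    topics: Optional[Sequence[str]] = None,
    start_time: Optional[int] = None,
    end_time: Optional[int] = None,
) -> Iterator[tuple[str, int, bytes]]:
    """Yield (topic, log_time, data), filtering before any payload is read.

    Assumption: the time window is half-open, [start_time, end_time).
    """
    wanted = set(topics) if topics is not None else None
    for row in rows:
        if wanted is not None and row.topic not in wanted:
            continue  # filtered rows never pay for their payload
        if start_time is not None and row.log_time < start_time:
            continue
        if end_time is not None and row.log_time >= end_time:
            continue
        yield row.topic, row.log_time, row.load_data()


def topic_counts(rows: Iterable[McapRow]) -> Counter:
    """Per-topic message counts; needs only the `topic` column."""
    return Counter(row.topic for row in rows)


# Toy data; "/hypothetical/camera" is a made-up topic name.
loads: list[str] = []


def payload(tag: str) -> Callable[[], bytes]:
    def _load() -> bytes:
        loads.append(tag)  # record every payload that actually gets read
        return tag.encode()
    return _load


rows = [
    McapRow("/livox/imu", 0, 0, 0, payload("imu-0")),
    McapRow("/hypothetical/camera", 1_000, 1_000, 0, payload("cam-0")),
    McapRow("/livox/imu", 5_000_000, 5_000_000, 1, payload("imu-1")),
    McapRow("/livox/imu", 9_000_000, 9_000_000, 2, payload("imu-2")),
]

print(topic_counts(rows))
out = list(scan(rows, topics=["/livox/imu"], start_time=1_000_000, end_time=9_000_000))
print(out)    # [('/livox/imu', 5000000, b'imu-1')]
print(loads)  # ['imu-1']
```

Only one payload is read: the camera row is dropped by the topic filter and the other IMU rows by the time window, which is the effect the pushdown bullets describe.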

Known limitation

Direct HTTP reads (`daft.read_mcap("https://...")`) currently fail with `TypeError: Expected a FileSystemHandler instance, got HTTPFileSystem`: `daft/filesystem.py` wraps the fsspec filesystem in `PyFileSystem` directly instead of going through `FSSpecHandler`. Tracked in Eventual-Inc/Daft#6983. The notebook uses `huggingface_hub.hf_hub_download` as a workaround and will be updated once #6983 lands.

Placement

Dropped in `notebooks/` alongside the existing format-walkthrough notebooks (`getting_started_with_common_crawl.ipynb`, `window_functions.ipynb`). Happy to convert to a PEP 723 script under `examples/io/` in a follow-up if that fits the repo direction better.

Test plan

  • `jupyter nbconvert --execute` runs against a fresh `~/.cache/huggingface` (note: 1.2 GB download)
  • Schema output matches the README claims in the markdown cells
  • Topic counts: `/livox/imu` ≈ 25,871, other three ≈ 1,277 each
  • Verify `daft[mcap]>=0.7.13` extras spec resolves (the install cell pins it)
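The topic-count expectations can be cross-checked against the summary before running anything: `/livox/imu` plus three other topics at ≈1,277 messages each should add up to the 29,702 total, and dividing by the 127.7s duration gives the implied publish rates. The arithmetic uses only numbers quoted in this PR; reading the ≈10 Hz topics as the LiDAR and the two stereo cameras is an inference from the summary, not something the notebook states.

```python
# Figures quoted in this PR; the three non-IMU topics are left unnamed.
TOTAL_MESSAGES = 29_702
DURATION_S = 127.7
IMU_COUNT = 25_871
OTHER_COUNTS = [1_277, 1_277, 1_277]  # "other three ≈ 1,277 each"

# 25,871 + 3 * 1,277 = 29,702, so the per-topic expectations account for every message.
assert IMU_COUNT + sum(OTHER_COUNTS) == TOTAL_MESSAGES

imu_hz = IMU_COUNT / DURATION_S
other_hz = OTHER_COUNTS[0] / DURATION_S
imu_per_bucket = imu_hz * 0.050  # expected IMU samples per 50ms bucket

print(f"IMU ≈ {imu_hz:.1f} Hz, others ≈ {other_hz:.1f} Hz, "
      f"≈ {imu_per_bucket:.1f} IMU samples per 50ms bucket")
# IMU ≈ 202.6 Hz, others ≈ 10.0 Hz, ≈ 10.1 IMU samples per 50ms bucket
```

Roughly ten IMU samples per 50ms bucket and one LiDAR sweep every 100ms is a useful sanity range for the size of the bucketed join's output.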

🤖 Generated with Claude Code

Walks daft.read_mcap against a 1.19 GB MCAP from the DapengFeng/MCAP
dataset on HuggingFace (FAST-LIVO/hku1: Livox LiDAR + IMU + stereo
compressed cameras, 29,702 messages over 127.7s).

Covers schema inference, per-topic groupby, topic+time pushdown,
topic_start_time_resolver (per-file per-topic keyframe alignment, new
in v0.7.2), and a 50ms-bucket join between LiDAR sweeps and IMU samples.

Direct HTTP reads (daft.read_mcap("https://...")) tracked in
Eventual-Inc/Daft#6983 — notebook downloads via huggingface_hub for now.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30af30239a

Review comment on notebooks/mcap_robotics_dataframe.ipynb (outdated)

```python
    .with_column("bucket", daft.col("lidar_ts") // BUCKET_NS)
)

joined = lidar.join(imu, on="bucket").select(
```

P2: Match adjacent buckets for ±50ms window

The bucket equality join only matches IMU/LiDAR rows that fall in the exact same 50ms epoch bucket, which misses valid pairs near bucket boundaries even when they are within ±50ms (for example, timestamps 2ms apart on opposite sides of a boundary). Because this section claims to return IMU samples within ±50ms of each LiDAR frame, the current logic silently drops qualifying matches and can skew downstream alignment/fusion analyses.
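One way to act on this suggestion, sketched in plain Python rather than Daft: keep the 50ms bucket as a coarse join key, but pair each LiDAR timestamp with IMU rows from its own bucket and both neighbours, then apply the exact ±50ms test as a post-join filter. In a DataFrame this corresponds to expanding each LiDAR row into three bucket keys (b-1, b, b+1) before the equi-join and filtering on the timestamp difference afterwards. The function names are hypothetical, and treating ±50ms as inclusive is an assumption.

```python
from collections import defaultdict

MS = 1_000_000          # nanoseconds per millisecond
BUCKET_NS = 50 * MS     # 50ms buckets, as in the notebook
WINDOW_NS = 50 * MS     # ±50ms match window; inclusive bounds are an assumption


def _index_by_bucket(timestamps, bucket_ns):
    """Group timestamps by integer bucket id (floor division, like `// BUCKET_NS`)."""
    buckets = defaultdict(list)
    for ts in timestamps:
        buckets[ts // bucket_ns].append(ts)
    return buckets


def equal_bucket_pairs(lidar_ts, imu_ts, bucket_ns=BUCKET_NS):
    """Mirrors the reviewed logic: pair rows only when bucket ids are equal."""
    imu_by_bucket = _index_by_bucket(imu_ts, bucket_ns)
    return [
        (lt, it)
        for lt in lidar_ts
        for it in imu_by_bucket.get(lt // bucket_ns, [])
    ]


def windowed_pairs(lidar_ts, imu_ts, bucket_ns=BUCKET_NS, window_ns=WINDOW_NS):
    """Pair each LiDAR timestamp with every IMU timestamp within ±window_ns.

    Probing buckets b-1, b, b+1 finds every match only when bucket_ns >= window_ns.
    """
    if bucket_ns < window_ns:
        raise ValueError("bucket width must be at least the match window")
    imu_by_bucket = _index_by_bucket(imu_ts, bucket_ns)
    pairs = []
    for lt in lidar_ts:
        b = lt // bucket_ns
        for nb in (b - 1, b, b + 1):  # coarse candidates: own bucket plus neighbours
            for it in imu_by_bucket.get(nb, []):
                if abs(it - lt) <= window_ns:  # exact ±window filter after the join
                    pairs.append((lt, it))
    return pairs


# The boundary case from the review: 49ms and 51ms straddle the 50ms edge.
lidar = [49 * MS]
imu = [51 * MS, 120 * MS]
print(equal_bucket_pairs(lidar, imu))  # []
print(windowed_pairs(lidar, imu))      # [(49000000, 51000000)]
```

Three neighbouring buckets are enough because the bucket width equals the window: two timestamps at most one bucket width apart have bucket ids that differ by at most one. The cost is roughly three times as many candidate rows before the filter discards the extras.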

