ESP32 8KB CSI embedding — v2 honest re-benchmark + converged encoder
🤗 https://huggingface.co/ruvnet/wifi-densepose-pretrained (v2 files added; v1 kept for compatibility)
What was wrong with v1
- The v1 contrastive encoder logged a flat training loss (0.13517 every epoch): the optimizer wasn't learning.
- Its "100% presence accuracy" headline was measured on a single-class recording: an overnight capture of one sleeping person, 6,062 / 6,063 frames labelled "present", 1 "absent". A constant "yes" predictor scores 99.98% — so the figure is real but says nothing about generalization. Retracted.
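The constant-predictor figure can be checked directly from the label counts quoted above. This is a minimal sketch; the helper name `majority_baseline_accuracy` is illustrative and not part of the repo.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a predictor that always outputs the most common label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Label counts reported for the overnight capture: 6,062 "present", 1 "absent".
labels = ["present"] * 6062 + ["absent"] * 1
baseline = majority_baseline_accuracy(labels)
print(f"{baseline:.2%}")  # 99.98%
```

Any accuracy reported on this recording has to be read against that baseline, which is why the v1 headline carries no information about generalization.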
v2 fix (honest, label-free, time-disjoint)
Retrained the same 8→64→128 encoder with a working InfoNCE objective. Metric: held-out temporal-triplet accuracy = P(d(anchor, temporal-positive) < d(anchor, temporal-negative)), evaluated on the last 20% of the recording by time (no leakage).
| Encoder | Held-out temporal-triplet acc |
|---|---|
| Raw 8-dim features | 66.4% |
| Random-init encoder | 69.6% |
| v2 trained encoder | 82.3% (+15.9 pts over raw, properly converged) |
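For readers who want to see the shape of the metric, here is a rough sketch of a time-disjoint split plus a temporal-triplet accuracy. It is not the code in `train_csi_embed.py`. The sampling rule is an assumption: a positive is taken within `pos_window` frames of the anchor and a negative at least `neg_gap` frames away, and both parameter names and their defaults are hypothetical.

```python
import math
import random


def time_split(frames, holdout_frac=0.2):
    """Split by time order: the last `holdout_frac` of frames is held out."""
    n_hold = int(round(len(frames) * holdout_frac))
    cut = len(frames) - n_hold
    return frames[:cut], frames[cut:]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_accuracy(embeddings, pos_window=5, neg_gap=50, n_triplets=1000, seed=0):
    """Fraction of sampled triplets with d(anchor, pos) < d(anchor, neg).

    Assumed sampling rule (hypothetical): positives lie within `pos_window`
    frames of the anchor, negatives at least `neg_gap` frames away.
    """
    rng = random.Random(seed)
    n = len(embeddings)
    hits = 0
    total = 0
    for _ in range(n_triplets):
        a = rng.randrange(n)
        lo, hi = max(0, a - pos_window), min(n, a + pos_window + 1)
        pos_candidates = [i for i in range(lo, hi) if i != a]
        neg_candidates = [i for i in range(n) if abs(i - a) >= neg_gap]
        if not pos_candidates or not neg_candidates:
            continue  # anchor has no valid positive or negative; skip it
        p = rng.choice(pos_candidates)
        ng = rng.choice(neg_candidates)
        if euclidean(embeddings[a], embeddings[p]) < euclidean(embeddings[a], embeddings[ng]):
            hits += 1
        total += 1
    return hits / total if total else float("nan")


# Toy data: a 1-D "embedding" that drifts linearly with time, so every
# temporal positive is closer to its anchor than every temporal negative.
frames = [[float(t)] for t in range(1000)]
train, held_out = time_split(frames)
acc = triplet_accuracy(held_out)
print(len(train), len(held_out), acc)  # 800 200 1.0
```

Because the held-out frames all come after the training frames, no triplet in the evaluation shares a frame with training, which is the "no leakage" property claimed above.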
- 4-bit packed encoder + fp16 standardizer = 4.56 KB (fits the 8 KB ESP32 SRAM budget).
- Encoder weights SHA-256: 3b37bca66e6050c50ccbc0f6e0501824f258bfdd8675dc0f4541b1e2e96feecd
- Repro: python aether-arena/staging/train_csi_embed.py; data: data/recordings/overnight-1775217646.csi.jsonl (6,063 feature frames).
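As a back-of-envelope check on the size claim, the sketch below packs 4-bit codes two per byte and counts the bytes for the weight matrices. It assumes the 8→64→128 encoder is two dense layers and that the standardizer stores one fp16 mean and one fp16 scale per input feature. The actual file layout is not described here, so the gap between this estimate and the reported 4.56 KB is presumably biases, quantization scales and metadata. `pack_nibbles` and `unpack_nibbles` are illustrative helpers, not functions from the repo.

```python
def pack_nibbles(codes):
    """Pack 4-bit codes (ints in 0..15) two per byte, low nibble first."""
    codes = list(codes)
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each code must fit in 4 bits")
    if len(codes) % 2:
        codes.append(0)  # pad odd-length input with a zero nibble
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed, count):
    """Inverse of pack_nibbles; `count` drops any padding nibble."""
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out[:count]


# Assumption: two dense layers, 8 -> 64 and 64 -> 128, weights only.
n_weights = 8 * 64 + 64 * 128            # 8704 weights
packed_weight_bytes = (n_weights + 1) // 2  # 4352 bytes at 4 bits each
# Assumption: fp16 mean + fp16 scale for each of the 8 input features.
standardizer_bytes = 2 * 8 * 2           # 32 bytes
estimate = packed_weight_bytes + standardizer_bytes
print(estimate)  # 4384
```

The weights-only estimate of 4,384 bytes sits just under the reported 4.56 KB and well within the 8 KB budget.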
Honest scope
One room, one capture, two nodes. The triplet metric measures embedding quality, not downstream presence/vitals accuracy (needs multi-class, multi-room labelled data we don't have yet for this 2.4 GHz feature). For pose SOTA on a public benchmark see the separate 5 GHz model ruvnet/wifi-densepose-mmfi-pose (82.69% torso-PCK@20 on MM-Fi), tracked in #880.