Fix sparse CG cache validation (root cause of test flakiness) #45

Merged
johmathe merged 1 commit into main from johmathe/fix-sparse-cache-validation
May 12, 2026

@johmathe
Collaborator

Summary

  • The sparse CG disk cache (~/.cache/bispectrum/sparse_cg_lmax*_selective.pt) checked only len(entry_meta) on load, not the actual triple content. When the ordering produced by _build_selective_index_map changed between code versions, stale caches loaded silently with misaligned (l1, l2, l) triples, which produced wrong output assignments
  • This was the actual root cause of the test_sparse_matches_full_bispectrum failures, not floating-point non-associativity (the max diff is ~1.6e-8 in float32, well within the 5e-7 tolerance)
  • Reverts the torch.randperm seeding and the tolerance loosening from #44 (Fix flaky sparse parity test and drop cu128 index), both of which treated symptoms rather than the cause
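
To illustrate why a length-only check lets a reordered cache through, here is a minimal sketch. It assumes entry_meta can be viewed as an ordered sequence of (l1, l2, l) tuples; the real structure in so3_on_s2.py may differ, and the function names below are hypothetical, not the ones used in the fix.

```python
from typing import Sequence, Tuple

Triple = Tuple[int, int, int]  # (l1, l2, l); assumed layout of one entry_meta item


def length_only_check(cached: Sequence[Triple], expected: Sequence[Triple]) -> bool:
    """The insufficient check: compares entry counts only."""
    return len(cached) == len(expected)


def content_check(cached: Sequence[Triple], expected: Sequence[Triple]) -> bool:
    """Order-sensitive comparison of every (l1, l2, l) triple."""
    return len(cached) == len(expected) and all(
        tuple(c) == tuple(e) for c, e in zip(cached, expected)
    )


# Same set of triples, but the index map ordering changed between versions.
expected = [(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 1, 2)]
stale = [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 1, 2)]

print(length_only_check(stale, expected))  # True: the stale cache is accepted
print(content_check(stale, expected))      # False: the stale cache is rejected
```

When the content check fails, the natural response is to discard the cached file and rebuild it, rather than trying to remap the entries.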

Changes

  • src/bispectrum/so3_on_s2.py: validate entry_meta content (not just length) against expected triples; revert randperm seeding
  • tests/test_so3_on_s2.py: restore tight tolerance (atol=5e-7, rtol=1e-5)
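
As a rough illustration of what the restored tolerance permits, the sketch below applies the combined criterion |actual - expected| <= atol + rtol * |expected|, which is the rule used by torch.testing.assert_close. Whether the test calls that function is an assumption, and the numeric inputs are illustrative rather than taken from the test.

```python
def within_tolerance(actual: float, expected: float,
                     atol: float = 5e-7, rtol: float = 1e-5) -> bool:
    """Combined absolute/relative closeness check (plain-Python sketch)."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# A deviation on the order of the reported float32 max diff passes easily.
print(within_tolerance(1.0 + 1.6e-8, 1.0))  # True
# A misassigned output typically differs by far more than rounding noise.
print(within_tolerance(1.0 + 1e-3, 1.0))    # False
# Near zero the relative term vanishes, so only atol applies.
print(within_tolerance(1e-6, 0.0))          # False
```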

Test plan

  • pytest tests/test_so3_on_s2.py — 81 passed, 22 skipped (after clearing stale caches)
  • Verified max diff at float32 is ~1.6e-8, at float64 is ~1.8e-17
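
For the "clearing stale caches" step, a small helper along these lines could be used. This is a sketch, not part of the PR: the function names are hypothetical, and the only details taken from the description are the cache directory and the file name pattern. It defaults to a dry run so nothing is deleted by accident.

```python
from pathlib import Path
from typing import List


def find_sparse_cg_caches(cache_dir: Path) -> List[Path]:
    """List cache files matching the pattern named in the PR description."""
    return sorted(cache_dir.glob("sparse_cg_lmax*_selective.pt"))


def clear_sparse_cg_caches(cache_dir: Path, dry_run: bool = True) -> List[Path]:
    """Return the matching cache files; delete them only when dry_run is False."""
    paths = find_sparse_cg_caches(cache_dir)
    if not dry_run:
        for path in paths:
            path.unlink()
    return paths


if __name__ == "__main__":
    default_dir = Path.home() / ".cache" / "bispectrum"
    # Dry run: report what would be removed without touching anything.
    for path in clear_sparse_cg_caches(default_dir, dry_run=True):
        print(f"would remove {path}")
```

Passing dry_run=False performs the deletion, after which the next run rebuilds the cache from the current index map.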

Made with Cursor

The sparse CG cache only validated entry count, not content. When the
index map ordering changed between code versions, stale caches were
silently loaded with misaligned (l1, l2, l) triples, producing wrong
output assignments — the root cause of test_sparse_matches_full failures.

- Validate full entry_meta content against expected triples on cache load
- Revert unnecessary randperm seeding (was masking the real issue)
- Restore tight test tolerance (5e-7 atol) now that the root cause is fixed

Co-authored-by: Cursor <cursoragent@cursor.com>
@johmathe johmathe merged commit e2203c5 into main May 12, 2026
2 checks passed