Conversation

@johnchandlerburnham
Major rewrite of the Ixon serialization format, optimized for space efficiency.
The Ix + Lean-in-Lean environment now compiles in 10.21 seconds with a
serialized size of 1.6 GB.

Key changes:

  • Split ixon module into separate files (constant, env, expr, univ, metadata, serialize, sharing, tag, comm, error)
  • New tag-based encoding (Tag0, Tag2, Tag4) for compact representation
  • Refs table: deduplicates constant references using indices instead of inline 32-byte addresses
  • Univs table: deduplicates universe terms per constant
  • Global name indexing: shares names across entire environment, metadata uses u64 indices instead of 32-byte addresses
  • Sharing analysis: identifies shared subexpressions within constants
  • Pre-order traversal indexing for expression metadata

The old ixon module is preserved as ixon_old.rs for migration reference.

@johnchandlerburnham changed the title from "WIP: Ixon serialization rewrite with 85% size reduction" to "WIP: Ixon serialization rewrite with large speed and size reduction" on Jan 14, 2026
  Main fix: build_sharing_vec now re-sorts shared hashes in topological order
  (leaves first) instead of the gross-benefit order produced by decide_sharing.
  Previously, large compound expressions came first and could not reference
  their children via Share(idx), so each entry inlined entire subtrees
  (30-35 KB per entry instead of 3-13 bytes). This caused a 1140x blow-up for
  pathological blocks like denote_blastDivSubtractShift_q.

  Also fixes the hash-consed size calculation to include structural overhead
  for the Inductive/Constructor/Recursor data structures, not just
  expressions. This makes the comparison fair: previously, simple enums like
  SymbolKind showed Ixon as worse than hash-consing (1.91x) when it is
  actually 21x better.

  Results after fix:
  - Overall ratio: 0.071x (Ixon is 14x more efficient than hash-consing)
  - 0 blocks with ratio >= 1.0x (previously 4)
  - 11,722 blocks with 20x+ compression
  - Total serialized: 159MB vs 2.24GB hash-consed

  Other changes:
  - Remove early-break bug in decide_sharing that skipped profitable candidates
  - Add TRACK_HASH_CONSED_SIZE and ANALYZE_SHARING debug flags
  - Add analyze_sharing_stats() for debugging pathological cases
  - Add test_early_break_bug unit test
  - Add granular ratio distribution buckets in size analysis output