
Reduce cache footprint by decoupling degeneracy-dependent data#387

Open
lkdvos wants to merge 9 commits into main from ld-caching


@lkdvos (Member) commented Mar 23, 2026

This PR refactors the internal representation of the fusion tree structure (block layout and sub-block indexing) for TensorMap.
The two main changes are:

  1. Replace the parallel fusiontreelist + fusiontreestructure arrays with a single Dictionaries.Dictionary, which supports efficient access both sequentially, through the token system, and by key, through hashing.
  2. Share the separately cached Indices (the dictionary's keys) across spaces that have the same sectors and differ only in their degeneracies.

Motivation and context

Previously, FusionBlockStructure stored block layout information using three parallel data structures:

  • fusiontreelist: a Vector of (f₁, f₂) fusion tree pairs (the canonical order)
  • fusiontreeindices: a Dict{(f₁, f₂), Int} for O(1) lookup by key
  • fusiontreestructure: a Vector{StridedStructure} indexed positionally

This split was necessary to support both sequential access (iteration in canonical order) and keyed access (looking up a sub-block by fusion tree pair).
The drawback is redundancy: the tree pairs are stored twice.
Additionally, many operations require multiple indirections (fusiontreeindices → index → fusiontreestructure) that are handled manually throughout the package.
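For illustration, here is a rough Python model of the old layout and its manual two-step lookup. The variable names mirror the fields listed above, but the StridedStructure stand-in and the string labels used as fusion trees are assumptions made only for this sketch, not TensorKit's actual types.

```python
from typing import NamedTuple

class StridedStructure(NamedTuple):
    # Hypothetical stand-in for the per-sub-block layout (size, strides, offset)
    size: tuple
    strides: tuple
    offset: int

# Fusion tree pairs modelled as tuples of labels (assumption)
fusiontreelist = [("f1a", "f2a"), ("f1b", "f2b"), ("f1c", "f2c")]        # canonical order
fusiontreeindices = {pair: i for i, pair in enumerate(fusiontreelist)}  # same pairs again, as keys
fusiontreestructure = [                                                 # indexed positionally
    StridedStructure((2, 3), (1, 2), 0),
    StridedStructure((1, 4), (1, 1), 6),
    StridedStructure((3, 3), (1, 3), 10),
]

def subblock_structure(pair):
    # Keyed access needs two manual steps: pair -> index -> structure
    i = fusiontreeindices[pair]
    return fusiontreestructure[i]

# Sequential access has to keep two vectors in lockstep
for pair, s in zip(fusiontreelist, fusiontreestructure):
    print(pair, s.offset)
```

Every call site that needs a sub-block by key has to repeat this indirection, and all three containers must be kept consistent by hand.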

This is precisely the problem Dictionaries.jl solves: its Dictionary type is internally laid out almost exactly like this.
The gettoken function maps keys to integers, and gettokenvalue then simply uses that integer to index into the vector of values.
This effectively replaces all three structures with a single Dictionary{typeof((f₁, f₂)), StridedStructure}.
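A toy Python model of the token-based design, assuming a simplified Indices/Dictionary pair: keys live in one ordered container with a hash map from key to position, and values live in a plain list aligned with those positions. The gettoken/gettokenvalue names roughly mimic Dictionaries.jl, but in this model the token is just an integer position; the real library's token types and method signatures may differ, so treat this only as a sketch of the idea.

```python
class Indices:
    """Ordered set of keys with a hash map from key to position (the token)."""

    def __init__(self, keys):
        self._keys = []
        self._pos = {}
        for k in keys:
            if k in self._pos:
                raise ValueError(f"duplicate key {k!r}")
            self._pos[k] = len(self._keys)
            self._keys.append(k)

    def __iter__(self):
        return iter(self._keys)  # canonical (insertion) order

    def __len__(self):
        return len(self._keys)

    def gettoken(self, key):
        # Hash lookup returning (found, token); the token is meaningless if not found
        pos = self._pos.get(key)
        return (pos is not None, pos if pos is not None else -1)


class Dictionary:
    """Values stored in a plain list, aligned with the positions in `indices`."""

    def __init__(self, indices, values):
        values = list(values)
        if len(values) != len(indices):
            raise ValueError("need exactly one value per key")
        self.indices = indices
        self.values = values

    def gettoken(self, key):
        return self.indices.gettoken(key)

    def gettokenvalue(self, token):
        return self.values[token]  # plain vector indexing, no hashing

    def __getitem__(self, key):
        found, token = self.gettoken(key)
        if not found:
            raise KeyError(key)
        return self.gettokenvalue(token)

    def items(self):
        # Sequential access in canonical order, without any hashing
        return zip(self.indices, self.values)


trees = Indices([("f1a", "f2a"), ("f1b", "f2b")])
structure = Dictionary(trees, ["S0", "S1"])  # strings stand in for StridedStructure
found, token = structure.gettoken(("f1b", "f2b"))
print(found, structure.gettokenvalue(token))  # True S1
```

Once a token is obtained, repeated accesses to the same sub-block skip the hash lookup entirely, and iteration needs no hashing at all.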

Additionally, fusiontrees(t) and fusiontrees(W), which enumerate the valid fusion tree pairs, benefit from caching. The right cache key is the sector structure of the space rather than its degeneracy dimensions, since the latter affect sub-block sizes but not the set of valid trees.
The Indices type from Dictionaries.jl serves as the ordered, hashed set of fusion tree pairs (the combination of the former fusiontreelist and fusiontreeindices) that can be shared across HomSpaces with identical sector structure.
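To make the sharing concrete, here is a toy Python model under simplifying assumptions: each space is a map from sector labels to degeneracies, a map V ← W has a single index on each side, and its "tree pairs" are just the sectors present in both, with block c of size deg_V[c] by deg_W[c]. The helper names are hypothetical. Only the values depend on the degeneracies, so two spaces with the same sectors can reuse one key object.

```python
# Toy model (assumption): a space maps sector labels to degeneracies, and a
# map V <- W has one index on each side.

def tree_pairs(V, W):
    # Sectors present in both spaces; independent of the degeneracies
    return tuple(sorted(set(V) & set(W)))

def block_sizes(keys, V, W):
    # One (rows, cols) entry per key, in key order; depends on the degeneracies
    return [(V[c], W[c]) for c in keys]

V1, W1 = {0: 2, 1: 3}, {0: 4, 1: 1, 2: 5}
V2, W2 = {0: 7, 1: 1}, {0: 1, 1: 9, 2: 2}  # same sectors, different degeneracies

keys = tree_pairs(V1, W1)
assert keys == tree_pairs(V2, W2)  # identical sector structure

# Each structure pairs the single shared key object with its own values
struct1 = (keys, block_sizes(keys, V1, W1))
struct2 = (keys, block_sizes(keys, V2, W2))
print(struct1[0] is struct2[0], struct1[1], struct2[1])
# True [(2, 4), (3, 1)] [(7, 1), (1, 9)]
```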


Design Decisions

fusiontreelist is cached by sector structure:

fusiontreelist(W) uses a custom Hashed wrapper to hash and compare HomSpaces by their sector structure only, ignoring degeneracy dimensions.
This is sound because the set of valid fusion tree pairs (f₁, f₂) depends only on which sectors appear in each index space and on their dualities, not on how many states each sector has.
By caching the fusiontreelist at this coarser level, HomSpaces that share the same sectors but differ in multiplicities can share the same Indices.

File reorganization: tensorstructure.jl

As the HomSpace file was getting rather large, I also split the functions that construct the tensor structure off into their own file, tensorstructure.jl, which is included just before abstracttensor.jl.
This code is really about tensor data layout, not about the abstract space itself.


Questions

  • Should we switch all dictionaries over to Dictionaries.jl-based options, to avoid both the mental load of having two kinds and the maintenance burden of the different dictionary types defined within TensorKit?
  • Are there other abstraction points that should get this kind of treatment?

lkdvos linked an issue Mar 23, 2026 that may be closed by this pull request


Development

Successfully merging this pull request may close these issues.

fusionblockstructure should reuse data that is degeneracy-independent
