Conversation
I think a first general design remark is that now all the manipulations on the fusion trees involve explicitly building index maps of the form:

```julia
trees_src = fusiontrees(fs_src)
trees_dst = fusiontrees(fs_dst)
indexmap = Dict(f => ind for (ind, f) in enumerate(trees_dst))
```

If you now start composing elementary operations to make more complex ones (e.g. …), it might thus be a better idea to store the blocksectors and associated fusion tree pairs (and their index mapping) within the … Of course, this way, the … Happy to discuss when you have time.
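The composition idea can be sketched as follows: if each elementary move acts linearly on the coefficients of a block of fusion trees, a chain of moves collapses into a single matrix product. This is a generic sketch with random matrices standing in for the actual recoupling data; all names are illustrative.

```julia
using LinearAlgebra

# Each elementary move acts linearly on the N coefficients of a block of
# fusion trees, so it can be represented as an N×N matrix; random matrices
# stand in here for the actual recoupling data.
N, L = 50, 6
moves = [rand(N, N) for _ in 1:L]

# Composing the whole chain once costs (L - 1) matrix products (~L·N³ flops),
total = reduce(*, moves)

# after which the full manipulation is a single matrix-vector product.
coeffs = rand(N)
transformed = total * coeffs
```

Storing the composed matrix alongside the index map would then let repeated manipulations skip the per-tree recursion entirely.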
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@ Coverage Diff @@
##           master     #261      +/-   ##
==========================================
- Coverage   82.85%   82.73%   -0.13%
==========================================
  Files          44       45       +1
  Lines        5757     6139     +382
==========================================
+ Hits         4770     5079     +309
- Misses        987     1060      +73
```

☔ View full report in Codecov by Sentry.
So I compared the performance with TensorKit v0.14.9.
As a small update here, I think I further optimized the implementations slightly; however, there definitely is still some room for improvement (in particular the …). I'll try and run some profilers next week to see what is now actually the bottleneck, to verify what needs further improvement. In particular, I can try and reduce some allocations by re-using a bunch of the unitary matrices, or we can try and cache the ….
BTW, does @ogauthe have an MWE of what's causing the slowdown that could be shared, perhaps to run as part of a perf regression testsuite 👀?
```julia
using TensorKit
using TensorKit: treepermuter
using Profile

Vsrc = (Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1)' ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1)' ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (1, 1) => 1)) ← (Rep[ℤ₂ × SU₂]((0, 0) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1)' ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1)')
Vdst = (Rep[ℤ₂ × SU₂]((0, 0) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (1, 1) => 1)') ← (Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1)' ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1)' ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1)' ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1) ⊗ Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1)')
p = ((5, 6), (1, 7, 2, 8, 3, 9, 4, 10))

TensorKit.empty_globalcaches!()
@time treepermuter(Vdst, Vsrc, p);

Profile.clear()
TensorKit.empty_globalcaches!()
Profile.init(n = 10^7, delay = 0.01)
@profile treepermuter(Vdst, Vsrc, p);
```

Here's a relevant case I took from the code at some point. It's really just a matter of taking one of the PEPS contractions and looking for the dominant `treepermuter` cost, which you can typically identify by adding ….
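Following up on the perf-regression-testsuite idea, a generic check could look like this (a sketch only; the function name, the workload, and the thresholds are placeholders, not part of TensorKit):

```julia
# Generic timing-regression pattern: time a workload and compare against a
# stored baseline, warning when the measured time exceeds the tolerance.
function check_regression(f; baseline::Float64, tolerance::Float64 = 2.0)
    t = @elapsed f()
    ratio = t / baseline
    ratio <= tolerance || @warn "possible performance regression" t baseline ratio
    return ratio
end

# Example with a cheap stand-in workload and a generous baseline:
check_regression(() -> sum(rand(1000)); baseline = 1.0)
```

In a real testsuite one would clear the global caches first (e.g. `TensorKit.empty_globalcaches!()`) so that the first-run cost is what gets measured.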
@kshyatt here is a slightly larger example. @lkdvos' minimal case should appear at some point. The timing was done using this branch as of August 15th.

```julia
using TensorOperations: @tensor
using TensorKit

function proj_braket(rD)
    fs = map(s -> sign(frobeniusschur(s)), sectors(rD))
    (all(fs .== 1) || all(fs .== -1)) || error("Cannot handle quaternionic irreps")
    rD2 = fuse(rD, rD')
    isoD2 = isomorphism(rD2, rD ⊗ rD')
    permuted_iso = flip(permute(isoD2', ((3,), (2, 1))), (1,))
    proj2_even_fullspace = isoD2 + permuted_iso
    proj2_odd_fullspace = isoD2 - permuted_iso
    proj2_even = rightnull(proj2_odd_fullspace; alg=SVD(), atol=1e-12)
    proj2_odd = rightnull(proj2_even_fullspace; alg=SVD(), atol=1e-12)
    return proj2_even, proj2_odd
end

function project_Z2(rd, rD)
    tket = randn(rd ← rD ⊗ rD ⊗ rD' ⊗ rD')
    pd_even_oddd = proj_braket(rd)
    projN = map(t -> permute(t, ((2, 3), (1,))), proj_braket(rD))
    projE = map(t -> permute(t, ((2, 3), (1,))), proj_braket(rD))
    projS = map(t -> permute(t, ((2, 3), (1,))), proj_braket(rD'))
    projW = map(t -> permute(t, ((2, 3), (1,))), proj_braket(rD'))
    t_double = permute(tket' ⊗ tket, ((5, 6), (1, 7, 2, 8, 3, 9, 4, 10)))
    for i_d in 1:2
        projectedd = permute(pd_even_oddd[i_d] * t_double, ((1, 4, 5, 6, 7, 8, 9), (2, 3)))
        for iN in 1:2
            @tensor projectedN[d2, D2N, ketW, braW, ketS, braS; ketE, braE] :=
                projectedd[d2, ketE, braE, ketS, braS, ketW, braW; ketN, braN] *
                projN[iN][ketN, braN, D2N]
            for iE in 1:2
                @tensor projectedNE[d2, D2N, D2E, ketW, braW; ketS, braS] :=
                    projectedN[d2, D2N, ketS, braS, ketW, braW; ketE, braE] *
                    projE[iE][ketE, braE, D2E]
                for iS in 1:2
                    iW = mod1(i_d + iN + iE + iS + 1, 2)
                    @tensor projected[d2, D2N, D2E, D2S, D2W] :=
                        projectedNE[d2, D2N, D2E, ketW, braW; ketS, braS] *
                        projS[iS][ketS, braS, D2S] *
                        projW[iW][ketW, braW, D2W]
                end
            end
        end
    end
    return 1
end

rd = Rep[ℤ₂ × SU₂]((0, 0) => 1, (1, 1) => 1)
rD7 = Rep[ℤ₂ × SU₂]((0, 0) => 1, (0, 1) => 1, (1, 1) => 1)
rD11 = Rep[ℤ₂ × SU₂]((0, 0) => 2, (0, 1) => 1, (1, 1) => 2)
rD16 = Rep[ℤ₂ × SU₂]((0, 0) => 2, (0, 1) => 1, (1, 1) => 2, (0, 2) => 1)

@time project_Z2(rd, rD7)
@time project_Z2(rd, rD7)
@time project_Z2(rd, rD11)
@time project_Z2(rd, rD11)
```

The timings:

```julia
# 1st run D = 7
@time project_Z2(rd, rD7)
# 400.344116 seconds (1.39 G allocations: 465.458 GiB, 7.96% gc time, 24.47% compilation time)

# 2nd run D = 7
@time project_Z2(rd, rD7)
# 5.408577 seconds (20.24 M allocations: 2.738 GiB, 5.07% gc time, 5.34% compilation time)

# 1st run D = 11 (*after D = 7*)
@time project_Z2(rd, rD11)
# 319.394331 seconds (1.36 G allocations: 482.086 GiB, 8.63% gc time, 0.00% compilation time)

# 2nd run D = 11
@time project_Z2(rd, rD11)
# 7.989900 seconds (20.06 M allocations: 4.710 GiB, 4.31% gc time)
```
I have been using this branch for expensive computations with SU(3) symmetry.

```julia
using TensorKit
using SUNRepresentations: SU3Irrep

vd = Vect[SU3Irrep]((1, 0, 0) => 1)
h = TensorMap([0.0, 2.0], vd ⊗ vd ← vd ⊗ vd)
permute(h, (1, 2, 3), (4,))
```

```
ERROR: LoadError: KeyError: key (FusionTree{Irrep[SU₃]}(((1, 0, 0), (1, 0, 0), (1, 1, 0)), (1, 0, 0), (false, false, true), ((1, 1, 0),), (1, 1)), FusionTree{Irrep[SU₃]}(((1, 0, 0),), (1, 0, 0), (false,), (), ())) not found
Stacktrace:
  [1] getindex(h::Dict{Tuple{Tuple{SU3Irrep, SU3Irrep}, Tuple{Int64, Int64}}, Int64}, key::Tuple{FusionTree{SU3Irrep, 3, 1, 2}, FusionTree{SU3Irrep, 1, 0, 0}})
    @ Base ./dict.jl:498
  [2] bendright(src::TensorKit.FusionTreeBlock{SU3Irrep, 4, 0, Tuple{FusionTree{SU3Irrep, 4, 2, 3}, FusionTree{SU3Irrep, 0, 0, 0}}})
    @ TensorKit ~/.julia/packages/TensorKit/Y7XJ1/src/fusiontrees/fusiontreeblocks.jl:138
  [3] macro expansion
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/fusiontrees/fusiontreeblocks.jl:397 [inlined]
  [4] repartition
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/fusiontrees/fusiontreeblocks.jl:379 [inlined]
  [5] repartition
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/fusiontrees/fusiontreeblocks.jl:364 [inlined]
  [6] __fsbraid(key::Tuple{TensorKit.FusionTreeBlock{SU3Irrep, 2, 2, Tuple{FusionTree{SU3Irrep, 2, 0, 1}, FusionTree{SU3Irrep, 2, 0, 1}}}, Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}, Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}})
    @ TensorKit ~/.julia/packages/TensorKit/Y7XJ1/src/fusiontrees/fusiontreeblocks.jl:658
  [7] #54
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/auxiliary/caches.jl:108 [inlined]
  [8] get!(default::TensorKit.var"#54#59"{Tuple{TensorKit.FusionTreeBlock{SU3Irrep, 2, 2, Tuple{FusionTree{SU3Irrep, 2, 0, 1}, FusionTree{SU3Irrep, 2, 0, 1}}}, Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}, Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}}}, lru::LRUCache.LRU{Any, Any}, key::Tuple{TensorKit.FusionTreeBlock{SU3Irrep, 2, 2, Tuple{FusionTree{SU3Irrep, 2, 0, 1}, FusionTree{SU3Irrep, 2, 0, 1}}}, Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}, Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}})
    @ LRUCache ~/.julia/packages/LRUCache/ZH7qB/src/LRUCache.jl:169
  [9] _fsbraid(key::Tuple{TensorKit.FusionTreeBlock{SU3Irrep, 2, 2, Tuple{FusionTree{SU3Irrep, 2, 0, 1}, FusionTree{SU3Irrep, 2, 0, 1}}}, Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}, Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}}, ::TensorKit.GlobalLRUCache)
    @ TensorKit ./reduce.jl:0
 [10] _fsbraid
    @ ./none:0 [inlined]
 [11] braid
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/fusiontrees/fusiontreeblocks.jl:618 [inlined]
 [12] permute
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/fusiontrees/fusiontreeblocks.jl:676 [inlined]
 [13] fusiontreetransform
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/tensors/treetransformers.jl:228 [inlined]
 [14] TensorKit.GenericTreeTransformer(transform::TensorKit.var"#fusiontreetransform#259"{Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}}, p::Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}, Vdst::TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 3, 1}, Vsrc::TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 2, 2})
    @ TensorKit ~/.julia/packages/TensorKit/Y7XJ1/src/tensors/treetransformers.jl:130
 [15] TensorKit.TreeTransformer(transform::Function, p::Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}, Vdst::TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 3, 1}, Vsrc::TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 2, 2})
    @ TensorKit ~/.julia/packages/TensorKit/Y7XJ1/src/tensors/treetransformers.jl:199
 [16] _treepermuter
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/tensors/treetransformers.jl:229 [inlined]
 [17] #257
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/auxiliary/caches.jl:108 [inlined]
 [18] get!(default::TensorKit.var"#257#262"{TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 3, 1}, TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 2, 2}, Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}}, lru::LRUCache.LRU{Any, Any}, key::Tuple{TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 3, 1}, TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 2, 2}, Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}})
    @ LRUCache ~/.julia/packages/LRUCache/ZH7qB/src/LRUCache.jl:169
 [19] treepermuter(Vdst::TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 3, 1}, Vsrc::TensorMapSpace{GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 2, 2}, p::Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}, ::TensorKit.GlobalLRUCache)
    @ TensorKit ./reduce.jl:0
 [20] treepermuter
    @ ./none:0 [inlined]
 [21] treepermuter
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/tensors/treetransformers.jl:224 [inlined]
 [22] add_permute!
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/tensors/indexmanipulations.jl:405 [inlined]
 [23] permute!
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/tensors/indexmanipulations.jl:38 [inlined]
 [24] permute(t::TensorMap{Float64, GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 2, 2, Vector{Float64}}, ::Tuple{Tuple{Int64, Int64, Int64}, Tuple{Int64}}; copy::Bool)
    @ TensorKit ~/.julia/packages/TensorKit/Y7XJ1/src/tensors/indexmanipulations.jl:77
 [25] permute
    @ ~/.julia/packages/TensorKit/Y7XJ1/src/tensors/indexmanipulations.jl:65 [inlined]
 [26] permute(t::TensorMap{Float64, GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 2, 2, Vector{Float64}}, p1::Tuple{Int64, Int64, Int64}, p2::Tuple{Int64}; copy::Bool)
    @ TensorKit ./deprecated.jl:105
 [27] permute(t::TensorMap{Float64, GradedSpace{SU3Irrep, TensorKit.SortedVectorDict{SU3Irrep, Int64}}, 2, 2, Vector{Float64}}, p1::Tuple{Int64, Int64, Int64}, p2::Tuple{Int64})
    @ TensorKit ./deprecated.jl:103
 [28] top-level scope
    @ ~/Documents/tensorkit/ThermalPEPS.jl/nogit.draft.jl:9
in expression starting at /home/ogauthe/Documents/tensorkit/ThermalPEPS.jl/nogit.draft.jl:9
```

There is no error with the same code with the TensorKit release.
Ah, these great self-dual sectors... Thanks for the report and the MWE. I'll see if I can find out something, although it might not be for this week.
I think at least now this should be working, and it is back in line with the latest main branch, so all of the latest and greatest new features should be usable now :)
Your PR no longer requires formatting changes. Thank you for your contribution!
```julia
    conj(Rsymbol(d, c, e) * Fsymbol(d, a, b, e, c′, c)) * Rsymbol(d, a, c′)
else
    Rsymbol(c, d, e) * conj(Fsymbol(d, a, b, e, c′, c) * Rsymbol(a, d, c′))
```
Just wanted to mention here that the `a` and `d` of the last R-symbols might be switched, as that would be consistent with https://quantumkithub.github.io/TensorKit.jl/stable/man/sectors/#Manipulations-on-a-fusion-tree
```julia
function iscyclicpermutation(v1, v2)
    length(v1) == length(v2) || return false
    return iscyclicpermutation(indexin(v1, v2))
end
```
Might these two be a good upstream contribution to Permutations.jl?
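For context, the two-argument method above reduces the check to a one-argument test on the index vector. A minimal self-contained sketch of what that one-argument method could look like (this is an assumption for illustration — the actual TensorKit implementation may differ):

```julia
# Hypothetical sketch of the one-argument method assumed above: check whether
# an integer vector is a cyclic rotation of 1:length(v).
function iscyclicpermutation(v::AbstractVector{<:Integer})
    n = length(v)
    n == 0 && return true
    i = findfirst(==(1), v)
    i === nothing && return false
    # walk cyclically from the position of 1 and expect 1, 2, ..., n
    return all(v[mod1(i + k - 1, n)] == k for k in 1:n)
end

iscyclicpermutation([3, 1, 2])  # true: a rotation of (1, 2, 3)
iscyclicpermutation([2, 1, 3])  # false
```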
```julia
#----------------------------------------------
# -> rewrite generic fusion tree in basis of fusion trees in standard form
# -> only depend on Fsymbol
"""
```
An example might be nice here 🥺
```julia
operation is the inverse of split, in the sense that `f == join(split(f, M)...)`
holds for all `M` between `0` and `N`, where `N` is the number of uncoupled sectors of `f`.

See also [`split`](@ref) and [`insertat`](@ref).
```
Another good spot for an example?
```julia
        k += 1
    end
end
k > N₃ && throw(ArgumentError("Not a planar trace"))
```
Error message here could be better; maybe at least display `k` and `N₃`?
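A minimal sketch of what this suggestion could look like, interpolating the offending values into the message (the wrapper function is purely illustrative, not TensorKit code):

```julia
# Illustrative only: include the offending values in the error message so
# the user can see by how much the planar-trace condition was violated.
function check_planar_trace(k, N₃)
    k > N₃ && throw(ArgumentError("not a planar trace: index k = $k exceeds N₃ = $N₃"))
    return nothing
end

check_planar_trace(2, 3)  # passes silently
# check_planar_trace(4, 3) throws:
#   ArgumentError: not a planar trace: index k = 4 exceeds N₃ = 3
```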
```julia
"""
    fusionblocks(W::HomSpace)

Return the fusionblocks corresponding to all valid fusion channels of a given `HomSpace`,
```
Maybe "Return the `FusionTreeBlock`s"?
```diff
  style = FusionStyle(I)
  if use_threaded_transform(tdst, transformer)
-     _add_general_kernel_threaded!(tdst, tsrc, p, transformer, α, β, backend...)
+     add_kernel_threaded!(style, tdst, tsrc, p, transformer, α, β, backend...)
```
Just a general comment here, but we should think about how this interacts with the CUDA/AMD stuff, since threads have their own streams associated with them.
```julia
r₂ = (p₂..., q₂...)

# TODO: Is it worth to add multithreading support?
if FusionStyle(I) isa UniqueFusion
```
Stylistic thing here, but this could be refactored into two new functions dispatching on `FusionStyle(I)`; that might be easier to read?
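The suggested refactor could look roughly like this. It is a self-contained sketch with stand-in types (the real `FusionStyle`/`UniqueFusion` traits live in the TensorKit ecosystem) and hypothetical function names:

```julia
# Stand-in types mimicking the TensorKit trait hierarchy.
abstract type FusionStyle end
struct UniqueFusion <: FusionStyle end
struct GenericFusion <: FusionStyle end

# Instead of branching with `if FusionStyle(I) isa UniqueFusion ... else ... end`,
# split the two branches into methods and let dispatch pick the right one:
_transform_block(::UniqueFusion, r₂) = ("abelian fast path", r₂)
_transform_block(::FusionStyle, r₂) = ("generic path", r₂)

_transform_block(UniqueFusion(), (1, 2))   # selects the abelian method
_transform_block(GenericFusion(), (1, 2))  # falls back to the generic method
```

Dispatch keeps each branch compilable in isolation, which can also help inference when the style is known statically.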
```julia
[`braid(f::FusionTree{I, N}, levels::NTuple{N, Int}, permutation::NTuple{N, Int})`](@ref braid(f::FusionTree{I, N}, levels::NTuple{N, Int}, p::NTuple{N, Int}) where {I <: Sector, N})
[`braid(f::FusionTree{I, N}, p::IndexTuple{N}, levels::IndexTuple{N})`](@ref braid(::FusionTree{I, N}, ::IndexTuple{N}, ::IndexTuple{N}) where {I, N})

where the braid is specified as a permutation, such that the new sector at position `i` was originally at position `permutation[i]`, and where every uncoupled sector is also assigned a level or depth.
```

Suggested change (match the docstring variable name to the signature):

```diff
- where the braid is specified as a permutation, such that the new sector at position `i` was originally at position `permutation[i]`, and where every uncoupled sector is also assigned a level or depth.
+ where the braid is specified as a permutation, such that the new sector at position `i` was originally at position `p[i]`, and where every uncoupled sector is also assigned a level or depth.
```
Given the speed-up of the permutations, the bottleneck has now migrated to the actual computation of the tree transformers (mostly for tensors with many legs again). @ogauthe has reported cases where the first run takes ~2000 seconds, while the second run takes ~5 seconds.

While I haven't really benchmarked or profiled myself yet (shame on me!), I immediately remembered that we are not composing the fusion tree manipulations in a very optimal way: on the one hand we are effectively manually writing out the matrix multiplications, and on the other hand the recursive implementation means that we are recomputing a lot of transformations that we have already encountered. This never really showed up since these computations are somewhat fast and cached, but for large numbers of fusion trees these loops become prohibitively expensive.
This PR aims to address these issues in a very similar fashion as the previous optimization run: we consider the set of all fusion trees with equal uncoupled charges, and act on these as a block. If my back-of-the-envelope calculations are correct, if there are $N$ such fusion trees and we are doing $L$ substeps in the manipulation, we were previously doing $\sim N^L$ operations due to the recursive nature, while now this is $\sim L N^3$.

This PR now contains the following changes:
- `braid` had some inconsistencies in its argument orders (`Index2Tuple` etc.)
- use `fusiontensor` instead of `convert(Array, ::FusionTreePair)` to avoid type piracy
- dedicated handling of `UniqueFusion` fusion styles to avoid the exponential scaling in the number of chained manipulations
- `TensorMap` specializations, approaching the entire "indexmanipulations" kernels in a more unified way
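The back-of-the-envelope estimate above can be made concrete with a toy operation count (pure arithmetic; the numbers are illustrative, not measured):

```julia
# N fusion trees in a block, L elementary substeps in the manipulation.
N, L = 50, 6

# Recursive approach: each substep branches over trees, giving ~N^L operations.
recursive_ops = big(N)^L

# Blockwise approach: each substep is one N×N matrix multiplication, ~L·N³.
blockwise_ops = L * N^3

recursive_ops ÷ blockwise_ops  # speed-up factor of roughly 2 × 10^4 at these sizes
```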