Splitting edges backwards in time #3354
Replies: 2 comments
-
|
I think you want to look at the algorithms for "bricking" a tree sequence, that Wilder developed: #1345 . Confusingly this was originally thought of as "split edges" but the current split_edges method is a different thing. Note that this does not create quite so many edges as Hanbin's implementation, because in a bricked tree sequence the tree topology can change under an edge, as long as the sample sets remain the same. This is usually enough for most algorithms, I think. There was discussion of this in one of the Wed talks recently (to do with "linearised ARGs") |
Beta Was this translation helpful? Give feedback.
-
|
On tree sequence construction, could we construct a boolean array that indicates whether each edge has the same set of samples underneath it? When we run the bricking algorithm, this would imply an array of ones. I suspect the calculation of this Boolean array could be expensive, however. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
@hanbin973 has implemented an algorithm to split edges backwards in time. See https://github.com/hanbin973/tslmm/blob/b066a503c4e2a69d1e8c3255218b11c72930171e/tslmm/operations.py#L220 and.
These implementations were initially deemed necessary for
tslmm, but this kind of splitting quickly creates a huge number of edges, which increases memory usage and work needed with such tree sequences (this makes sense as a large portion of the tree sequence structure is lost through splitting). Importantly, this operation can be avoided if a bespoke algorithm is developed instead, such as the matvec described in https://doi.org/10.1093/genetics/iyaf219 and used in https://doi.org/10.1101/2025.07.14.664631.The need for this kind of splitting has come up a couple times as mentioned in hanbin973/tslmm#27, but more work is needed to port the algorithm to C to make it usable for large tree sequences (numba implementation at https://github.com/hanbin973/tslmm/blob/b066a503c4e2a69d1e8c3255218b11c72930171e/tslmm/operations.py#L106 wasn't faster than Python version).
This discussion post is to make people aware of this implementation and the discussion in the issue should anyone come to this in future.
Related to:
unsquashOpposite of EdgeTable.squash #2657split_edgeshttps://github.com/search?q=repo%3Atskit-dev%2Ftskit+split_edges&type=code (can't seem to find implementation)split_polytomiestskit/python/tskit/combinatorics.py
Line 316 in 25a26a7
Beta Was this translation helpful? Give feedback.
All reactions