Splitting edges backwards in time #3354

gregorgorjanc · 2025-12-20T04:20:07Z

gregorgorjanc
Dec 20, 2025
Collaborator

@hanbin973 has implemented an algorithm to split edges backwards in time. See https://github.com/hanbin973/tslmm/blob/b066a503c4e2a69d1e8c3255218b11c72930171e/tslmm/operations.py#L220 and.

These implementations were initially deemed necessary for tslmm, but this kind of splitting quickly creates a huge number of edges, which increases memory usage and work needed with such tree sequences (this makes sense as a large portion of the tree sequence structure is lost through splitting). Importantly, this operation can be avoided if a bespoke algorithm is developed instead, such as the matvec described in https://doi.org/10.1093/genetics/iyaf219 and used in https://doi.org/10.1101/2025.07.14.664631.

The need for this kind of splitting has come up a couple times as mentioned in hanbin973/tslmm#27, but more work is needed to port the algorithm to C to make it usable for large tree sequences (numba implementation at https://github.com/hanbin973/tslmm/blob/b066a503c4e2a69d1e8c3255218b11c72930171e/tslmm/operations.py#L106 wasn't faster than Python version).

This discussion post is to make people aware of this implementation and the discussion in the issue should anyone come to this in future.

Related to:

unsquash Opposite of EdgeTable.squash #2657
split_edges https://github.com/search?q=repo%3Atskit-dev%2Ftskit+split_edges&type=code (can't seem to find implementation)
split_polytomies

tskit/python/tskit/combinatorics.py

Line 316 in 25a26a7

def split_polytomies(

hyanwong · 2025-12-20T12:01:24Z

hyanwong
Dec 20, 2025
Maintainer

I think you want to look at the algorithms for "bricking" a tree sequence, that Wilder developed: #1345 . Confusingly this was originally thought of as "split edges" but the current split_edges method is a different thing.

Note that this does not create quite so many edges as Hanbin's implementation, because in a bricked tree sequence the tree topology can change under an edge, as long as the sample sets remain the same. This is usually enough for most algorithms, I think. There was discussion of this in one of the Wed talks recently (to do with "linearised ARGs")

0 replies

hyanwong · 2025-12-20T12:32:00Z

hyanwong
Dec 20, 2025
Maintainer

On tree sequence construction, could we construct a boolean array that indicates whether each edge has the same set of samples underneath it? When we run the bricking algorithm, this would imply an array of ones.

I suspect the calculation of this Boolean array could be expensive, however.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Splitting edges backwards in time #3354

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

Splitting edges backwards in time #3354

Uh oh!

gregorgorjanc Dec 20, 2025 Collaborator

Replies: 2 comments

Uh oh!

Uh oh!

hyanwong Dec 20, 2025 Maintainer

Uh oh!

Uh oh!

hyanwong Dec 20, 2025 Maintainer

gregorgorjanc
Dec 20, 2025
Collaborator

hyanwong
Dec 20, 2025
Maintainer

hyanwong
Dec 20, 2025
Maintainer