Description
I think the documentation of migrations could use a bit more detail, in particular the migration table section of the docs. Specifically, the docs read:
Migrations are performed by individual ancestors, but most likely not by an individual whose genome is tracked as a node (as in a discrete-deme model they are unlikely to be both a migrant and a most recent common ancestor). So, tskit records when a segment of ancestry has moved between populations.
and
The node column records the ID of the node that was associated with the ancestry segment in question at the time of the migration event.
I think a clearer way of explaining this is that migrations chart how nodes are associated with a population. Specifically, the migration node is the child node of the edge on which the migration occurred. So a migration from source population x to destination population y at a given time, node, and left/right coordinates means that, on the edge (or edges) identified by that node and those coordinates, ancestors younger than the migration time belong to population x and older ancestors belong to population y (at least until an intervening migration occurs). Furthermore, over the left/right coordinates, all older ancestors of the edge's parent node also belong to population y (until they are affected by an older migration), and all descendants of this edge belong to population x (until they are affected by a more recent migration).
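To make that reading concrete, here is a minimal sketch in plain Python. It is a toy model of my interpretation, not the tskit API: the `Migration` tuple, the `node_population` mapping, and the `population_on_edge` helper are all hypothetical names, and treating a lineage as already being in the destination population at exactly the migration time is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a row of the migration table (not a tskit class).
Migration = namedtuple("Migration", "left right node source dest time")


def population_on_edge(node_population, migrations, node, position, time):
    """Population of the lineage above `node` at `position`, `time` ago.

    Start from the population of the child node and apply, youngest first,
    every migration recorded on that node which covers `position` and is
    no older than `time`.
    """
    pop = node_population[node]
    relevant = sorted(
        (
            m
            for m in migrations
            if m.node == node and m.left <= position < m.right and m.time <= time
        ),
        key=lambda m: m.time,
    )
    for m in relevant:
        pop = m.dest  # older than this migration: lineage is in the destination
    return pop


# Node 5 lives in population 0; its lineage moves 0 -> 1 at time 3, then 1 -> 2 at time 7.
node_population = {5: 0}
migrations = [
    Migration(left=0, right=10, node=5, source=0, dest=1, time=3.0),
    Migration(left=0, right=10, node=5, source=1, dest=2, time=7.0),
]

young = population_on_edge(node_population, migrations, 5, 4, 1.0)    # 0
middle = population_on_edge(node_population, migrations, 5, 4, 5.0)   # 1
old = population_on_edge(node_population, migrations, 5, 4, 8.0)      # 2
outside = population_on_edge(node_population, migrations, 5, 12, 8.0)  # 0
```

The last call shows that a position outside the migration's left/right interval is unaffected, so the lineage there stays in the child node's population.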
Here's an example of when this is important: if you wanted to know which tracts of ancestry in modern samples are the result of a historic migration (note that these do not necessarily correspond to haplotypes, since they don't depend on variant sites), you would look at the migration node and the marginal trees between the left and right coordinates, and then find the relevant leaf nodes. In the absence of intervening migrations, this gives the "ancestry segments" carried by samples that are the result of the migration. Note that these left/right segments do not always correspond to the breakpoints between edges, which I found surprising at first.
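The lookup described above could be sketched roughly as follows. Again this is plain Python over hypothetical `Edge` and `Migration` tuples rather than real tskit tables, and it assumes there are no intervening migrations below the migration node. The helper `descendant_tracts` is an illustrative name, not an existing function.

```python
from collections import namedtuple

# Hypothetical stand-ins for rows of the edge and migration tables.
Edge = namedtuple("Edge", "left right parent child")
Migration = namedtuple("Migration", "left right node source dest time")


def descendant_tracts(edges, samples, mig):
    """Return sorted (left, right, sample) tracts inherited through mig.node.

    The migration interval is cut at every edge breakpoint that falls strictly
    inside it, so each piece lies within a single marginal tree.
    """
    points = {mig.left, mig.right}
    for e in edges:
        for x in (e.left, e.right):
            if mig.left < x < mig.right:
                points.add(x)
    points = sorted(points)

    tracts = []
    for lo, hi in zip(points, points[1:]):
        # Parent -> children map for the marginal tree covering [lo, hi).
        children = {}
        for e in edges:
            if e.left <= lo and hi <= e.right:
                children.setdefault(e.parent, []).append(e.child)
        # Walk down from the migration node and collect the samples below it.
        stack = [mig.node]
        while stack:
            u = stack.pop()
            if u in samples:
                tracts.append((lo, hi, u))
            stack.extend(children.get(u, []))
    return sorted(tracts)


# Samples 0, 1, 2; node 3 is the parent of 0 everywhere and of 1 only on [0, 6).
edges = [
    Edge(0, 10, 3, 0),
    Edge(0, 6, 3, 1),
    Edge(6, 10, 4, 1),
    Edge(0, 10, 4, 2),
    Edge(0, 10, 4, 3),
]
mig = Migration(left=2, right=8, node=3, source=0, dest=1, time=5.0)

tracts = descendant_tracts(edges, {0, 1, 2}, mig)
# [(2, 6, 0), (2, 6, 1), (6, 8, 0)]
```

In this toy data the migration spans [2, 8), which matches neither the edge on [0, 6) nor the one on [6, 10), so the tracts are bounded by a mix of migration coordinates and edge breakpoints. Adjacent tracts for the same sample are left unmerged to keep the sketch short.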
If I have all that right, then I think we should clarify a few things: (1) that migrations explain how and why ancestral nodes have a population, (2) that the migration node is the child node of the edge where the migration occurred, and (3) that, barring multiple migrations on an edge, the child node of the edge belongs to the source population and the parent node belongs to the destination population.
If others agree, I'll make a PR to document this a bit more clearly.