A consistent least-squares criterion for calibrating edge lengths in phylogenetic networks
Authors
Jingcheng Xu, Cécile Ané
Abstract
In phylogenetic networks, it is desirable to estimate edge lengths in substitutions per site or calendar time. Yet, there is a lack of scalable methods that provide such estimates. Here we consider the problem of obtaining edge length estimates from genetic distances, in the presence of rate variation across genes and lineages, when the network topology is known. We propose a novel criterion based on least-squares that is both consistent and computationally tractable. The crux of our approach is to decompose the genetic distances into two parts, one of which is invariant across displayed trees of the network. The scaled genetic distances are then fitted to the invariant part, while the average scaled genetic distances are fitted to the non-invariant part. We show that this criterion is consistent provided that there exists a tree path between some pair of tips in the network, and that edge lengths in the network are identifiable from average distances. We also provide a constrained variant of this criterion assuming a molecular clock, which can be used to obtain relative edge lengths in calendar time.
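The abstract builds on the classical idea of fitting edge lengths so that path lengths match observed pairwise distances in the least-squares sense. The sketch below illustrates only that baseline, on a single unrooted tree with ordinary (unweighted) least squares and no non-negativity constraint. It is not the authors' network criterion: it has no per-gene rate scaling, no split into invariant and non-invariant parts, and no displayed-tree averaging. The quartet topology, the edge names in `EDGES`, and the helper functions `path_matrix` and `fit_edge_lengths` are hypothetical and exist only for this illustration.

```python
import itertools

import numpy as np

# Hypothetical unrooted quartet ((a,b),(c,d)). Each edge is encoded by the
# set of tips on one side of it, so the edge lies on the path between two
# tips exactly when it separates them.
TIPS = ["a", "b", "c", "d"]
EDGES = {
    "a": {"a"},
    "b": {"b"},
    "c": {"c"},
    "d": {"d"},
    "internal": {"a", "b"},
}


def path_matrix(tips, edges):
    """Return tip pairs, edge names, and the 0/1 matrix A where
    A[p, e] = 1 if edge e lies on the path joining tip pair p."""
    pairs = list(itertools.combinations(tips, 2))
    names = list(edges)
    A = np.zeros((len(pairs), len(names)))
    for i, (x, y) in enumerate(pairs):
        for j, name in enumerate(names):
            side = edges[name]
            # The edge is on the x-y path iff x and y fall on opposite sides.
            if (x in side) != (y in side):
                A[i, j] = 1.0
    return pairs, names, A


def fit_edge_lengths(A, d):
    """Ordinary least squares: minimize ||A b - d||^2 over edge lengths b."""
    b, *_ = np.linalg.lstsq(A, d, rcond=None)
    return b


# Usage: noiseless distances generated from known edge lengths.
pairs, names, A = path_matrix(TIPS, EDGES)
true_b = np.array([0.1, 0.2, 0.3, 0.4, 0.05])
d = A @ true_b
est = fit_edge_lengths(A, d)
for name, length in zip(names, est):
    print(f"{name}: {length:.4f}")
```

With exact distances, the five edge lengths of the quartet are recovered because its 6 x 5 path matrix has full column rank. In a network, pairwise paths differ across displayed trees, which is the difficulty the proposed criterion addresses. In practice, a non-negative solver would also be used to keep edge lengths from going negative.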