Abstract
We consider the problem of distance estimation under the TKF91 model of sequence evolution by insertions, deletions and substitutions on a phylogeny. In an asymptotic regime where the expected sequence lengths tend to infinity, we show that no consistent distance estimation is possible from sequence lengths alone. More formally, we establish that the distributions of pairs of sequence lengths at different distances cannot be distinguished with probability going to one.
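To make the object of study concrete, the following is a minimal simulation sketch of a pair of sequence lengths under the TKF91 length dynamics. It rests on assumptions not spelled out in the abstract: the number of mortal links is taken to follow a birth-death process with birth rate `lam * (n + 1)` (the extra term accounts for the immortal link) and death rate `mu * n`, with `0 < lam < mu`, started from its geometric stationary distribution. The function names and parameter names are hypothetical, and the sketch is an illustration only, not the argument used in the paper.

```python
import random


def sample_stationary_length(lam, mu, rng):
    """Draw n from the geometric law P(N = n) = (1 - lam/mu) * (lam/mu)**n."""
    ratio = lam / mu
    n = 0
    while rng.random() < ratio:
        n += 1
    return n


def evolve_length(n, t, lam, mu, rng):
    """Run the length process for time t, starting from n mortal links."""
    elapsed = 0.0
    while True:
        birth_rate = lam * (n + 1)  # n mortal links plus the immortal link
        death_rate = mu * n
        total = birth_rate + death_rate
        elapsed += rng.expovariate(total)  # waiting time to the next event
        if elapsed > t:
            return n
        if rng.random() * total < birth_rate:
            n += 1  # insertion
        else:
            n -= 1  # deletion


def sample_length_pair(t, lam, mu, rng):
    """Lengths of two sequences separated by evolutionary time t (assumes stationarity)."""
    n0 = sample_stationary_length(lam, mu, rng)
    return n0, evolve_length(n0, t, lam, mu, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # lam/mu close to 1 gives a large expected length lam / (mu - lam).
    for t in (0.1, 1.0):
        pairs = [sample_length_pair(t, 0.99, 1.0, rng) for _ in range(5)]
        print(t, pairs)
```

Under these assumptions the stationary expected length is `lam / (mu - lam)`, so letting `lam / mu` approach 1 is one way to make expected lengths grow. Comparing empirical distributions of the simulated pairs at two values of `t` gives an informal view of the distinguishability question the abstract addresses.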
Acknowledgements
SR was supported by NSF grants DMS-1614242, CCF-1740707 (TRIPODS), DMS-1902892, and DMS-1916378, as well as a Simons Fellowship and a Vilas Associates Award. BL was supported by DMS-1614242, CCF-1740707 (TRIPODS), DMS-1902892 (to SR). WTF was supported by NSF grants DMS-1614242 (to SR) and DMS-1855417, and ONR-TCRI N00014-20-1-2411.
Cite this article
Fan, WT.L., Legried, B. & Roch, S. Impossibility of Consistent Distance Estimation from Sequence Lengths Under the TKF91 Model. Bull Math Biol 82, 123 (2020). https://doi.org/10.1007/s11538-020-00801-3