Deep Residual Neural Networks Resolve Quartet Molecular Phylogenies.
Molecular Biology and Evolution (IF 11.0), Pub Date: 2020-05-01, DOI: 10.1093/molbev/msz307
Zhengting Zou 1, Hongjiu Zhang 2, Yuanfang Guan 2,3, Jianzhi Zhang 1

Phylogenetic inference is of fundamental importance to evolutionary as well as other fields of biology, and molecular sequences have emerged as the primary data for this task. Although many phylogenetic methods have been developed to explicitly take into account substitution models of sequence evolution, such methods could fail due to model misspecification or insufficiency, especially in the face of heterogeneities in substitution processes across sites and among lineages. In this study, we propose to infer topologies of four-taxon trees using deep residual neural networks, a machine learning approach needing no explicit modeling of the subject system and having a record of success in solving complex nonlinear inference problems. We train residual networks on simulated protein sequence data with extensive amino acid substitution heterogeneities. We show that the well-trained residual network predictors can outperform existing state-of-the-art inference methods such as the maximum likelihood method on diverse simulated test data, especially under extensive substitution heterogeneities. Reassuringly, residual network predictors generally agree with existing methods in the trees inferred from real phylogenetic data with known or widely believed topologies. Furthermore, when combined with the quartet puzzling algorithm, residual network predictors can be used to reconstruct trees with more than four taxa. We conclude that deep learning represents a powerful new approach to phylogenetic reconstruction, especially when sequences evolve via heterogeneous substitution processes. We present our best trained predictor in a freely available program named Phylogenetics by Deep Learning (PhyDL, https://gitlab.com/ztzou/phydl; last accessed January 3, 2020).
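
To make the four-taxon setting concrete: four taxa admit exactly three unrooted tree topologies, so quartet inference can be framed as a three-class classification over an alignment of four sequences. The sketch below enumerates those topologies and one-hot encodes a protein alignment as one plausible network input. The function names, the 20-letter alphabet ordering, and the all-zero handling of gaps are illustrative assumptions, not the encoding or code used by PhyDL.

```python
# Illustrative sketch only: names and encoding are assumptions, not PhyDL's actual code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed ordering)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def quartet_topologies(taxa):
    """Return the three unrooted topologies of four taxa, each as a pair of sister pairs."""
    taxa = list(taxa)
    if len(taxa) != 4 or len(set(taxa)) != 4:
        raise ValueError("exactly four distinct taxa are required")
    first, rest = taxa[0], taxa[1:]
    # Pair the first taxon with each of the other three; the remaining two form the other pair.
    return [
        ((first, partner), tuple(t for t in rest if t != partner))
        for partner in rest
    ]


def one_hot_alignment(seqs):
    """Encode four aligned protein sequences as a nested list of shape [4][length][20]."""
    seqs = [s.upper() for s in seqs]
    if len(seqs) != 4:
        raise ValueError("a quartet alignment needs exactly four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    encoded = []
    for seq in seqs:
        rows = []
        for residue in seq:
            row = [0] * len(AMINO_ACIDS)
            # Gaps and ambiguity codes are left as all-zero rows (an assumption).
            if residue in INDEX:
                row[INDEX[residue]] = 1
            rows.append(row)
        encoded.append(rows)
    return encoded


if __name__ == "__main__":
    for left, right in quartet_topologies(["A", "B", "C", "D"]):
        print(f"({left[0]},{left[1]}) | ({right[0]},{right[1]})")
```

A classifier trained on simulated alignments would take the encoded array as input and output a score for each of the three topologies. The abstract does not specify the input format, so this encoding should be read as one possible choice.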

Updated: 2019-12-23