Heuristic algorithms for best match graph editing
Algorithms for Molecular Biology (IF 1), Pub Date: 2021-08-17, DOI: 10.1186/s13015-021-00196-3
David Schaller, Manuela Geiß, Marc Hellmuth, Peter F Stadler

Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever y is one of the phylogenetically closest relatives of x among the genes of Y. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. Because BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem. This connection leads to a class of top-down recursive algorithms that are similar to Aho’s supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.
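For readers unfamiliar with the triple-consistency idea mentioned in the abstract, the following is a minimal Python sketch of the classical BUILD procedure (Aho's supertree algorithm) on rooted triples. It is not the heuristic proposed in the paper. The function names `build` and `connected_components`, the encoding of a triple ab|c as the tuple `(a, b, c)`, and the nested-tuple tree output are all assumptions made for this illustration.

```python
def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in sorted(vertices):  # sorted only to make the output order deterministic
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def build(leaves, triples):
    """Classical BUILD on rooted triples.

    A triple (a, b, c) encodes ab|c: a and b are closer to each other than to c.
    Returns a nested-tuple tree displaying all triples, or None if the
    triple set is inconsistent (hypothetical output convention).
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Keep only triples whose three leaves all lie in the current leaf set.
    local = [t for t in triples if set(t) <= leaves]
    # Auxiliary graph: join a and b for every triple ab|c.
    comps = connected_components(leaves, [(a, b) for a, b, _ in local])
    if len(comps) == 1:
        # No valid split at this level: the triples contradict each other.
        return None
    children = []
    for comp in comps:
        subtree = build(comp, local)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)


print(build({"a", "b", "c"}, [("a", "b", "c")]))
print(build({"a", "b", "c"}, [("a", "b", "c"), ("a", "c", "b")]))
```

The first call returns a tree that groups `a` and `b` below the root; the second returns `None` because the two triples cannot be displayed by one tree. The point where this sketch gives up (a single connected component) is, as far as the abstract indicates, where a heuristic would instead force a partition of the vertex set, for example with a community detection method; that step is not implemented here.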

Updated: 2021-08-19