Scalable generalized median graph estimation and its manifold use in bioinformatics, clustering, classification, and indexing
Information Systems (IF 3.7), Pub Date: 2021-03-27, DOI: 10.1016/j.is.2021.101766
David B. Blumenthal, Nicolas Boria, Sébastien Bougleux, Luc Brun, Johann Gamper, Benoit Gaüzère

In this paper, we present GMG-BCU, a local search algorithm based on block coordinate update for estimating a generalized median graph for a given collection of labeled or unlabeled input graphs. Unlike all competitors, GMG-BCU is designed for both discrete and continuous label spaces and can be configured to run in linear time with respect to the size of the graph collection whenever median node and edge labels are computable in linear time. These properties make GMG-BCU usable for applications such as differential microbiome data analysis, graph classification, clustering, and indexing. We also prove theoretical properties of generalized median graphs, namely, that they exist under reasonable assumptions which are met in almost all application scenarios, that they are in general non-unique, that they are NP-hard to compute and APX-hard to approximate, and that no polynomial α-approximation exists for any α unless the graph isomorphism problem is in P. Extensive experiments on six different datasets show that our heuristic GMG-BCU always outperforms the state of the art in terms of runtime or quality (on most datasets, in terms of both), that it is the only available heuristic which can cope with collections containing several thousands of graphs, and that it shows very promising potential when used for the aforementioned applications. GMG-BCU is freely available on GitHub: https://github.com/dbblumenthal/gedlib/.
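
The objective behind a generalized median graph can be illustrated with a small sketch. The code below is not GMG-BCU and does not use the GEDLIB API; it assumes a toy setting in which all graphs share the same node ids, so no node matching is needed, and it replaces graph edit distance with a hypothetical stand-in (`toy_distance`, the number of edges present in exactly one of two graphs). Under these assumptions, it compares the best graph taken from the collection (the set median) with the edge-wise majority graph, which minimizes the sum of distances for this particular toy distance.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Edge = Tuple[int, int]
Graph = FrozenSet[Edge]  # toy graph: a set of edges over node ids shared by all graphs


def toy_distance(g: Graph, h: Graph) -> int:
    """Hypothetical stand-in for graph edit distance: edges present in exactly one graph."""
    return len(g ^ h)


def sum_of_distances(candidate: Graph, collection: Sequence[Graph],
                     dist: Callable[[Graph, Graph], int]) -> int:
    """Objective minimized by a median: total distance from the candidate to all inputs."""
    return sum(dist(candidate, g) for g in collection)


def set_median(collection: Sequence[Graph],
               dist: Callable[[Graph, Graph], int]) -> Graph:
    """Best candidate restricted to the input collection itself."""
    return min(collection, key=lambda c: sum_of_distances(c, collection, dist))


def edge_majority_median(collection: Sequence[Graph]) -> Graph:
    """Keep every edge that occurs in more than half of the graphs.

    For toy_distance, each edge contributes independently to the objective,
    so this per-edge majority vote minimizes the sum of distances.
    """
    counts: Dict[Edge, int] = {}
    for g in collection:
        for e in g:
            counts[e] = counts.get(e, 0) + 1
    return frozenset(e for e, c in counts.items() if 2 * c > len(collection))


graphs = [
    frozenset({(0, 1), (1, 2)}),
    frozenset({(1, 2), (2, 3)}),
    frozenset({(0, 1), (2, 3)}),
]
best_in_collection = set_median(graphs, toy_distance)
majority = edge_majority_median(graphs)
print(sum_of_distances(best_in_collection, graphs, toy_distance))  # 4
print(sum_of_distances(majority, graphs, toy_distance))            # 3
```

In this example the majority graph contains all three edges and is not a member of the collection, yet it has a lower sum of distances than any input graph. With real graph edit distance the node correspondence is unknown, which is where the hardness results stated in the abstract apply.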




Updated: 2021-04-12