Large-scale network motif analysis using compression, Data Mining and Knowledge Discovery


Large-scale network motif analysis using compression
Data Mining and Knowledge Discovery (IF 2.8) Pub Date: 2020-06-23, DOI: 10.1007/s10618-020-00691-y
Peter Bloem, Steven de Rooij

We introduce a new method for finding network motifs. Subgraphs are motifs when their frequency in the data is high compared to the expected frequency under a null model. To compute this expectation, a full or approximate count of the occurrences of a motif is normally repeated on as many as 1000 random graphs sampled from the null model, which is a prohibitively expensive step. We use ideas from the minimum description length literature to define a new measure of motif relevance. With our method, samples from the null model are not required. Instead, we compute the probability of the data under the null model and compare this to the probability under a specially designed alternative model. With this new relevance test, we can search for motifs by random sampling, rather than requiring an accurate count of all instances of a motif. This allows motif analysis to scale to networks with billions of links.
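To make the cost argument concrete, here is a minimal Python sketch of the conventional sampling-based test that the abstract describes, next to a closed-form null-model code length that needs no samples. It is an illustration under stated assumptions, not the authors' implementation: the motif is fixed to the triangle, the null model is assumed to be the uniform distribution over simple undirected graphs with the same number of nodes and edges, and all function names are hypothetical. The paper's alternative (motif) model is not reproduced here.

```python
import math
import random
from itertools import combinations


def count_triangles(n, edges):
    """Count triangles in a simple undirected graph on nodes 0..n-1.

    `edges` must contain each undirected edge exactly once.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for u, v in edges:
        # Common neighbours of u and v close a triangle on edge (u, v).
        total += len(adj[u] & adj[v])
    # Each triangle was found once per edge, i.e. three times.
    return total // 3


def sample_null_graph(n, m, rng):
    """Assumed null model: uniform over simple graphs with n nodes, m edges."""
    all_pairs = list(combinations(range(n), 2))
    return rng.sample(all_pairs, m)


def sampling_z_score(n, edges, num_samples=1000, seed=0):
    """Conventional test: recount the motif on many null-model samples."""
    rng = random.Random(seed)
    m = len(edges)
    observed = count_triangles(n, edges)
    counts = [
        count_triangles(n, sample_null_graph(n, m, rng))
        for _ in range(num_samples)
    ]
    mean = sum(counts) / num_samples
    std = math.sqrt(sum((c - mean) ** 2 for c in counts) / num_samples)
    if std == 0:
        z = 0.0 if observed == mean else math.inf
    else:
        z = (observed - mean) / std
    return observed, mean, z


def null_code_length_bits(n, m):
    """-log2 p(G) under the assumed uniform null, with no sampling.

    There are C(n(n-1)/2, m) equally likely graphs, so the code length
    is the base-2 logarithm of that count.
    """
    return math.log2(math.comb(n * (n - 1) // 2, m))


if __name__ == "__main__":
    # Two disjoint triangles on six nodes (toy data, invented for illustration).
    toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    print(sampling_z_score(6, toy_edges, num_samples=200))
    print(null_code_length_bits(6, len(toy_edges)))
```

The sampling test performs one motif count per sampled graph, so its cost grows with both the number of samples and the graph size. The code-length function is a single closed-form evaluation, which is the kind of quantity the compression-based test compares against the code length under an alternative model.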


Updated: 2020-06-23
