Scalable module detection for attributed networks with applications to breast cancer
Journal of Applied Statistics (IF 1.5), Pub Date: 2020-08-13, DOI: 10.1080/02664763.2020.1803811
Han Yu, Rachael Hageman Blair

The objective of network module detection is to identify groups of tightly connected nodes within a network structure. Nodes in a network often have attributes (also known as metadata) associated with them. It is often desirable to identify groups of nodes that are tightly connected in the network structure and also strongly similar in their attributes. Utilizing attribute information in module detection is a major challenge because it requires bridging the structural network with the attribute data. A Weighted Fast Greedy (WFG) algorithm for attribute-based module detection is proposed. WFG uses logistic regression to bridge the structural and attribute spaces. The logistic function naturally emphasizes associations between attributes and network structure and can be easily interpreted. A breast cancer application is presented that connects a protein–protein interaction network, gene expression data, and a survival outcome. This application demonstrates the importance of embedding attribute information into the community detection framework. Five modules were significantly associated with survival, and they contained known cancer pathways and markers, including the cell cycle, the p53 pathway, BRCA1, BRCA2, and AURKB, among others. In contrast, neither the gene expression data nor the network structure alone gave rise to these cancer biomarkers and signatures.
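The abstract does not spell out how WFG combines the two data sources, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that each edge is weighted by a logistic function of the Pearson correlation between the attribute vectors of its endpoints, and that modules are then found by a naive greedy agglomeration that maximizes weighted modularity. The function names, the `slope` and `midpoint` parameters, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def logistic(x, slope=5.0, midpoint=0.0):
    """Map a similarity score to (0, 1); slope and midpoint are assumed values."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def pearson(a, b):
    """Pearson correlation of two equal-length attribute vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / math.sqrt(va * vb)


def attribute_weights(edges, attrs):
    """Weight each structural edge by the logistic of attribute similarity."""
    return {(u, v): logistic(pearson(attrs[u], attrs[v])) for u, v in edges}


def weighted_modularity(weights, communities):
    """Weighted modularity Q of a partition (list of disjoint node sets)."""
    m = sum(weights.values())
    strength = {}
    for (u, v), w in weights.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
    q = 0.0
    for c in communities:
        w_in = sum(w for (u, v), w in weights.items() if u in c and v in c)
        s = sum(strength.get(n, 0.0) for n in c)
        q += w_in / m - (s / (2.0 * m)) ** 2
    return q


def greedy_modules(weights):
    """Naive greedy agglomeration: merge the connected pair of communities
    that most increases Q, and stop when no merge improves it."""
    nodes = sorted({n for edge in weights for n in edge})
    communities = [frozenset([n]) for n in nodes]
    best_q = weighted_modularity(weights, communities)
    while len(communities) > 1:
        best = None
        for a, b in combinations(communities, 2):
            linked = any((u in a and v in b) or (u in b and v in a)
                         for u, v in weights)
            if not linked:
                continue
            trial = [c for c in communities if c not in (a, b)] + [a | b]
            q = weighted_modularity(weights, trial)
            if q > best_q + 1e-12 and (best is None or q > best[0]):
                best = (q, trial)
        if best is None:
            break
        best_q, communities = best
    return communities, best_q


# Toy data (made up): two triangles joined by one bridge edge c-d, with
# attribute profiles that agree within a triangle and oppose across it.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
attrs = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 3, 5, 7],
         "d": [4, 3, 2, 1], "e": [8, 6, 4, 2], "f": [7, 5, 3, 1]}

weights = attribute_weights(edges, attrs)
modules, q = greedy_modules(weights)
print(sorted(sorted(m) for m in modules), round(q, 3))
```

In this toy case the logistic weighting pushes the bridge edge close to zero, so the attribute disagreement reinforces the structural split. The exhaustive pair search is quadratic in the number of communities per merge step and is meant for illustration only; it does not reflect the scalability claimed for WFG.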




Updated: 2020-08-13