Exploiting causality in gene network reconstruction based on graph embedding, Machine Learning

Exploiting causality in gene network reconstruction based on graph embedding
Machine Learning (IF 4.3) Pub Date: 2019-12-03, DOI: 10.1007/s10994-019-05861-8
Gianvito Pio, Michelangelo Ceci, Francesca Prisciandaro, Donato Malerba

Gene network reconstruction is a bioinformatics task that aims to model the complex regulatory activities that may occur among genes. The task is typically solved with link prediction methods that analyze gene expression data. However, the reconstructed networks often contain a large number of false-positive edges. These edges are the result of indirect regulation activities arising from common cause and common effect phenomena; in other words, the adopted inductive methods do not take possible causality phenomena into account. The issue is further accentuated by the high level of noise inherent in gene expression data. Existing methods for identifying a transitive reduction of a network, or for removing (possibly) redundant edges, are limited in the network structures they can handle or in the nature/length of the indirect regulation they consider, and often require additional pre-processing steps to handle specific peculiarities of the networks (e.g., cycles). Moreover, they cannot consider possible community structures or possible similar roles of genes in the network (e.g., hub nodes), which may change the tendency of nodes to be highly connected (and with which nodes). In this paper, we propose the method INLOCANDA, which learns an inductive predictive model for gene network reconstruction and overcomes all the mentioned limitations. In particular, INLOCANDA is able to (i) identify and exploit indirect relationships of arbitrary length to remove edges due to common cause and common effect phenomena, and (ii) take into account possible community structures and possible similar roles by means of graph embedding. Experiments performed along multiple dimensions of analysis on real benchmark networks of two organisms (E. coli and S. cerevisiae) show higher accuracy than the competitors, as well as higher robustness to noise in the data, even when a huge number of (possibly false positive) interactions are removed. Availability: http://www.di.uniba.it/~gianvitopio/systems/inlocanda/
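
The idea of removing an edge that is explained by an indirect relationship of arbitrary length can be illustrated with a small sketch. This is not the INLOCANDA algorithm, whose details are not given in the abstract, and it does not cover the graph embedding step. It assumes, purely for illustration, that each predicted edge carries a positive confidence score, and that a direct edge is possibly redundant when some longer directed path connects the same two genes with every step scoring strictly higher than the direct edge. The function names and the scoring rule are hypothetical.

```python
import heapq
from collections import defaultdict


def widest_indirect_path(adj, source, target):
    """Best bottleneck score over directed paths source -> target that
    do not use the direct edge (source, target). Returns 0.0 if none exists."""
    best = {source: float("inf")}
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0.0):
            continue  # stale heap entry
        if node == target:
            return width
        for nxt, score in adj.get(node, {}).items():
            if node == source and nxt == target:
                continue  # skip the direct edge under evaluation
            cand = min(width, score)  # bottleneck of the extended path
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                heapq.heappush(heap, (-cand, nxt))
    return 0.0


def prune_indirect_edges(scores):
    """Keep only edges not dominated by a stronger indirect path.

    `scores` maps (regulator, target) pairs to positive confidence values
    (an assumed input format, not taken from the paper)."""
    adj = defaultdict(dict)
    for (u, v), s in scores.items():
        adj[u][v] = s
    kept = {}
    for (u, v), s in scores.items():
        # Every edge is evaluated against the original, unpruned graph.
        if widest_indirect_path(adj, u, v) > s:
            continue  # explained by a longer, stronger chain
        kept[(u, v)] = s
    return kept


if __name__ == "__main__":
    predicted = {
        ("A", "B"): 0.9,
        ("B", "C"): 0.8,
        ("A", "C"): 0.5,  # weaker than the chain A -> B -> C
        ("A", "D"): 0.7,
    }
    for edge, score in sorted(prune_indirect_edges(predicted).items()):
        print(edge, score)
```

Because the comparison is strict and each edge is checked against the unpruned graph, every removed edge still has a higher-scoring path between its endpoints among the kept edges, so reachability between genes is preserved.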

Updated: 2019-12-03
