Imbalance deep multi‐instance learning for predicting isoform–isoform interactions
International Journal of Intelligent Systems ( IF 7 ) Pub Date : 2021-02-25 , DOI: 10.1002/int.22402
Guoxian Yu 1, 2, 3 , Jie Zeng 2 , Jun Wang 2, 3 , Hong Zhang 2 , Xiangliang Zhang 4 , Maozu Guo 5

Multi‐instance learning (MIL) can model complex bags (samples) that are composed of diverse instances (subsamples). In typical MIL, the labels of bags are known, while those of individual instances are unknown and must be inferred. In this paper, we propose an imbalanced deep multi‐instance learning approach (IDMIL‐III) and apply it to predict genome‐wide isoform–isoform interactions (IIIs). This prediction task is crucial for precisely understanding the interactome between proteoforms and for revealing their functional diversity. Current solutions typically formulate the prediction of IIIs as a MIL problem by treating a pair of genes as a “bag” and any two isoforms spliced from these two genes as “instances.” The key instances (interacting isoform pairs) trigger the label of the positive (interacting) gene bags, so finding these key instances is important for identifying the IIIs. Furthermore, the prediction task has been simplified to a balanced classification problem, whereas in practice it is highly imbalanced. To address these issues, IDMIL‐III fuses RNA‐seq, nucleotide sequence, amino acid sequence, and exon array data, and introduces a novel loss function that models the loss of positive pairs and of negative pairs separately, which prevents the expected loss from being dominated by the majority negative pairs. In addition, it includes an attention strategy to identify positive isoform pairs within a positive gene bag. Extensive experimental results demonstrate the effectiveness of IDMIL‐III in predicting IIIs. In particular, IDMIL‐III achieves an F1 of 95.4% at the gene level, at least 3.8% higher than competing methods, and an F1 of 29.8% at the isoform level, at least 2.4% higher than state‐of‐the‐art methods. The code of IDMIL‐III is available at http://mlda.swu.edu.cn/codes.php?name=IDMIL-III.
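
The abstract does not give the formulas behind the attention strategy or the loss, so the following is only a minimal sketch of the two general ideas, not the IDMIL‐III implementation. It assumes a softmax attention over instance scores to form a bag-level probability, and a cross-entropy that is averaged separately over positive and negative pairs before being summed. The function names `attention_pool` and `balanced_bce`, and all numbers, are hypothetical.

```python
import math


def attention_pool(instance_probs, attention_logits):
    """Combine per-instance probabilities into one bag probability.

    Assumption: attention weights are a softmax over the logits; the
    highest-weighted instance is the candidate "key instance".
    """
    m = max(attention_logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    bag_prob = sum(w * p for w, p in zip(weights, instance_probs))
    return bag_prob, weights


def balanced_bce(probs, labels, eps=1e-12):
    """Cross-entropy averaged separately over positives and negatives.

    Assumption: the two class-wise means are simply summed, so the many
    negative pairs cannot dominate the total loss.
    """
    pos = [-math.log(max(p, eps)) for p, y in zip(probs, labels) if y == 1]
    neg = [-math.log(max(1.0 - p, eps)) for p, y in zip(probs, labels) if y == 0]
    pos_loss = sum(pos) / len(pos) if pos else 0.0
    neg_loss = sum(neg) / len(neg) if neg else 0.0
    return pos_loss + neg_loss


# One gene-pair bag with three isoform-pair instances (made-up values).
bag_prob, weights = attention_pool([0.9, 0.1, 0.2], [2.0, 0.0, 0.0])

# One positive bag among many negatives: the loss does not grow with
# the number of negatives, because each class is averaged on its own.
few_neg = balanced_bce([0.5] * 5, [1] + [0] * 4)
many_neg = balanced_bce([0.5] * 101, [1] + [0] * 100)
```

In this sketch the attention weights double as instance-level evidence: within a positive gene bag, the isoform pair with the largest weight is the one the model treats as most responsible for the bag label.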

Updated: 2021-04-27