Multi-feature fusion for deep learning to predict plant lncRNA-protein interaction.
Genomics (IF 4.4), Pub Date: 2020-05-11, DOI: 10.1016/j.ygeno.2020.05.005
Jael Sanyanda Wekesa, Jun Meng, Yushi Luan

Long non-coding RNAs (lncRNAs) play key roles in regulating cellular biological processes through diverse molecular mechanisms, including binding to RNA-binding proteins. The majority of plant lncRNAs are functionally uncharacterized; accurate prediction of plant lncRNA-protein interactions is therefore imperative for subsequent functional studies. We present an integrative model named DRPLPI, whose distinguishing characteristic is prediction by multi-feature fusion. It uses structural features together with four groups of sequence features: tri-nucleotide composition, gapped k-mer, recursive complement, and binary profile. We design a multi-head self-attention long short-term memory encoder-decoder network to extract generative high-level features. To obtain robust results, DRPLPI combines categorical boosting and extra trees into a single meta-learner. Experiments on Zea mays and Arabidopsis thaliana yielded areas under the precision-recall curve (AUPRC) of 0.9820 and 0.9652, respectively. The proposed method shows a significant improvement in prediction performance over existing state-of-the-art methods.
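
As an illustration of two of the sequence feature groups named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that tri-nucleotide composition means the normalized frequency of each of the 64 overlapping 3-mers, and that the binary profile is a per-base one-hot encoding over the alphabet A, C, G, U. The function names and the handling of T and ambiguous bases are hypothetical choices made for this example.

```python
from itertools import product

ALPHABET = "ACGU"
# All 64 possible tri-nucleotides in a fixed order (assumed feature layout).
TRIMERS = ["".join(p) for p in product(ALPHABET, repeat=3)]


def trinucleotide_composition(seq):
    """Return a 64-dim vector of normalized overlapping 3-mer frequencies.

    Assumption: T is treated as U, and windows containing any other
    symbol (e.g. N) are skipped rather than counted.
    """
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(TRIMERS, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIMERS)
    return [counts[k] / total for k in TRIMERS]


def binary_profile(seq):
    """Return one 4-bit one-hot row per base (assumed encoding).

    Bases outside A, C, G, U map to an all-zero row.
    """
    seq = seq.upper().replace("T", "U")
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


if __name__ == "__main__":
    example = "AUGAUG"  # hypothetical toy sequence
    tnc = trinucleotide_composition(example)
    print(len(tnc), tnc[TRIMERS.index("AUG")])
    print(binary_profile("AC"))
```

In a fusion setting, vectors like these would be concatenated with the remaining feature groups before being passed to a downstream model; how DRPLPI performs that fusion is not specified in the abstract.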

Updated: 2020-05-11