LncRNA-Encoded Short Peptides Identification Using Feature Subset Recombination and Ensemble Learning
Interdisciplinary Sciences: Computational Life Sciences (IF 4.8) Pub Date: 2021-07-25, DOI: 10.1007/s12539-021-00464-1
Siyuan Zhao 1 , Jun Meng 1 , Yushi Luan 2

Long non-coding RNAs (lncRNAs), a class of non-coding RNA, have been reported to contain short open reading frames (sORFs). sORF-encoded short peptides (SEPs) have been shown to play a crucial role in regulating biological processes such as growth, development, and resistance response. Identifying SEPs is vital to further understanding their function, yet methods that identify them effectively and rapidly are still lacking. This study develops lncPepid, a novel method for identifying lncRNA-encoded short peptides based on feature subset recombination and ensemble learning. lncPepid transforms data from Zea mays and Arabidopsis thaliana into hybrid features that capture two aspects separately: sequence composition and physicochemical properties. It optimizes the hybrid features with a novel weighted iteration-based feature selection method, which recombines a stable subset that characterizes SEPs effectively. Different classification models are constructed with different optimized features and tested separately, and the outputs of the optimal models are integrated for ensemble classification to improve efficiency. Experimental results show that the geometric mean of sensitivity and specificity of lncPepid is about 70% for the identification of functional SEPs derived from multiple species. It is an effective and rapid method for identifying lncRNA-encoded short peptides. This study can be extended to research on SEPs from other species and has crucial implications for further findings and studies in functional genomics.
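The evaluation metric quoted in the abstract, the geometric mean of sensitivity and specificity, can be illustrated with a minimal Python sketch. The function names `g_mean` and `average_vote` are hypothetical and do not come from lncPepid. The abstract does not say how the outputs of the optimal models are integrated, so the simple probability averaging shown here is only an assumption made for illustration.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity.

    Labels are binary: 1 = SEP (positive), 0 = non-SEP (negative).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def average_vote(prob_lists, threshold=0.5):
    """Assumed integration rule: average each sample's positive-class
    probability across models, then apply a threshold.

    prob_lists holds one list of probabilities per model, all the same length.
    """
    n_models = len(prob_lists)
    return [1 if sum(col) / n_models >= threshold else 0
            for col in zip(*prob_lists)]
```

For example, a classifier that recovers 3 of 4 positives and 2 of 4 negatives has sensitivity 0.75 and specificity 0.5, so its geometric mean is sqrt(0.375), roughly 0.61. Unlike plain accuracy, this metric drops toward zero when either class is poorly recognized, which matters when positive and negative samples are imbalanced.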




Updated: 2021-07-25