iORI-ENST: identifying origin of replication sites based on elastic net and stacking learning
SAR and QSAR in Environmental Research (IF 2.3), Pub Date: 2021-03-18, DOI: 10.1080/1062936x.2021.1895884
Y. Yao, S. Zhang, Y. Liang

ABSTRACT

DNA replication is not only the basis of biological inheritance but also the most fundamental process in all living organisms. It plays a crucial role in the cell-division cycle and in the regulation of gene expression. Hence, the accurate identification of origin of replication sites (ORIs) is of great importance for further understanding the regulatory mechanism of gene expression and for treating genetic diseases. In this paper, a novel model, namely iORI-ENST, is designed for identifying ORIs. First, we extract features by combining mono-nucleotide binary encoding with dinucleotide-based spatial autocorrelation. Next, elastic net is used as the feature selection method to select the optimal feature set. Stacking learning, which combines random forest, AdaBoost, gradient boosting decision tree, extra trees and support vector machine, is then employed to distinguish ORIs from non-ORIs. On the benchmark datasets S1 and S2, the model identifies ORIs with accuracies of 91.41% and 95.07%, respectively. An independent dataset S3 is used to verify the validity and transferability of the model, on which its accuracy reaches 91.10%. Compared with state-of-the-art methods, our model achieves better performance. The results show that our model is a feasible, effective and powerful tool for identifying ORIs. The source code and datasets are available at https://github.com/YingyingYao/iORI-ENST.
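
The feature extraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a one-hot code in the order A, C, G, T for the mono-nucleotide binary encoding, and it uses a plain lagged auto-covariance as one possible form of dinucleotide-based spatial autocorrelation. The dinucleotide property used here (the number of G/C bases in the pair) is a stand-in chosen only so the example is self-contained, and the function names are hypothetical.

```python
from itertools import product

# Assumed one-hot ("binary") code per nucleotide; the A, C, G, T order is illustrative.
BINARY_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Stand-in dinucleotide property: how many of the two bases are G or C (0, 1 or 2).
# A real model would plug in measured physicochemical indices here instead.
GC_COUNT = {
    a + b: sum(base in "GC" for base in (a, b))
    for a, b in product("ACGT", repeat=2)
}


def binary_encode(seq):
    """Concatenate the 4-bit code of every nucleotide (output length 4 * len(seq))."""
    return [bit for base in seq.upper() for bit in BINARY_CODE[base]]


def dinucleotide_autocovariance(seq, max_lag, prop=GC_COUNT):
    """Auto-covariance of a dinucleotide property for lags 1..max_lag.

    Requires len(seq) >= max_lag + 2 so that every lag has at least one pair.
    """
    seq = seq.upper()
    # Property value of each overlapping dinucleotide along the sequence.
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    mean = sum(values) / len(values)
    features = []
    for lag in range(1, max_lag + 1):
        n_pairs = len(values) - lag
        cov = sum(
            (values[i] - mean) * (values[i + lag] - mean) for i in range(n_pairs)
        ) / n_pairs
        features.append(cov)
    return features


def extract_features(seq, max_lag=3):
    """Join both feature groups into one vector for a single sequence."""
    return binary_encode(seq) + dinucleotide_autocovariance(seq, max_lag)
```

For fixed-length input sequences, `extract_features` yields vectors of equal length, which is what a downstream feature selector (elastic net in the paper) and a stacked classifier would require.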




Updated: 2021-03-31