i6mA-VC: A Multi-Classifier Voting Method for the Computational Identification of DNA N6-methyladenine Sites
Interdisciplinary Sciences: Computational Life Sciences (IF 4.8), Pub Date: 2021-04-08, DOI: 10.1007/s12539-021-00429-4
Tian Xue 1 , Shengli Zhang 1 , Huijuan Qiao 1

DNA N6-methyladenine (6mA) is an essential epigenetic modification whose role in genetic regulation cannot be neglected. Efficient and accurate prediction of 6mA sites benefits the development of biological genetics, but biochemical experimental methods are time-consuming and laborious. Most established machine learning methods are built on a single dataset, and although some of them achieve cross-species prediction, their results are not satisfactory. We therefore designed a novel statistical model, i6mA-VC, to improve the accuracy of 6mA site prediction. On the one hand, kmer and binary encoding are used to extract features, and a gradient boosting decision tree (GBDT) embedded method is applied as the feature selection strategy. On the other hand, DNA sequences are represented as vectors using ring-function-hydrogen-chemical properties (RFHCP) for feature extraction and ExtraTree for feature selection. After the two optimal feature sets are fused, a voting classifier based on GBDT, light gradient boosting machine (LightGBM), and multilayer perceptron classifier (MLPC) is constructed for the final classification and prediction. With five-fold cross-validation, the accuracies on the Rice and M. musculus datasets are 0.888 and 0.967, respectively. With the cross-species dataset as the independent test set, the accuracy reaches 0.848. These experiments demonstrate that the proposed predictor is convincing and applicable. The i6mA-VC predictor offers an effective way to recognize N6-methyladenine sites and should help geneticists further study gene expression and DNA modification. In addition, a web server for i6mA-VC is available at http://www.zhanglab.site/.
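
To make the encoding and voting steps concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names (`kmer_frequencies`, `binary_encoding`, `majority_vote`) are hypothetical, and so are the choices of k, the one-hot bit order, and hard (majority) voting, since the abstract does not state these details. The RFHCP encoding and the GBDT/ExtraTree feature selection are left out because the abstract does not define them.

```python
from itertools import product

BASES = "ACGT"  # assumed alphabet; other symbols are treated as unknown


def kmer_frequencies(seq, k=2):
    """Frequencies of all 4**k k-mers, in lexicographic (A, C, G, T) order.

    Each count is divided by the number of sliding windows in the sequence.
    """
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows with non-ACGT symbols are not counted
            counts[window] += 1
    return [counts[m] / total if total > 0 else 0.0 for m in kmers]


def binary_encoding(seq):
    """One-hot (binary) encoding: 4 bits per nucleotide, all zeros if unknown."""
    one_hot = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    vec = []
    for base in seq:
        vec.extend(one_hot.get(base, [0, 0, 0, 0]))
    return vec


def majority_vote(predictions):
    """Hard voting over binary labels (1 = 6mA site, 0 = non-site).

    Returns 1 only when strictly more than half of the classifiers predict 1.
    """
    return 1 if 2 * sum(predictions) > len(predictions) else 0


if __name__ == "__main__":
    seq = "ACGTAC"
    # Concatenating the two encodings is one simple way to fuse features.
    features = kmer_frequencies(seq, k=2) + binary_encoding(seq)
    print(len(features))  # 16 k-mer frequencies + 4 * 6 binary values
    # Hypothetical outputs of three classifiers (e.g. GBDT, LightGBM, MLPC).
    print(majority_vote([1, 0, 1]))
```

In a real pipeline, the three labels passed to `majority_vote` would come from trained GBDT, LightGBM and MLPC models. Averaging predicted probabilities (soft voting) is another common option, and the abstract does not say which variant i6mA-VC uses.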




Updated: 2021-04-08