Meta-i6mA: an interspecies predictor for identifying DNA N6-methyladenine sites of plant genomes by exploiting informative features in an integrative machine-learning framework.
Briefings in Bioinformatics (IF 6.8), published 2020-09-10, DOI: 10.1093/bib/bbaa202
Md Mehedi Hasan, Shaherin Basith, Mst Shamima Khatun, Gwang Lee, Balachandran Manavalan, Hiroyuki Kurata

DNA N6-methyladenine (6mA) is an important epigenetic modification involved in various cellular processes. Accurately identifying 6mA sites is a challenging task in genome analysis and an important step toward understanding their biological functions. To date, several species-specific machine learning (ML)-based models have been proposed, but most of them were not tested on other species; hence, their practical application to other plant species is quite limited. In this study, we explored 10 different feature encoding schemes with the goal of capturing key characteristics around 6mA sites. We selected five feature encoding schemes, based on physicochemical and position-specific information, that possess high discriminative capability. The resulting feature sets were fed into six commonly used ML methods (random forest, support vector machine, extremely randomized tree, logistic regression, naïve Bayes and AdaBoost). The Rosaceae genome was used to train these classifiers, generating 30 baseline models (5 encodings × 6 methods). To integrate their individual strengths, we propose Meta-i6mA, which combines the baseline models using a meta-predictor approach. In extensive independent tests, Meta-i6mA achieved high Matthews correlation coefficient values of 0.918, 0.827 and 0.635 on Rosaceae, rice and Arabidopsis thaliana, respectively, and outperformed the existing predictors. We anticipate that Meta-i6mA can be applied across different plant species. Furthermore, we developed a user-friendly online web server, available at http://kurata14.bio.kyutech.ac.jp/Meta-i6mA/.

Updated: 2020-09-11