i6mA-stack: A stacking ensemble-based computational prediction of DNA N6-methyladenine (6mA) sites in the Rosaceae genome
Genomics (IF 3.4), Pub Date: 2020-10-01, DOI: 10.1016/j.ygeno.2020.09.054
Jhabindra Khanal 1, Dae Young Lim 2, Hilal Tayara 3, Kil To Chong 2

DNA N6-methyladenine (6mA) is an epigenetic modification that plays a vital role in a variety of cellular processes in both eukaryotes and prokaryotes. Accurate information on 6mA sites in the Rosaceae genome may assist in understanding genomic 6mA distributions and various biological functions such as epigenetic inheritance. Various studies have shown that 6mA sites can be identified experimentally, but the procedures are time-consuming and costly. To overcome the drawbacks of experimental methods, we propose an accurate computational paradigm based on a machine learning (ML) technique to identify 6mA sites in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca). To improve the performance of the proposed model and to avoid overfitting, a recursive feature elimination with cross-validation (RFECV) strategy is used to extract the optimal number of features (ONF) subset from five different DNA sequence encoding schemes, i.e., Binary Encoding (BE), Ring-Function-Hydrogen-Chemical Properties (RFHC), Electron-Ion-Interaction Pseudo Potentials of Nucleotides (EIIP), Dinucleotide Physicochemical Properties (DPCP), and Trinucleotide Physicochemical Properties (TPCP). Subsequently, we use the ONF subset to train a two-layer ML-based stacking model to create a bioinformatics tool named ‘i6mA-stack’. This tool generally outperforms its peer tool and is currently available at http://nsclbio.jbnu.ac.kr/tools/i6mA-stack/
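
The pipeline described in the abstract (sequence encoding, RFECV feature selection, then a two-layer stacking model) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. It covers only two of the five encodings (BE and EIIP, using the commonly cited EIIP values for A, C, G, and T), and the base learners, meta-learner, window length, and synthetic toy data are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Binary Encoding (BE): one-hot vector per nucleotide.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}
# Commonly cited EIIP value per nucleotide.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def encode(seq):
    """Concatenate the BE and EIIP encodings of one DNA sequence."""
    binary = [bit for base in seq for bit in ONE_HOT[base]]
    eiip = [EIIP[base] for base in seq]
    return np.array(binary + eiip, dtype=float)


def make_toy_data(n=120, length=21, seed=0):
    """Synthetic sequences (assumption: NOT real 6mA data).

    Every sequence has a central adenine; the base right after it is
    'G' for positives and 'C' for negatives, a planted toy signal.
    """
    rng = np.random.default_rng(seed)
    centre = length // 2
    seqs, labels = [], []
    for i in range(n):
        bases = [str(b) for b in rng.choice(list("ACGT"), size=length)]
        label = i % 2
        bases[centre] = "A"
        bases[centre + 1] = "G" if label else "C"
        seqs.append("".join(bases))
        labels.append(label)
    return seqs, np.array(labels)


def build_model():
    """RFECV feature selection followed by a two-layer stacking classifier."""
    # RFECV drops 5 features per round and keeps the subset size with the
    # best cross-validated score (the "optimal number of features").
    selector = RFECV(LogisticRegression(max_iter=1000), step=5, cv=3)
    # Layer 1: base learners; layer 2: a meta-learner trained on their
    # cross-validated predictions. These learners are illustrative choices.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    return make_pipeline(selector, stack)


if __name__ == "__main__":
    seqs, y = make_toy_data()
    X = np.vstack([encode(s) for s in seqs])
    model = build_model().fit(X, y)
    print("features kept:", model.named_steps["rfecv"].n_features_)
    print("training accuracy:", model.score(X, y))
```

Placing the selector and the stacking model in one pipeline keeps feature selection inside any outer cross-validation loop, which is one common way to limit the overfitting the abstract mentions.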




Updated: 2020-10-02