A Convolution Based Computational Approach Towards DNA N6-methyladenine Site Identification and Motif Extraction in Rice Genome
bioRxiv - Genomics Pub Date : 2021-05-18 , DOI: 10.1101/2020.07.08.194308
Chowdhury Rafeed Rahman , Ruhul Amin , Swakkhar Shatabda , Md. Sadrul Islam Toaha

N6-methylation of adenine nucleotides in DNA (6mA) is a post-replication modification responsible for many biological functions. Automated and accurate computational methods can help identify 6mA sites in long genomes, saving significant time and money. Our study develops i6mA-CNN, a convolutional neural network (CNN) based tool capable of identifying 6mA sites in the rice genome. Our model combines multiple types of features: a customized feature vector inspired by PseAAC (Pseudo Amino Acid Composition), multiple one-hot representations, and dinucleotide physicochemical properties. It achieves an auROC (area under the Receiver Operating Characteristic curve) score of 0.98 with an overall accuracy of 93.97% using 5-fold cross-validation on the benchmark dataset. Finally, we evaluate our model on three other plant genome 6mA site identification test datasets. The results suggest that the proposed tool generalizes its 6mA site identification ability across plant genomes, irrespective of plant species. An algorithm for potential motif extraction and a feature importance analysis procedure are two by-products of this research. A web tool for this research is available at: https://cutt.ly/dgp3QTR.
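To make the "multiple one-hot representations" mentioned in the abstract more concrete, here is a minimal Python sketch of one common way to one-hot encode a DNA window at the single-nucleotide and dinucleotide level. This is an illustration only, not the authors' implementation: the function names, the A/C/G/T ordering, the all-zero handling of unknown bases, and the example window are all assumptions.

```python
from typing import List

# Assumed channel order; the abstract does not specify one.
NUCLEOTIDES = "ACGT"


def one_hot(sequence: str) -> List[List[int]]:
    """Encode each base as a 4-element indicator vector.

    Bases outside A/C/G/T (for example N) become all zeros (an assumption).
    """
    rows = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        idx = NUCLEOTIDES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def dinucleotide_one_hot(sequence: str) -> List[List[int]]:
    """Encode each overlapping pair of bases as a 16-element indicator vector."""
    seq = sequence.upper()
    rows = []
    for i in range(len(seq) - 1):
        row = [0] * 16
        first = NUCLEOTIDES.find(seq[i])
        second = NUCLEOTIDES.find(seq[i + 1])
        if first >= 0 and second >= 0:
            # Index pairs as AA=0, AC=1, ..., TT=15.
            row[4 * first + second] = 1
        rows.append(row)
    return rows


# Hypothetical 5-base window; real windows around candidate sites are longer.
window = "GATAC"
mono = one_hot(window)               # 5 rows of 4 channels
di = dinucleotide_one_hot(window)    # 4 rows of 16 channels
```

Matrices of this shape (sequence position by channel) are the kind of input a 1D convolutional layer can scan, which is why one-hot encodings are a natural fit for CNN-based sequence models.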

Updated: 2021-05-19