iDHS-DMCAC: identifying DNase I hypersensitive sites with balanced dinucleotide-based detrending moving-average cross-correlation coefficient.
SAR and QSAR in Environmental Research (IF 2.3), Pub Date: 2019-05-23, DOI: 10.1080/1062936x.2019.1615546
Y Liang, S Zhang

DNase I hypersensitive sites (DHSs) are associated with regulatory DNA elements, so understanding them well is important for both biomedical research and drug discovery. Traditional experimental methods for detecting DHSs are laborious, time-consuming and inaccurate. More importantly, with the avalanche of genome sequences in the postgenomic age, cost-effective computational approaches for identifying DHSs are essential. In this paper, we develop a statistical feature extraction model, named iDHS-DMCAC, that uses the detrended moving-average cross-correlation (DMCA) coefficient as a descriptor, computed on a dinucleotide property matrix generated from 15 DNA dinucleotide properties. For a given window, a 105-dimensional feature vector is constructed for the sequences in two class-imbalanced benchmark datasets, and the model is trained with over-sampling and a support vector machine. Rigorous cross-validation indicates that our predictor remarkably outperforms existing models in both accuracy and stability. We anticipate that iDHS-DMCAC will become a useful high-throughput tool, or at the very least a complementary tool to existing methods for identifying DNase I hypersensitive sites. The datasets and source code of the proposed model are freely available at https://github.com/shengli0201/Datasets.
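
To make the feature construction concrete, here is a minimal sketch of how a DMCA-coefficient feature vector might be computed. It is not the authors' implementation. It assumes one common formulation of the DMCA coefficient (centred moving average on the mean-centred cumulative profile), and it assumes that the 105 dimensions correspond to the 15 × 14 / 2 unordered pairs of properties. The property values are random placeholders, not real physicochemical data, and all function names and the default window are hypothetical.

```python
import itertools
import math
import random

BASES = "ACGT"
DINUCLEOTIDES = [a + b for a in BASES for b in BASES]


def toy_property_table(n_props=15, seed=0):
    """Placeholder property values (random, NOT real physicochemical data)."""
    rng = random.Random(seed)
    return [{d: rng.uniform(-1.0, 1.0) for d in DINUCLEOTIDES}
            for _ in range(n_props)]


def property_series(seq, table):
    """One numeric series per property over the overlapping dinucleotides."""
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [[prop[d] for d in dinucs] for prop in table]


def profile(series):
    """Cumulative sum of the mean-centred series."""
    mean = sum(series) / len(series)
    out, total = [], 0.0
    for value in series:
        total += value - mean
        out.append(total)
    return out


def ma_residuals(prof, window):
    """Profile minus its centred moving average (window assumed odd)."""
    half = window // 2
    residuals = []
    for t in range(half, len(prof) - half):
        moving_avg = sum(prof[t - half:t + half + 1]) / window
        residuals.append(prof[t] - moving_avg)
    return residuals


def dmca_coefficient(x, y, window):
    """Detrended covariance divided by the product of detrended std devs.

    The 1/(T - window + 1) normalisation cancels in the ratio, so plain
    sums are used.
    """
    rx = ma_residuals(profile(x), window)
    ry = ma_residuals(profile(y), window)
    f_xy = sum(a * b for a, b in zip(rx, ry))
    f_xx = sum(a * a for a in rx)
    f_yy = sum(b * b for b in ry)
    denom = math.sqrt(f_xx * f_yy)
    return f_xy / denom if denom > 0 else 0.0


def dmca_features(seq, table, window=5):
    """One coefficient per unordered pair of properties (105 for 15)."""
    series = property_series(seq, table)
    return [dmca_coefficient(a, b, window)
            for a, b in itertools.combinations(series, 2)]
```

Calling `dmca_features(sequence, toy_property_table())` on a DNA string returns a list of 105 values, each between -1 and 1. In a pipeline like the one described above, such vectors would then be passed to an over-sampling step and a support vector machine classifier, which this sketch does not cover.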




Updated: 2019-05-23