Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2018-08-31, DOI: 10.1109/tcbb.2018.2868071
Zhen Cao, Shihua Zhang

Gapped k-mer frequency vectors (gkm-fvs) have been proposed for extracting sequence features. Coupled with a support vector machine (gkm-SVM), gkm-fvs have been used to make effective sequence-based predictions. However, the heavy cost of computing a large kernel matrix prevents gkm-SVM from scaling to large amounts of data, and it is unclear how to combine gkm-fvs with other data sources in the context of string kernels. On the other hand, the high dimensionality, collinearity and sparsity of gkm-fvs hinder the use of many traditional machine learning methods that do not rely on a kernel trick. Therefore, we proposed gkm-DNN, a flexible and scalable framework that learns feature representations from high-dimensional gkm-fvs using deep neural networks (DNNs). We first proposed a more concise version of gkm-fvs, which significantly reduces their dimension. We then implemented, for the first time, an efficient method to calculate the gkm-fv of a given sequence. Finally, we adopted a DNN model with gkm-fvs as inputs to achieve efficient feature representation and prediction. Taking transcription factor binding site prediction as an illustrative application, we applied gkm-DNN to 467 small and 69 big human ENCODE ChIP-seq datasets to demonstrate its performance and compared it with the state-of-the-art method gkm-SVM.
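To make the notion of a gkm-fv concrete, here is a minimal sketch of plain gapped k-mer counting, assuming the gkm-SVM convention in which each length-l window contributes every pattern that keeps k "informative" positions and masks the other l - k positions with a gap symbol (`-`). It is not the concise gkm-fv representation or the efficient algorithm proposed in the paper; the function names (`gapped_kmers`, `gkm_frequency_vector`, `full_dimension`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def gapped_kmers(lmer: str, k: int):
    """Yield every gapped k-mer of one l-mer: keep k positions, mask the rest with '-'."""
    for kept in combinations(range(len(lmer)), k):
        kept_set = set(kept)
        yield "".join(c if i in kept_set else "-" for i, c in enumerate(lmer))


def gkm_frequency_vector(seq: str, l: int, k: int) -> Counter:
    """Sparse gkm-fv (assumed form): gapped k-mer counts over all l-length windows of seq."""
    seq = seq.upper()
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        if set(window) - set("ACGT"):  # skip windows containing N or other symbols
            continue
        counts.update(gapped_kmers(window, k))
    return counts


def full_dimension(l: int, k: int) -> int:
    """Distinct gapped k-mers: C(l, k) gap layouts times 4**k base assignments."""
    return comb(l, k) * 4 ** k


if __name__ == "__main__":
    fv = gkm_frequency_vector("ACGTACGT", l=4, k=2)
    print(fv["AC--"], sum(fv.values()))  # 2 30
    print(full_dimension(10, 6))         # 860160
```

The dimension count illustrates why the abstract calls gkm-fvs high-dimensional and sparse: with l = 10 and k = 6 there are 860,160 possible features, while a short sequence populates only a small fraction of them. Feeding such a vector to a DNN would additionally require mapping each gapped k-mer key to a fixed column index; that step is omitted here.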

Updated: 2020-04-22