A Useful Tool for the Identification of DNA-Binding Proteins Using Graph Convolutional Network
Current Proteomics (IF 0.5), Pub Date: 2021-09-30, DOI: 10.2174/1570164618999201210225354
Dasheng Chen, Leyi Wei

Background: DNA and proteins are important components of living organisms. A DNA-binding protein is a helicase, a protein specifically responsible for binding to single-stranded regions of DNA. It is a necessary component of DNA replication, recombination, and repair, and plays a key role in the function of various biomolecules. Although some classification and prediction methods already exist for this type of protein, graph neural networks have seen only limited use for this task.

Objective: Classifying unknown protein sequences into the correct categories, subcategories, and families is important for the biological sciences. In this article, we use graph neural networks to develop GCN-DBP, a novel predictor for protein classification.

Methods: In this study, each protein sequence is treated as a document and segmented into words according to the k-mer concept. Document-word relationships and word co-occurrence across the corpus are used to construct a text graph, and a two-layer graph convolutional network then learns the protein sequence information from that graph.
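The segmentation and graph-construction steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say whether k-mers overlap, what value of k is used, or how edges are weighted, so overlapping k-mers, `k=3`, a sliding window of 5 words, and raw counts as edge weights are all assumptions. The function names `kmers` and `build_text_graph` and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers (the "words" of the document).

    Overlapping k-mers are an assumption; the abstract does not specify this.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_text_graph(sequences, k=3, window=5):
    """Build count-based edges for a document-word text graph.

    Returns the tokenised documents plus two Counters:
      doc_word[(doc_index, word)]  -> occurrences of word in that document
      word_word[(word_a, word_b)]  -> number of sliding windows containing both
                                      (word_a < word_b, so each pair is stored once)
    Raw counts stand in for whatever edge weighting the paper actually uses.
    """
    docs = [kmers(seq, k) for seq in sequences]
    doc_word = Counter()
    word_word = Counter()
    for doc_index, words in enumerate(docs):
        for word in words:
            doc_word[(doc_index, word)] += 1
        # Slide a fixed-size window over the document; short documents
        # are covered by a single window.
        n_windows = max(1, len(words) - window + 1)
        for start in range(n_windows):
            in_window = sorted(set(words[start:start + window]))
            for word_a, word_b in combinations(in_window, 2):
                word_word[(word_a, word_b)] += 1
    return docs, doc_word, word_word


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    docs, doc_word, word_word = build_text_graph(["MKVLA", "KVLAG"])
    print(docs)
    print(dict(word_word))
```

The two counters correspond to the two edge types named in the abstract (document-word and word-word). A graph convolutional network would then operate on an adjacency matrix assembled from them.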

Results: We tested GCN-DBP on the independent data set PDB2272, where it reached an accuracy of 64.17% and an MCC of 28.32%. We also ran additional experiments to compare the proposed method with existing methods.
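For readers unfamiliar with the two metrics, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from the four cells of a binary confusion matrix. The counts in the usage example are hypothetical and are not taken from the paper. MCC is normally a value in [-1, 1]; the abstract reports it as a percentage, so the example multiplies by 100.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for binary classification.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, for illustration only.
    tp, tn, fp, fn = 40, 30, 20, 10
    print(f"accuracy = {100 * accuracy(tp, tn, fp, fn):.2f}%")
    print(f"MCC      = {100 * mcc(tp, tn, fp, fn):.2f}%")
```

Unlike accuracy, MCC accounts for all four cells of the confusion matrix, which makes it informative when the two classes are of unequal size.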

Conclusion: The results show that the proposed method outperforms the four other methods it was compared with and should be a useful tool.




Updated: 2021-11-23