Use Chou's 5-Steps Rule With Different Word Embedding Types to Boost Performance of Electron Transport Protein Prediction Model
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2020-07-21, DOI: 10.1109/tcbb.2020.3010975
Duong Nguyen, Thai Ho-Quang, Le Nguyen Quoc Khanh, Van Dinh-Phan, Yu-Yen Ou

Living organisms obtain the energy they need directly from cellular respiration. Electron storage and transport are completed during cellular respiration with the aid of electron transport chains, so identifying electron transport proteins is an important task. How well these proteins can be identified depends strongly on the choice of feature extraction method and machine learning algorithm. In this study, protein sequences were treated as natural language sentences composed of words. The proposed word embedding-based feature sets, built on word embedding modulation and protein motif frequencies, were useful for feature selection. Five word embedding types and a variety of combined features were examined for this feature selection. A support vector machine was then used for classification. In 5-fold cross-validation, the average accuracy, specificity, sensitivity, and MCC all exceed 0.95. In the independent test, these metrics are 96.82 percent, 97.16 percent, 95.76 percent, and 0.9, respectively. Compared with state-of-the-art predictors, the proposed method performs better on all metrics, indicating its effectiveness in identifying electron transport proteins. Furthermore, this study offers insights into the applicability of various word embeddings for understanding the surveyed sequences.
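
As a rough illustration of two ideas in the abstract, the sketch below splits a protein sequence into "words" and computes the four reported metrics (accuracy, specificity, sensitivity, MCC) from binary predictions. The abstract does not say how words are formed, so the overlapping k-mer tokenization, the value k=3, and the function names `kmer_words` and `binary_metrics` are assumptions made here for illustration only, not the authors' implementation.

```python
import math


def kmer_words(sequence, k=3):
    """Split a protein sequence into overlapping k-mer 'words'.

    Assumption: overlapping k-mers with k=3; the abstract does not
    specify the tokenization scheme.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy sequence and toy labels, not data from the study.
    print(kmer_words("MKTAYIAK"))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a pipeline like the one the abstract describes, the words would be mapped to embedding vectors and fed to a support vector machine; those steps are omitted here because the abstract gives no details about them.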

Updated: 2020-07-21