TFIDF-Random Forest: Prediction of Aptamer-Protein Interacting Pairs, IEEE/ACM Transactions on Computational Biology and Bioinformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6). Pub Date: 2021-07-26, DOI: 10.1109/tcbb.2021.3098709
Eugene Uwiragiye, Kristen L. Rhinehardt

Aptamers are short, single-stranded oligonucleotides or peptides generated by in vitro selection to bind selectively to various molecules. Because of their capacity for molecular recognition of proteins, aptamers are becoming promising reagents for new drug development. Aptamers can fold into specific spatial configurations that bind certain targets with extremely high specificity. Their ability to bind proteins reversibly has generated increasing interest in using them to facilitate the controlled release of therapeutic biomolecules. However, in vitro selection experiments to produce aptamer-protein binding pairs are very complex, and MD/MM in silico experiments can be computationally expensive. In this study, we introduce a natural language processing approach for data-driven computational selection. We transformed the DNA/RNA and protein sequences into text format using a sliding window approach, and compared our method with a sequential model with an embedding layer previously applied in the literature. Our method achieved notably higher efficiency than the results reported in the literature, indicating that this preliminary model is a marked improvement over previous models and brings us closer to a data-driven computational selection method.
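The abstract does not state the window length, the step size, or how aptamer and protein "words" are combined, so the following is only a minimal sketch under assumed choices. It shows how a sliding window could turn sequences into overlapping k-mer "words" and how TF-IDF weights could then be computed over those words, as the title suggests. All function names, the defaults (k = 3, step 1) and the `apt_`/`prot_` prefixes (used so that nucleotide letters such as A, C, G and T are not confused with the same amino-acid letters) are hypothetical. The IDF smoothing, ln((1 + n) / (1 + df)) + 1, and the L2 row normalization match scikit-learn's `TfidfVectorizer` defaults.

```python
import math
from collections import Counter


def sliding_window_words(seq, k=3, step=1):
    """Split a sequence into overlapping k-mer 'words' (hypothetical defaults)."""
    seq = seq.upper()
    # Windows that would run past the end of the sequence are dropped.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def pair_to_document(aptamer, protein, k_apt=3, k_prot=3):
    """Turn one aptamer-protein pair into a single token list ('document').

    The 'apt_' / 'prot_' prefixes are an assumption that keeps the two
    vocabularies separate; the paper may combine them differently.
    """
    apt_words = ["apt_" + w for w in sliding_window_words(aptamer, k_apt)]
    prot_words = ["prot_" + w for w in sliding_window_words(protein, k_prot)]
    return apt_words + prot_words


def tfidf(documents):
    """Compute L2-normalized TF-IDF weights for a list of token lists.

    Returns (sorted vocabulary, list of {term: weight} dicts).
    """
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # document frequency: count each term once per doc
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for doc in documents:
        counts = Counter(doc)  # raw term frequency
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return sorted(idf), vectors


def to_matrix(vocab, vectors):
    """Convert sparse {term: weight} dicts into dense rows ordered by vocab."""
    return [[vec.get(t, 0.0) for t in vocab] for vec in vectors]


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the paper.
    pairs = [("GGTTGGTGTGGTTGG", "MKTAYIAKQR"), ("ACGUACGUAC", "MKTAYIAK")]
    docs = [pair_to_document(a, p) for a, p in pairs]
    vocab, vecs = tfidf(docs)
    X = to_matrix(vocab, vecs)
    print(len(X), "rows x", len(vocab), "features")
```

In this sketch each row of `X`, paired with a binding/non-binding label, would be the input to a random forest classifier such as scikit-learn's `RandomForestClassifier`; that step is not shown, and the feature settings used in the paper may differ.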

Updated: 2021-07-26