


Review of unsupervised pretraining strategies for molecules representation
Briefings in Functional Genomics (IF 2.5), Pub Date: 2021-07-09, DOI: 10.1093/bfgp/elab036
Linhui Yu, Yansen Su, Yuansheng Liu, Xiangxiang Zeng

In recent years, computer-assisted techniques have made great progress in drug discovery. Yet the scarcity of labeled data remains a challenge and restricts the performance of these techniques in specific tasks, such as molecular property prediction, compound-protein interaction and de novo molecular generation. One effective solution is to apply the experience and knowledge gained from other tasks to related ones. Unsupervised pretraining is promising because it can leverage a vast number of unlabeled molecules and learn a more informative molecular representation for downstream tasks. In particular, models trained on large-scale unlabeled molecules can capture generalizable features, which can then be used to improve performance on specific downstream tasks. Many relevant pretraining methods have recently been proposed. Here, we provide an overview of unsupervised pretraining for molecules and its applications in drug discovery. Challenges and possible solutions are also summarized.


Updated: 2021-07-09
