Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2022-05-31, DOI: 10.1021/acs.jcim.2c00495
Yuyang Wang, Rishikesh Magar, Chen Liang, Amir Barati Farimani
Deep learning has become prevalent in computational chemistry and is widely applied to molecular property prediction. Recently, self-supervised learning (SSL), especially contrastive learning (CL), has gained growing attention for its potential to learn molecular representations that generalize across the gigantic chemical space. Unlike supervised learning, SSL can directly leverage large unlabeled data sets, greatly reducing the effort of acquiring molecular property labels through costly and time-consuming simulations or experiments. However, most molecular SSL methods borrow insights from the machine learning community while neglecting the unique cheminformatics of molecules (e.g., molecular fingerprints) and their multilevel graphical structures (e.g., functional groups). In this work, we propose iMolCLR, an improvement of Molecular Contrastive Learning of Representations with graph neural networks (GNNs) in two respects: (1) mitigating faulty negative contrastive instances by considering cheminformatics similarities between molecule pairs and (2) fragment-level contrasting between intra- and intermolecular substructures decomposed from molecules. Experiments show that the proposed strategies significantly improve the performance of GNN models on various challenging molecular property predictions. Compared with the previous CL framework, iMolCLR achieves an average 1.2% improvement in ROC-AUC on eight classification benchmarks and an average 10.1% reduction in error on six regression benchmarks. On most benchmarks, the generic GNN pretrained by iMolCLR rivals or even surpasses supervised learning models with sophisticated architectures and engineered features. Further investigations demonstrate that the representations learned through iMolCLR intrinsically embed scaffolds and functional groups that can reason about molecular similarity.
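The faulty-negative mitigation described in the abstract can be illustrated as a similarity-weighted NT-Xent contrastive loss, in which negative pairs of highly similar molecules are down-weighted rather than pushed apart at full strength. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `sims` matrix is assumed to hold precomputed pairwise molecule similarities in [0, 1] (e.g., Tanimoto scores over molecular fingerprints), the `1 - sims` weighting is one plausible choice, and all function and parameter names are hypothetical.

```python
import numpy as np

def weighted_ntxent(z_i, z_j, sims, temperature=0.1):
    """NT-Xent-style contrastive loss with faulty-negative mitigation.

    Each negative pair is down-weighted by the cheminformatics similarity
    of its two molecules, so near-duplicate molecules ("faulty negatives")
    contribute less repulsion to the loss.

    z_i, z_j : (N, d) embeddings of the two augmented views of a batch.
    sims     : (N, N) precomputed molecule similarities in [0, 1]
               (e.g., fingerprint Tanimoto); sims[a, a] == 1.
    """
    # Cosine similarities between the two views, scaled by temperature.
    z_i = z_i / np.linalg.norm(z_i, axis=1, keepdims=True)
    z_j = z_j / np.linalg.norm(z_j, axis=1, keepdims=True)
    logits = z_i @ z_j.T / temperature            # (N, N)

    # Negatives between similar molecules get weights close to 0;
    # positives (the diagonal) keep full weight.
    weights = 1.0 - sims
    np.fill_diagonal(weights, 1.0)

    exp_weighted = np.exp(logits) * weights       # weighted denominator terms
    pos = np.exp(np.diag(logits))                 # positive-pair terms
    loss = -np.log(pos / exp_weighted.sum(axis=1))
    return loss.mean()
```

With this weighting, a batch whose off-diagonal molecules are highly similar yields a smaller loss than the same batch treated as all-dissimilar, which is the intended effect: the model is penalized less for keeping faulty negatives close together.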
