Improving VAE based molecular representations for compound property prediction
Journal of Cheminformatics (IF 8.6), Pub Date: 2022-10-14, DOI: 10.1186/s13321-022-00648-x
Ani Tevosyan 1, Lusine Khondkaryan 2, Hrant Khachatrian 1, 3, Gohar Tadevosyan 2, Lilit Apresyan 2, Nelly Babayan 2, 4, Helga Stopper 5, Zaven Navoyan 4

Collecting labeled data for many important tasks in chemoinformatics is time-consuming and requires expensive experiments. In recent years, machine learning has been used to learn rich representations of molecules from large-scale unlabeled molecular datasets and to transfer that knowledge to more challenging tasks with limited datasets. Variational autoencoders are one of the tools that have been proposed to perform this transfer for both chemical property prediction and molecular generation tasks. In this work we propose a simple method to improve the chemical property prediction performance of machine learning models by incorporating additional information on correlated molecular descriptors into the representations learned by variational autoencoders. We verify the method on three property prediction tasks. We explore the impact of the number of incorporated descriptors, the correlation between the descriptors and the target properties, and the sizes of the datasets. Finally, we show the relation between the performance of property prediction models and the distance between the property prediction dataset and the larger unlabeled dataset in the representation space.
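
The abstract does not say how the descriptor information enters the model. One common way to inject such information into a latent representation is to add an auxiliary term to the VAE objective that penalizes errors when the descriptors are predicted from the latent code. The sketch below illustrates that idea only and is not the authors' implementation. The function name `vae_loss_with_descriptors`, the weights `beta` and `gamma`, and the squared-error reconstruction term are all assumptions made for illustration (a VAE over SMILES strings would more typically use a cross-entropy reconstruction term).

```python
import numpy as np


def vae_loss_with_descriptors(x, x_recon, mu, log_var,
                              desc_true, desc_pred,
                              beta=1.0, gamma=1.0):
    """Hypothetical joint objective: reconstruction + KL + descriptor regression.

    All arrays are batched with shape (batch, features).
    mu and log_var parameterize the diagonal Gaussian posterior q(z | x).
    desc_pred stands for descriptor values predicted from the latent code
    by some auxiliary head (assumed here, not taken from the paper).
    """
    # Reconstruction error (squared error chosen for simplicity; an assumption).
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))

    # KL divergence between q(z | x) = N(mu, exp(log_var)) and the prior N(0, I).
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1))

    # Auxiliary term: regression error on the molecular descriptors.
    desc = np.mean(np.sum((desc_true - desc_pred) ** 2, axis=1))

    return recon + beta * kl + gamma * desc
```

Setting `gamma=0.0` recovers a plain VAE objective under these assumptions, so the weight gives a single knob for how strongly the descriptors shape the representation.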

Updated: 2022-10-15