Improving the generative performance of chemical autoencoders through transfer learning
Machine Learning: Science and Technology (IF 6.3), Pub Date: 2020-10-09, DOI: 10.1088/2632-2153/abae75
Nicholae C Iovanac, Brett Matthew Savoie

Generative models are a sub-class of machine learning models capable of generating new samples with a target set of properties. In chemical and materials applications, these new samples might be drug targets, novel semiconductors, or catalysts constrained to exhibit an application-specific set of properties. Given their potential to yield high-value targets from otherwise intractable design spaces, generative models are currently under intense study with respect to how predictions can be improved through changes in model architecture and data representation. Here we explore the potential of multi-task transfer learning as a complementary approach to improving the validity and property specificity of molecules generated by such models. We compared baseline generative models trained on a single property prediction task against models trained on additional ancillary prediction tasks and observe a generic positive impact on the validity and specificity of the multi-task models. In particular, we observe that the validity of generated structures is strongly affected by whether the models are supplied with chemical property data during learning, as opposed to only syntactic structural data. We demonstrate this effect in both interpolative and extrapolative scenarios (i.e., where the generative targets are well or poorly represented in the training data, respectively) for models trained to generate high-energy structures and models trained to generate structures with bandgaps targeted within certain ranges. In both instances, the inclusion of additional chemical property data improves the ability of the models to generate valid, unique structures with increased property specificity. This approach requires only minor alterations to existing generative models, in many cases leveraging prediction frameworks already native to these models.
Additionally, the transfer learning strategy is complementary to ongoing efforts to improve model architectures and data representation and can foreseeably be stacked on top of these developments.
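In essence, the multi-task strategy described in the abstract amounts to attaching ancillary property-prediction heads to the autoencoder's shared latent representation and summing their losses with the reconstruction objective. A minimal, illustrative sketch of this idea follows; the linear encoder/decoder, the property heads, and the `lam` weighting are hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shared encoder/decoder: molecules as fixed-length feature
# vectors mapped to a 3-dimensional latent code (hypothetical sizes).
W_enc = rng.normal(size=(8, 3))                    # encoder weights
W_dec = rng.normal(size=(3, 8))                    # decoder weights
w_props = [rng.normal(size=3) for _ in range(2)]   # two ancillary property heads

def multitask_loss(x, y_props, lam=0.1):
    """Reconstruction loss plus lam-weighted ancillary property losses,
    all computed from the same shared latent representation."""
    z = x @ W_enc                                  # shared latent code
    recon = np.mean((x - z @ W_dec) ** 2)          # reconstruction error
    prop = sum(np.mean((z @ w - y) ** 2)           # squared error per property head
               for w, y in zip(w_props, y_props))
    return recon + lam * prop

# Example: a batch of 4 "molecules" with 2 target properties each.
x = rng.normal(size=(4, 8))
y = [rng.normal(size=4), rng.normal(size=4)]
base = multitask_loss(x, y, lam=0.0)    # reconstruction-only baseline
joint = multitask_loss(x, y)            # multi-task objective
```

Minimizing `joint` instead of `base` is the only change such a multi-task variant asks of the training loop; the architecture and data representation are untouched, which is why this strategy can be stacked on top of other improvements.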




Updated: 2020-10-09