Toward Predictive Chemical Deformulation Enabled by Deep Generative Neural Networks, Industrial & Engineering Chemistry Research



Industrial & Engineering Chemistry Research (IF 4.2), Pub Date: 2021-09-23, DOI: 10.1021/acs.iecr.1c00634
Emre Sevgen₁, Edward Kim₂, Brendan Folie₁, Ventura Rivera₁, Jason Koeller₁, Emily Rosenthal₃, Andrea Jacobs₃, Julia Ling₁


The design of chemical formulations is a challenging, high-dimensional problem. In typical formulations, tens of thousands of ingredients are available for use, yet only a tiny fraction end up in a given formulation. Deformulation, the problem of reverse engineering the precise amounts of each ingredient starting from just a list of ingredients, is similarly challenging but is a key capability for staying up-to-date with industry competitors. Here, we take advantage of a large, curated formulations dataset from CAS, a division of the American Chemical Society, which offers a consistent and highly structured representation of the formulations and the chemical identities of their components to show that a variational autoencoder neural network learns meaningful representations of formulations in various product classes such as antiperspirants and oral care. Furthermore, it can be used in conjunction with a two-step sampling algorithm to generate accurate ingredient amount suggestions for deformulation. Deformulation using a variational autoencoder produces estimates that are significantly more accurate than nearest neighbor methods, extrapolates better to formulations that are significantly different than previously seen formulations, and provides a way to leverage large datasets for industrially relevant capabilities.
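To make the deformulation task concrete, the following is a minimal sketch of a nearest-neighbor style baseline of the kind the abstract compares against. It is not the authors' implementation. The representation (a dictionary mapping ingredient names to weight fractions), the Jaccard similarity over ingredient sets, the rule for filling in ingredients the neighbor lacks, and all names and numbers in the example are illustrative assumptions.

```python
from typing import Dict, List

# Assumed representation: ingredient name -> weight fraction (fractions sum to 1).
Formulation = Dict[str, float]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two ingredient sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_neighbor_deformulate(
    ingredients: List[str], known: List[Formulation]
) -> Formulation:
    """Estimate amounts for `ingredients` from the most similar known formulation.

    Hypothetical baseline: copy the neighbor's amounts for shared ingredients,
    give every unmatched ingredient the mean of the copied amounts, then
    renormalize so the estimate sums to 1.
    """
    if not ingredients or not known:
        raise ValueError("need at least one ingredient and one known formulation")

    target = set(ingredients)
    # Pick the known formulation whose ingredient set overlaps most with the target.
    best = max(known, key=lambda f: jaccard(target, set(f)))

    shared = {name: best[name] for name in target if name in best}
    if not shared:
        # No overlap at all: fall back to equal amounts (an arbitrary assumption).
        return {name: 1.0 / len(target) for name in target}

    fill = sum(shared.values()) / len(shared)
    estimate = {name: shared.get(name, fill) for name in target}
    total = sum(estimate.values())
    return {name: amount / total for name, amount in estimate.items()}


# Made-up example data, for illustration only.
known_formulations = [
    {"water": 0.7, "glycerin": 0.2, "fragrance": 0.1},
    {"ethanol": 0.5, "water": 0.5},
]
estimate = nearest_neighbor_deformulate(
    ["water", "glycerin", "menthol"], known_formulations
)
```

A baseline like this can only reuse amounts from formulations it has already seen, which is one way to read the abstract's claim that the variational autoencoder extrapolates better to formulations unlike those in the dataset.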


Updated: 2021-10-06
