Predicting gas–particle partitioning coefficients of atmospheric molecules with machine learning
Atmospheric Chemistry and Physics (IF 5.2), Pub Date: 2021-09-06, DOI: 10.5194/acp-21-13227-2021
Emma Lumiaro, Milica Todorović, Theo Kurten, Hanna Vehkamäki, Patrick Rinke

The formation, properties, and lifetime of secondary organic aerosols in the atmosphere are largely determined by gas–particle partitioning coefficients of the participating organic vapours. Since these coefficients are often difficult to measure and to compute, we developed a machine learning model to predict them given molecular structure as input. Our data-driven approach is based on the dataset by Wang et al. (2017), who computed the partitioning coefficients and saturation vapour pressures of 3414 atmospheric oxidation products from the Master Chemical Mechanism using the COSMOtherm programme. We trained a kernel ridge regression (KRR) machine learning model on the saturation vapour pressure (P_sat) and on two equilibrium partitioning coefficients: between a water-insoluble organic matter phase and the gas phase (K_WIOM/G) and between an infinitely dilute solution with pure water and the gas phase (K_W/G). For the input representation of the atomic structure of each organic molecule to the machine, we tested different descriptors. We find that the many-body tensor representation (MBTR) works best for our application, but the topological fingerprint (TopFP) approach is almost as good and computationally cheaper to evaluate. Our best machine learning model (KRR with a Gaussian kernel + MBTR) predicts P_sat and K_WIOM/G to within 0.3 logarithmic units and K_W/G to within 0.4 logarithmic units of the original COSMOtherm calculations. This is equal to or better than the typical accuracy of COSMOtherm predictions compared to experimental data (where available). We then applied our machine learning model to a dataset of 35 383 molecules that we generated based on a carbon-10 backbone functionalized with zero to six carboxyl, carbonyl, or hydroxyl groups to evaluate its performance for polyfunctional compounds with potentially low P_sat. The resulting saturation vapour pressure and partitioning coefficient distributions were physico-chemically reasonable, for example, in terms of the average effects of the addition of single functional groups. The volatility predictions for the most highly oxidized compounds were in qualitative agreement with experimentally inferred volatilities of, for example, α-pinene oxidation products with as yet unknown structures but similar elemental compositions.
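
To make the regression step in the abstract concrete, the following is a minimal sketch of kernel ridge regression with a Gaussian kernel, written in plain Python. It is not the authors' implementation. The descriptor used here (a vector of hydroxyl, carbonyl, and carboxyl group counts) is a hypothetical stand-in for MBTR or TopFP, the target values come from an arbitrary toy rule rather than from COSMOtherm or the Wang et al. (2017) dataset, and all function names and hyperparameter values are assumptions chosen for illustration.

```python
import math
from itertools import product


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def krr_fit(X, y, sigma, lam):
    """Return regression weights alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear_system(K, y)


def krr_predict(X_train, alpha, x_new, sigma):
    """Prediction is a kernel-weighted sum over the training molecules."""
    return sum(a * gaussian_kernel(x, x_new, sigma)
               for a, x in zip(alpha, X_train))


def toy_log10_psat(groups):
    """Arbitrary toy rule (NOT from the paper): each group lowers log10(P_sat)."""
    n_oh, n_co, n_cooh = groups
    return 2.0 - 2.0 * n_oh - 1.0 * n_co - 3.0 * n_cooh


# Hypothetical descriptors: (hydroxyl, carbonyl, carboxyl) counts of 0 or 1.
X_train = [list(p) for p in product((0, 1), repeat=3)]
y_train = [toy_log10_psat(x) for x in X_train]

SIGMA, LAMBDA = 1.0, 1e-6  # illustrative hyperparameters, not tuned values
alpha = krr_fit(X_train, y_train, SIGMA, LAMBDA)

# Predict for a toy molecule outside the training grid (two hydroxyl groups).
held_out = [2, 0, 0]
print("predicted toy log10(P_sat):", krr_predict(X_train, alpha, held_out, SIGMA))
```

In practice a library implementation such as scikit-learn's `KernelRidge` with `kernel="rbf"` would replace the hand-written solver, and the kernel width and regularization strength would be chosen by cross-validation. For reading the reported accuracies, and assuming base-10 logarithms, an error of 0.3 logarithmic units corresponds to a factor of about 2 in the predicted quantity (10^0.3 ≈ 2.0).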

Updated: 2021-09-06