A comparison of molecular representations for lipophilicity quantitative structure-property relationships with results from the SAMPL6 logP Prediction Challenge.
Journal of Computer-Aided Molecular Design (IF 3.5), Pub Date: 2020-01-13, DOI: 10.1007/s10822-020-00279-0
Raymond Lui, Davy Guan, Slade Matthews

Effective representation of a molecule is required to develop useful quantitative structure-property relationships (QSPR) for accurate prediction of chemical properties. The octanol-water partition coefficient logP, a measure of lipophilicity, is an important property for pharmacological and toxicological endpoints used in the pharmaceutical and regulatory spheres. We compare physicochemical descriptors, structural keys, and circular fingerprints in their ability to effectively represent a chemical space and characterise molecular features to correlate with lipophilicity. Exploratory landscape continuity analyses revealed that whole-molecule physicochemical descriptors could map together compounds that were similar in both molecular features and logP, indicating higher potential for use in logP QSPRs compared to the substructural approach of structural keys and circular fingerprints. Indeed, logP QSPR models parameterised by physicochemical descriptors consistently performed with the lowest error. Our best performing model was a stochastic gradient descent-optimised multilinear regression with 1438 descriptors, returning an internal benchmark RMSE of 1.03 log units. This corroborates the well-established notion that lipophilicity is an additive, whole-molecule property. We externally tested the model by participating in the 2019 SAMPL6 logP Prediction Challenge and blindly predicting for 11 protein kinase inhibitor fragment-like molecules. Our model returned an RMSE of 0.49 log units, placing eighth overall and third in the empirical methods category (submission ID 'hdpuj'). Permutation feature importance analyses revealed that physicochemical descriptors could characterise predictive molecular features highly relevant to the kinase inhibitor fragment-like molecules.
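The modelling steps named in the abstract (a multilinear regression fitted by stochastic gradient descent, scored by RMSE, then probed with permutation feature importance) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the three "descriptors" are hypothetical stand-ins for the 1438 physicochemical descriptors, and the function names (`rmse`, `fit_sgd_linear`, `predict`, `permutation_importance`) and hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def predict(X, w, b):
    """Linear model: y = b + sum_j w[j] * x[j] for each row x of X."""
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def fit_sgd_linear(X, y, lr=0.05, epochs=200, seed=0):
    """Fit a multilinear regression with per-sample SGD on squared error."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            err = b + sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
            for j, xj in enumerate(X[i]):
                w[j] -= lr * err * xj
            b -= lr * err
    return w, b


def permutation_importance(X, y, w, b, seed=0):
    """Increase in RMSE when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = rmse(y, predict(X, w, b))
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, column)]
        importances.append(rmse(y, predict(X_perm, w, b)) - baseline)
    return importances


# Synthetic data (assumption): three hypothetical descriptors in [-1, 1];
# the third has no influence on the target "logP".
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * x0 - 1.0 * x1 + 0.5 for x0, x1, _ in X]

w, b = fit_sgd_linear(X, y)
train_rmse = rmse(y, predict(X, w, b))
importances = permutation_importance(X, y, w, b)
print("weights:", w, "intercept:", b)
print("training RMSE:", train_rmse)
print("permutation importances:", importances)
```

In this toy setting the irrelevant third descriptor should receive an importance near zero, while shuffling the first descriptor should raise the error the most. A real QSPR workflow would evaluate RMSE and importances on held-out compounds rather than on the training set.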

Updated: 2020-04-21