A comprehensive comparison of molecular feature representations for use in predictive modeling
Computers in Biology and Medicine (IF 7.7), Pub Date: 2021-01-09, DOI: 10.1016/j.compbiomed.2020.104197
Tomaž Stepišnik, Blaž Škrlj, Jörg Wicker, Dragi Kocev

Machine learning methods are commonly used for predicting molecular properties to accelerate material and drug design. An important part of this process is deciding how to represent the molecules. Typically, machine learning methods expect examples represented by vectors of values, and many methods for calculating molecular feature representations have been proposed. In this paper, we perform a comprehensive comparison of different molecular features, including traditional methods such as fingerprints and molecular descriptors, and recently proposed learnable representations based on neural networks. Feature representations are evaluated on 11 benchmark datasets, used for predicting properties and measures such as mutagenicity, melting points, activity, solubility, and IC50. Our experiments show that several molecular features perform similarly well across all benchmark datasets. The ones that stand out most are Spectrophores, which give significantly worse performance than other features on most datasets. Molecular descriptors from the PaDEL library seem very well suited for predicting physical properties of molecules. Despite their simplicity, MACCS fingerprints performed very well overall. The results show that learnable representations achieve competitive performance compared to expert-based representations. However, task-specific representations (graph convolutions and Weave methods) rarely offer any benefits, even though they are computationally more demanding. Lastly, combining different molecular feature representations typically does not give a noticeable improvement in performance compared to individual feature representations.
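To make the idea of "examples represented by vectors of values" concrete, here is a minimal Python sketch. It is not any of the representations evaluated in the paper: the function names, the SMILES-based character n-gram hashing, and the three crude descriptors are all assumptions chosen only for illustration. It shows a fixed-length bit vector (fingerprint-like), a small numeric descriptor vector, and their concatenation as one simple way to combine representations.

```python
import hashlib
from typing import List


def hashed_ngram_fingerprint(smiles: str, n_bits: int = 64, max_n: int = 3) -> List[int]:
    """Toy fingerprint (hypothetical): hash character n-grams of a SMILES
    string into a fixed-length bit vector. Real fingerprints such as MACCS
    operate on the molecular graph, not on raw characters."""
    bits = [0] * n_bits
    for n in range(1, max_n + 1):
        for i in range(len(smiles) - n + 1):
            gram = smiles[i:i + n]
            digest = hashlib.sha256(gram.encode("utf-8")).digest()
            # Map the first 4 bytes of the hash to a bit position.
            bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


def simple_descriptors(smiles: str) -> List[float]:
    """Toy descriptors (hypothetical): string length, number of digit
    characters (ring-closure labels in simple SMILES), and number of '='
    characters (double bonds)."""
    return [
        float(len(smiles)),
        float(sum(ch.isdigit() for ch in smiles)),
        float(smiles.count("=")),
    ]


def combined_features(smiles: str, n_bits: int = 64) -> List[float]:
    """Combine both representations by simple concatenation."""
    fingerprint = [float(b) for b in hashed_ngram_fingerprint(smiles, n_bits)]
    return fingerprint + simple_descriptors(smiles)


if __name__ == "__main__":
    for smiles in ["CCO", "c1ccccc1", "C=O"]:
        features = combined_features(smiles)
        print(smiles, len(features), simple_descriptors(smiles))
```

Any standard learner that accepts fixed-length numeric vectors could be trained on the output of `combined_features`; in practice a cheminformatics toolkit would compute the actual fingerprints and descriptors.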




Updated: 2021-01-10