Less may be more: an informed reflection on molecular descriptors for drug design and discovery
Molecular Systems Design & Engineering (IF 3.6), Pub Date: 2019-11-08, DOI: 10.1039/c9me00109c
Trent Barnard, Harry Hagan, Steven Tseng, Gabriele C. Sosso

The phenomenal advances of machine learning in the context of drug design and discovery have led to the development of a plethora of molecular descriptors. In fact, many of these “standard” descriptors are now readily available via open source, easy-to-use computational tools. As a result, it is not uncommon to take advantage of large numbers – up to thousands in some cases – of these descriptors to predict the functional properties of drug-like molecules. This “strength in numbers” approach usually provides excellent flexibility – and thus, good numerical accuracy – to the machine learning framework of choice; however, it suffers from a lack of transparency, in that it becomes very challenging to pinpoint the (usually few) descriptors that play a key role in determining the functional properties of a given molecule. In this work, we show that just a handful of well-tailored molecular descriptors may often be capable of predicting the functional properties of drug-like molecules with an accuracy comparable to that obtained by using hundreds of standard descriptors. In particular, we apply feature selection and genetic algorithms to in-house descriptors we have developed building on junction trees and symmetry functions, respectively. We find that information from as few as 10–20 molecular fragments is often enough to predict even complex biomedical activities with decent accuracy. In addition, we demonstrate that the use of small sets of optimised symmetry functions may pave the way towards the prediction of the physical properties of drugs in their solid phases – a pivotal challenge for the pharmaceutical industry. Thus, this work brings strong arguments in support of using small numbers of selected descriptors to discover the structure–function relation of drug-like molecules – as opposed to blindly leveraging the flexibility of the thousands of molecular descriptors currently available.
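
The "less may be more" idea can be illustrated with a small, self-contained sketch. It is not the authors' method: it uses synthetic random "descriptors" rather than junction-tree fragments or symmetry functions, and it swaps in plain greedy forward selection with ordinary least squares in place of the paper's feature selection and genetic algorithms. All names (`greedy_forward_selection`, `fit_predict`, the `informative` indices) are hypothetical and chosen for illustration only.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination of a set of predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_predict(X_train, y_train, X_test):
    """Fit ordinary least squares (with intercept) and predict on X_test."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def greedy_forward_selection(X_train, y_train, X_val, y_val, n_select):
    """Add, one at a time, the descriptor that most improves validation R^2."""
    selected = []
    remaining = list(range(X_train.shape[1]))
    for _ in range(n_select):
        best_j, best_score = None, -np.inf
        for j in remaining:
            cols = selected + [j]
            pred = fit_predict(X_train[:, cols], y_train, X_val[:, cols])
            score = r2_score(y_val, pred)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


# Synthetic data (an assumption for illustration): 200 candidate descriptors,
# of which only 5 actually determine the target property.
rng = np.random.default_rng(0)
n_samples, n_descriptors = 600, 200
X = rng.normal(size=(n_samples, n_descriptors))
informative = [3, 17, 42, 99, 150]
weights = np.array([2.0, -1.5, 1.0, 0.8, -0.6])
y = X[:, informative] @ weights + 0.1 * rng.normal(size=n_samples)

# Train / validation / test split.
X_train, X_val, X_test = X[:300], X[300:450], X[450:]
y_train, y_val, y_test = y[:300], y[300:450], y[450:]

selected = greedy_forward_selection(X_train, y_train, X_val, y_val, n_select=5)

# Compare a model using all descriptors with one using only the selected few.
r2_full = r2_score(y_test, fit_predict(X_train, y_train, X_test))
r2_small = r2_score(
    y_test, fit_predict(X_train[:, selected], y_train, X_test[:, selected])
)

print("selected descriptors:", sorted(selected))
print(f"test R^2, all {n_descriptors} descriptors: {r2_full:.4f}")
print(f"test R^2, {len(selected)} selected descriptors: {r2_small:.4f}")
```

In this toy setting the small model is also easier to interpret, since the selected column indices can be read off directly. Real molecular data would need actual descriptors and, most likely, a non-linear model.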

Updated: 2019-11-08