Enabling design of screening libraries for antibiotic discovery by modeling ChEMBL data.
European Journal of Pharmaceutical Sciences (IF 4.6), Pub Date: 2019-11-27, DOI: 10.1016/j.ejps.2019.105166
Aurijit Sarkar

It is critical to identify novel antibiotics. Yet the scientific community has struggled in this pursuit because we do not understand which molecules will penetrate the bacterial outer envelope. In this work, we have identified a large dataset of compounds known to reach their targets in bacterial cells (penetrators) and compared them with molecules that do not (non-penetrators). Our dataset, extracted from the ChEMBL database, is a useful tool to guide the selection of molecules for antibiotic screening. Simple random forest classification models can correctly distinguish penetrators from non-penetrators. For Gram-positive bacteria, the model identified penetrators with ~87% accuracy, ~88% precision and ~97% recall. A paucity of data for non-penetrators was a major hurdle to model-building: we observed a ~86% negative predictive value, but only ~57% specificity. More data on non-penetrators therefore need to be accumulated. Data for Gram-negative bacteria were also sparse, but a larger fraction of these data represented non-penetrators. Correspondingly, the resulting models performed well in predicting which molecules would fail to enter Gram-negative cells, but were relatively weak at correctly predicting penetrators. A comparison of the physicochemical properties of penetrators and non-penetrators suggests that only marginal differences exist between them. Therefore, it may be difficult to identify overarching rules for generating screening libraries for antibiotic discovery based on physicochemical properties alone. Instead, models such as ours should be of use. Our models are highly preliminary and based on phenotypic data, but a similarly large dataset directly addressing the accumulation of chemical matter in bacterial cells is currently unavailable. Hence, our models represent the cutting edge in the design of screening libraries for antibiotic discovery until appropriate data can be compiled.
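The abstract reports high recall and negative predictive value alongside low specificity, and ties this to the scarcity of non-penetrator data. The sketch below shows how the five metrics are computed from a binary confusion matrix, with "penetrator" as the positive class. The counts are hypothetical and are not taken from the paper. They were chosen only to show how these figures can coexist when negatives are scarce. The `ConfusionMatrix` class is likewise an illustrative helper, not part of the author's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; the positive class is 'penetrator' (hypothetical helper)."""
    tp: int  # penetrators predicted as penetrators
    fp: int  # non-penetrators predicted as penetrators
    fn: int  # penetrators predicted as non-penetrators
    tn: int  # non-penetrators predicted as non-penetrators

    def accuracy(self) -> float:
        # Fraction of all compounds classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        # Fraction of predicted penetrators that truly penetrate.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Fraction of true penetrators the model recovers (sensitivity).
        return self.tp / (self.tp + self.fn)

    def npv(self) -> float:
        # Negative predictive value: fraction of predicted non-penetrators that truly fail to penetrate.
        return self.tn / (self.tn + self.fn)

    def specificity(self) -> float:
        # Fraction of true non-penetrators the model recovers.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts (NOT from the paper): 100 penetrators but only 31 non-penetrators.
cm = ConfusionMatrix(tp=97, fp=13, fn=3, tn=18)
for name in ("accuracy", "precision", "recall", "npv", "specificity"):
    print(f"{name:>11}: {getattr(cm, name)():.2f}")
```

With only 31 non-penetrators in this made-up set, each of the 13 misclassified non-penetrators costs about three points of specificity. NPV stays high because very few true penetrators are predicted as non-penetrators.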

Updated: 2019-11-28