Predictive Modeling of Estrogen Receptor Binding Agents Using Advanced Cheminformatics Tools and Massive Public Data
Frontiers in Environmental Science (IF 3.3), Pub Date: 2016-03-08, DOI: 10.3389/fenvs.2016.00012
Kathryn Ribay, Marlene T Kim, Wenyi Wang, Daniel Pinolini, Hao Zhu

Estrogen receptors (ERα) are a critical target for drug design as well as a potential source of toxicity when activated unintentionally. Thus, evaluating potential ERα binding agents is critical in both drug discovery and chemical toxicity assessment. Computational tools, e.g., Quantitative Structure-Activity Relationship (QSAR) models, can predict potential ERα binding agents before chemical synthesis. The purpose of this project was to develop enhanced predictive models of ERα binding agents by utilizing advanced cheminformatics tools that can integrate publicly available bioassay data. The initial ERα binding agent data set, consisting of 446 binders and 8307 non-binders, was obtained from the Tox21 Challenge project organized by the NIH Chemical Genomics Center (NCGC). After removing duplicates and inorganic compounds, this data set was used to create a training set (259 binders and 259 non-binders). This training set was used to develop QSAR models using chemical descriptors. The resulting models were then used to predict the binding activity of 264 external compounds, which became available to us after the models were developed. The cross-validation results for the training set [Correct Classification Rate (CCR) = 0.72] were much higher than the external predictivity for the unknown compounds (CCR = 0.59). To improve the conventional QSAR models, all compounds in the training set were used to search PubChem and generate a profile of their biological responses across thousands of bioassays. The most important bioassays were prioritized to generate a similarity index that was used to calculate the biosimilarity score between each pair of compounds. The nearest neighbors of each compound within the set were then identified, and its ERα binding potential was predicted from its nearest neighbors in the training set.
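The biosimilarity nearest-neighbor step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity index, so the sketch assumes a simple one (the fraction of shared assays on which two compounds agree), and the compound names, assay names, profiles, labels, and the choice of k = 3 are all hypothetical.

```python
# Hypothetical sketch of biosimilarity-based nearest-neighbor prediction.
# ASSUMPTION: similarity = fraction of commonly tested assays with the same
# outcome. The paper's actual similarity index is not given in the abstract.

def biosimilarity(profile_a, profile_b):
    """Agreement rate over assays in which both compounds were tested."""
    shared = set(profile_a) & set(profile_b)
    if not shared:
        return 0.0  # no common assays, so no evidence of similarity
    agree = sum(1 for assay in shared if profile_a[assay] == profile_b[assay])
    return agree / len(shared)


def predict_from_neighbors(query_profile, training_profiles, training_labels, k=3):
    """Predict 1 (binder) or 0 (non-binder) from the k most biosimilar
    training compounds; return None if no neighbor has similarity > 0."""
    scored = sorted(
        ((biosimilarity(query_profile, profile), name)
         for name, profile in training_profiles.items()),
        reverse=True,
    )
    neighbors = [name for score, name in scored[:k] if score > 0]
    if not neighbors:
        return None
    mean_label = sum(training_labels[name] for name in neighbors) / len(neighbors)
    return 1 if mean_label >= 0.5 else 0


# Toy bioassay profiles: 1 = active, 0 = inactive, missing key = not tested.
training_profiles = {
    "cmpd_A": {"assay_1": 1, "assay_2": 1, "assay_3": 0},
    "cmpd_B": {"assay_1": 1, "assay_2": 0, "assay_3": 0},
    "cmpd_C": {"assay_1": 0, "assay_2": 0, "assay_3": 1},
    "cmpd_D": {"assay_2": 0, "assay_3": 1},
}
training_labels = {"cmpd_A": 1, "cmpd_B": 1, "cmpd_C": 0, "cmpd_D": 0}

query = {"assay_1": 1, "assay_2": 1, "assay_3": 0}
prediction = predict_from_neighbors(query, training_profiles, training_labels)
print(prediction)
```

Restricting the comparison to shared assays is one way to cope with the sparsity of public bioassay data, where most compounds are tested in only a small fraction of assays; other weightings are equally plausible.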
The hybrid model performance (CCR = 0.94 for cross-validation; CCR = 0.68 for external prediction) showed significant improvement over the original QSAR models, particularly for the activity cliffs that induce prediction errors. The results of this study indicate that the response profile of chemicals from public data provides useful information for modeling and evaluation purposes. Public big data resources should be considered along with chemical structure information when predicting new compounds, such as unknown ERα binding agents.
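For readers unfamiliar with the metric reported above, the following Python sketch computes a Correct Classification Rate. The abstract does not define CCR; the sketch assumes the definition commonly used in the QSAR literature, the mean of sensitivity and specificity. The function name and the toy labels are illustrative only and are not data from the study.

```python
# ASSUMPTION: CCR = (sensitivity + specificity) / 2, a common definition in
# QSAR work; the abstract itself does not spell the formula out.

def correct_classification_rate(y_true, y_pred):
    """Return CCR for binary labels, where 1 = binder and 0 = non-binder."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binders found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binders missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binders found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy example: 4 binders and 4 non-binders (made-up labels).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
ccr = correct_classification_rate(y_true, y_pred)
print(ccr)  # sensitivity 0.75, specificity 0.5, so CCR = 0.625
```

Under this definition a model that guesses at random scores about 0.5 regardless of class balance, which is a useful reference point when reading the external-prediction values of 0.59 and 0.68.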

Updated: 2016-03-08