Free and open-source QSAR-ready workflow for automated standardization of chemical structures in support of QSAR modeling
Journal of Cheminformatics (IF 8.6), Pub Date: 2024-02-20, DOI: 10.1186/s13321-024-00814-3
Kamel Mansouri , José T. Moreira-Filho , Charles N. Lowe , Nathaniel Charest , Todd Martin , Valery Tkachenko , Richard Judson , Mike Conway , Nicole C. Kleinstreuer , Antony J. Williams

The rapid increase in publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in many fields. However, a common concern is the quality of both the chemical structure information and the associated experimental data. This is especially true when the data are collected from multiple sources, as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can affect the resulting molecular descriptors and their mappings to experimental data and, consequently, the accuracy, repeatability, and reliability of the derived models. Herein we describe the development of an automated workflow that standardizes chemical structures according to a set of standard rules and generates two- and/or three-dimensional “QSAR-ready” forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read. Second, the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations: desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and removal of duplicates. The workflow was initially developed to support collaborative QSAR modeling projects by ensuring consistency of the results from the different participants. It was then updated and generalized for other modeling applications, including a modification of the “QSAR-ready” workflow that generates “MS-ready structures” to support substance mapping and searching in software applications for non-targeted analysis by mass spectrometry.
Both the QSAR-ready and MS-ready workflows are freely available to the scientific community in KNIME, as standalone versions on GitHub, and as Docker containers.
Scientific contribution: This work introduces an automated KNIME workflow that systematically standardizes chemical structures to prepare them for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it improves the accuracy and reliability of the resulting molecular descriptors. Because the resources are freely available in KNIME, on GitHub, and as Docker containers, they broaden access, benefiting collaborative research and supporting diverse modeling efforts in chemistry and mass spectrometry.
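To make the order of operations concrete, here is a minimal Python sketch, assuming SMILES input and using only the standard library. It is not the authors' KNIME implementation: the function names (heavy_atom_count, keep_largest_fragment, strip_stereo, standardize, group_duplicates) are hypothetical, and the sketch covers only three of the steps named above: desalting (keeping the largest fragment), stereochemistry stripping for two-dimensional forms, and grouping of duplicates. Tautomer and nitro-group standardization, valence correction, and neutralization require a cheminformatics toolkit (RDKit's rdMolStandardize module, for example, provides tautomer canonicalization and uncharging). Because the sketch does not canonicalize SMILES, two different spellings of the same structure are not merged.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Toy, string-level sketch with hypothetical helper names; NOT the KNIME
# QSAR-ready workflow. Covers desalting, stereo stripping and duplicate
# grouping only, and does not canonicalize SMILES.

# Atom tokens in SMILES: bracket atoms first, then the two-letter organic
# subset (Br, Cl), then one-letter organic-subset and aromatic atoms.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")

# Bracket hydrogens such as [H], [2H] or [H+] are not heavy atoms.
BRACKET_HYDROGEN = re.compile(r"\[\d*H[^A-Za-z]*\]")

# Chirality marks: @, @@ and the extended @TH/@AL/@SP/@TB/@OH classes.
CHIRALITY = re.compile(r"@(?:@|TH[12]|AL[12]|SP[1-3]|TB\d{1,2}|OH\d{1,2})?")


def heavy_atom_count(fragment: str) -> int:
    """Count non-hydrogen atoms in one SMILES fragment."""
    return sum(
        1
        for token in ATOM_TOKEN.findall(fragment)
        if not BRACKET_HYDROGEN.fullmatch(token)
    )


def keep_largest_fragment(smiles: str) -> str:
    """Desalt by keeping the fragment with the most heavy atoms.

    Assumes '.' separates disconnected components; SMILES that reuse a
    ring-closure digit across '.' (e.g. "C1.C1") are not handled.
    """
    return max(smiles.split("."), key=heavy_atom_count)


def strip_stereo(smiles: str) -> str:
    """Remove chirality marks and directional single-bond marks."""
    return CHIRALITY.sub("", smiles).replace("/", "").replace("\\", "")


def standardize(smiles: str, strip_stereochemistry: bool = True) -> str:
    """Apply the toy steps in order: desalt, then (for 2D) strip stereo."""
    result = keep_largest_fragment(smiles.strip())
    if strip_stereochemistry:
        result = strip_stereo(result)
    return result


def group_duplicates(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each standardized SMILES to the IDs of the records producing it."""
    groups: Dict[str, List[str]] = {}
    for record_id, smiles in records:
        groups.setdefault(standardize(smiles), []).append(record_id)
    return groups


if __name__ == "__main__":
    records = [
        ("sub-1", "C[C@H](N)C(=O)O"),     # alanine, one enantiomer
        ("sub-2", "C[C@@H](N)C(=O)O"),    # alanine, the other enantiomer
        ("sub-3", "C[C@H](N)C(=O)O.Cl"),  # alanine hydrochloride
        ("sub-4", "CC(=O)[O-].[Na+]"),    # sodium acetate
    ]
    for structure, ids in group_duplicates(records).items():
        print(structure, ids)
    # C[CH](N)C(=O)O ['sub-1', 'sub-2', 'sub-3']
    # CC(=O)[O-] ['sub-4']
```

Grouping records rather than silently dropping duplicates keeps a link from each standardized structure back to its source records, so conflicting experimental values can still be inspected. In the example, the acetate remains charged because the sketch applies no neutralization step.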

Updated: 2024-02-21