YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications
Journal of Cheminformatics (IF 8.6), Pub Date: 2023-11-20, DOI: 10.1186/s13321-023-00783-z
Chong Zhou 1 , Wei Liu 1 , Xiyue Song 1 , Mengling Yang 1 , Xiaowang Peng 1

In chemistry-related disciplines, a vast repository of molecular structural data has been documented in scientific publications but remains inaccessible to computational analyses owing to its non-machine-readable format. Optical chemical structure recognition (OCSR) addresses this gap by converting images of chemical molecular structures into a format accessible to computers and convenient for storage, paving the way for further analyses and studies on chemical information. A pivotal initial step in OCSR is automating the noise-free extraction of molecular descriptions from literature. Despite efforts utilising rule-based and deep learning approaches for the extraction process, the accuracy achieved to date is unsatisfactory. To address this issue, we introduce a deep learning model named YoDe-Segmentation in this study, engineered for the automated retrieval of molecular structures from scientific documents. This model operates via a three-stage process encompassing detection, mask generation, and calculation. Initially, it identifies and isolates molecular structures during the detection phase. Subsequently, mask maps are created based on these isolated structures in the mask generation stage. In the final calculation stage, refined and separated mask maps are combined with the isolated molecular structure images, resulting in the acquisition of pure molecular structures. Our model underwent rigorous testing using texts from multiple chemistry-centric journals, with the outcomes subjected to manual validation. The results revealed the superior performance of YoDe-Segmentation compared to alternative algorithms, documenting an average extraction efficiency of 97.62%. This outcome not only highlights the robustness and reliability of the model but also suggests its applicability on a broad scale.
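The final calculation stage described above, in which a mask map is combined with the isolated structure image, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`crop`, `apply_mask`, `extract_structure`), the grayscale list-of-rows image representation, the bounding-box convention, and the white background value are all assumptions chosen for illustration, and the detection and mask-generation models are taken as given inputs.

```python
from typing import List, Tuple

# Assumed representations (hypothetical, for illustration only):
Image = List[List[int]]          # grayscale image as rows of 0-255 pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive

WHITE = 255  # assumed page background value


def crop(page: Image, box: Box) -> Image:
    """Cut the detected bounding box out of the page (output of the detection stage)."""
    top, left, bottom, right = box
    return [row[left:right] for row in page[top:bottom]]


def apply_mask(region: Image, mask: Image, background: int = WHITE) -> Image:
    """Keep pixels where the mask is truthy; paint everything else as background."""
    same_shape = len(region) == len(mask) and all(
        len(r) == len(m) for r, m in zip(region, mask)
    )
    if not same_shape:
        raise ValueError("region and mask must have the same shape")
    return [
        [px if keep else background for px, keep in zip(region_row, mask_row)]
        for region_row, mask_row in zip(region, mask)
    ]


def extract_structure(page: Image, box: Box, mask: Image) -> Image:
    """Combine a detected region with its mask to drop non-structure pixels."""
    return apply_mask(crop(page, box), mask)
```

Here the mask plays the role of the refined mask map: any pixel it does not cover, such as a stray label or neighbouring text inside the bounding box, is replaced by the background value, leaving only the pixels attributed to the molecular structure.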

Updated: 2023-11-22