DECIMER—hand-drawn molecule images dataset
Journal of Cheminformatics (IF 7.1), Pub Date: 2022-06-09, DOI: 10.1186/s13321-022-00620-9
Henning Otto Brinkhaus 1 , Achim Zielesny 2 , Christoph Steinbeck 1 , Kohulan Rajan 1

The translation of images of chemical structures into machine-readable representations of the depicted molecules is known as optical chemical structure recognition (OCSR). The field has made considerable progress over the last three decades, but the development of systems that recognise complex hand-drawn structure depictions is still at an early stage. Currently, no data are available for the systematic evaluation of OCSR methods on hand-drawn structures. Here we present DECIMER — Hand-drawn molecule images, a standardised, openly available benchmark dataset of 5088 hand-drawn depictions of diversely picked chemical structures. Every structure depiction in the dataset is mapped to a machine-readable representation of the underlying molecule. The dataset is published under the CC-BY 4.0 licence, which imposes very few restrictions. We hope that it will contribute to the further development of the field.

Updated: 2022-06-10