RanDepict: Random chemical structure depiction generator
Journal of Cheminformatics (IF 7.1), Pub Date: 2022-06-06, DOI: 10.1186/s13321-022-00609-4
Henning Otto Brinkhaus¹, Kohulan Rajan¹, Achim Zielesny², Christoph Steinbeck¹

The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor in building deep learning systems that generalise well and do not overfit to a specific type of input. In the case of chemical structure depictions, these features are defined by depiction parameters such as bond length, line thickness, label font style and many others. Here we present RanDepict, a toolkit for the creation of diverse sets of chemical structure depictions. The diversity of the image features is generated by making use of all available depiction parameters in the depiction functionalities of the CDK, RDKit, and Indigo. Furthermore, images can optionally be enhanced and augmented with features such as curved arrows, chemical labels around the structure, or other kinds of distortions. RanDepict uses depiction feature fingerprints to ensure that the selected image features are diverse: the depiction and augmentation features are summarised in binary vectors, and the MaxMin algorithm is used to pick diverse samples out of all valid options. By making all resources described herein publicly available, we hope to contribute to the development of deep learning-based OCSR systems.
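The fingerprint-plus-MaxMin selection step can be illustrated with a short, self-contained sketch. It assumes a hypothetical set of four depiction parameters (`PARAMETER_OPTIONS`), one-hot encodes every valid combination into a binary vector, and then uses Tanimoto (Jaccard) distance with a greedy MaxMin picker to choose diverse combinations. The parameter names, the distance measure and all helper functions are illustrative assumptions, not RanDepict's actual feature set or code. For real projects, RDKit provides a MaxMin implementation (`MaxMinPicker` in `rdkit.SimDivFilters.rdSimDivPickers`).

```python
import itertools
import random
from typing import Dict, List, Sequence

# HYPOTHETICAL depiction parameters with discrete options. These are
# illustrative only and are not RanDepict's real feature list.
PARAMETER_OPTIONS: Dict[str, List[str]] = {
    "toolkit": ["CDK", "RDKit", "Indigo"],
    "line_thickness": ["thin", "medium", "thick"],
    "label_font": ["sans-serif", "serif"],
    "curved_arrows": ["absent", "present"],
}


def enumerate_choices() -> List[Dict[str, str]]:
    """Return every valid combination of the parameter options above."""
    names = list(PARAMETER_OPTIONS)
    return [dict(zip(names, combo))
            for combo in itertools.product(*PARAMETER_OPTIONS.values())]


def fingerprint(choice: Dict[str, str]) -> List[int]:
    """One-hot encode a depiction choice into a flat binary vector."""
    bits: List[int] = []
    for name, options in PARAMETER_OPTIONS.items():
        bits.extend(1 if choice[name] == option else 0 for option in options)
    return bits


def tanimoto_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Return 1 - |a AND b| / |a OR b| for two binary vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return 0.0 if either == 0 else 1.0 - both / either


def maxmin_pick(fps: List[List[int]], k: int, seed: int = 0) -> List[int]:
    """Greedy MaxMin selection: start from a random item, then repeatedly
    add the candidate whose distance to its nearest picked item is largest."""
    n = min(k, len(fps))
    if n <= 0:
        return []
    rng = random.Random(seed)
    picked = [rng.randrange(len(fps))]
    # min_dist[i] = distance from item i to its closest already-picked item
    min_dist = [tanimoto_distance(fp, fps[picked[0]]) for fp in fps]
    while len(picked) < n:
        candidates = [i for i in range(len(fps)) if i not in picked]
        best = max(candidates, key=lambda i: min_dist[i])
        picked.append(best)
        # Update each item's nearest-pick distance with the new pick
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[best]))
    return picked


choices = enumerate_choices()
fps = [fingerprint(c) for c in choices]
for index in maxmin_pick(fps, k=4):
    print(choices[index])
```

Because each step adds the candidate farthest from its nearest already-picked neighbour, the first two picks in this toy setup share no parameter value at all, and later picks spread over the remaining combinations instead of clustering around one style.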

Updated: 2022-06-06