Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules
Scientific Data (IF 9.8), Pub Date: 2024-04-11, DOI: 10.1038/s41597-024-03212-4
Sunho Choi, Joonbum Lee, Jangwon Seo, Sung Won Han, Sang Hyun Lee, Ji-Hun Seo, Junhee Seok

The simplified molecular-input line-entry system (SMILES) has been used in a variety of artificial intelligence analyses because it can represent chemical structures in line notation. However, SMILES is limited in how readily it can represent macromolecules, which led to the proposal of BigSMILES as an alternative notation suited to them. Nevertheless, research on BigSMILES remains limited because of the preprocessing it requires. This study therefore proposes a BigSMILES conversion workflow that focuses on generating BigSMILES automatically from the SMILES representations of homopolymers. BigSMILES representations are provided for 4,927,181 records, making them immediately usable in a range of research and development applications. Our study describes in detail the validation process used to ensure the accuracy, interchangeability, and robustness of the conversion. It also provides a systematic overview of the code and functions used, emphasizing their relevance to BigSMILES generation. This work is anticipated to significantly aid researchers and to facilitate further studies of BigSMILES representation, including potential applications in deep learning and extension to more complex structures such as copolymers.
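As a rough illustration of what converting a homopolymer SMILES into BigSMILES involves, the sketch below wraps a repeat-unit SMILES in a BigSMILES stochastic object and then undoes the conversion as a string-level round-trip check. It is not the authors' workflow. It assumes that the input marks its two connection points with wildcard atoms (`*` or `[*]`), that both ends can be treated as equivalent and mapped to the non-directional bonding descriptor `[$]`, and that the chain ends are left unspecified with empty terminal descriptors `[]`. The function names `homopolymer_smiles_to_bigsmiles` and `bigsmiles_to_homopolymer_smiles` are hypothetical.

```python
import re

# Bare "*" or bracketed "[*]" wildcard atoms mark the two connection points.
# Assumption: labelled forms such as "[*:1]" or "[1*]" are not handled here.
_WILDCARD = re.compile(r"\[\*\]|\*")


def homopolymer_smiles_to_bigsmiles(psmiles: str) -> str:
    """Hypothetical helper: wrap a repeat-unit SMILES in a BigSMILES stochastic object.

    Assumes exactly two wildcard atoms and treats both ends as equivalent,
    so each becomes the non-directional bonding descriptor [$].
    """
    n = len(_WILDCARD.findall(psmiles))
    if n != 2:
        raise ValueError(f"expected 2 connection points, found {n}: {psmiles!r}")
    repeat_unit = _WILDCARD.sub("[$]", psmiles)
    # Empty terminal descriptors "[]" leave the chain ends unspecified.
    return "{[]" + repeat_unit + "[]}"


def bigsmiles_to_homopolymer_smiles(bigsmiles: str) -> str:
    """Hypothetical inverse: unwrap "{[]...[]}" and turn each [$] back into *."""
    prefix, suffix = "{[]", "[]}"
    if not (bigsmiles.startswith(prefix) and bigsmiles.endswith(suffix)):
        raise ValueError(f"not a single stochastic object with empty terminals: {bigsmiles!r}")
    return bigsmiles[len(prefix):-len(suffix)].replace("[$]", "*")


if __name__ == "__main__":
    for s in ["*CC*", "*CC(*)c1ccccc1"]:
        big = homopolymer_smiles_to_bigsmiles(s)
        # String-level round trip only; this does not check chemical equivalence.
        assert bigsmiles_to_homopolymer_smiles(big) == s
        print(s, "->", big)
```

Running the script prints `*CC* -> {[][$]CC[$][]}` (polyethylene) and `*CC(*)c1ccccc1 -> {[][$]CC([$])c1ccccc1[]}` (polystyrene). A chemically meaningful interchangeability check would compare canonicalized structures with a cheminformatics toolkit rather than raw strings, and repeat units that must connect strictly head-to-tail would use the directional descriptors `[<]` and `[>]` instead of `[$]`.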




Updated: 2024-04-13