AI-driven molecular generation of not-patented pharmaceutical compounds using world open patent data

Journal of Cheminformatics (IF 8.6), Pub Date: 2023-12-13, DOI: 10.1186/s13321-023-00791-z
Yugo Shimizu, Masateru Ohta, Shoichi Ishida, Kei Terayama, Masanori Osawa, Teruki Honma, Kazuyoshi Ikeda

Developing compounds with novel structures is important for producing new drugs. From an intellectual property perspective, confirming the patent status of newly developed compounds is essential, particularly for pharmaceutical companies. Recent advances in artificial intelligence (AI) have made it possible to generate large numbers of compounds. However, confirming the patent status of these generated molecules has been a challenge: no free, easy-to-use tool can determine their novelty with respect to patents in a timely manner, and no appropriate worldwide reference database of pharmaceutical patents exists. In this study, two public databases, SureChEMBL and Google Patents Public Datasets, were used to create a reference database of drug-related patented compounds selected by International Patent Classification. An exact structure search system was built on InChIKey and a relational database system to search the reference database rapidly. Because drug-related patented compounds are a good source from which generative AI can learn useful chemical structures, they were used as training data. Furthermore, incorporating patent status (i.e., patented or not) into learning successfully directed molecule generation, increasing or decreasing the number of patented compounds generated. The use of patent status enabled the generation of novel molecules with high drug-likeness. Generative AI combined with patent information would help to efficiently propose compounds that are novel with respect to pharmaceutical patents.

Scientific contribution: In this study, a new molecule-generation method was developed that takes into account the patent status of molecules, a feature that has rarely been considered but is important in drug discovery. The method enables the generation of novel molecules with high drug-likeness based on pharmaceutical patents and will help in the efficient development of effective drug compounds.
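The exact structure search described in the abstract can be pictured with a small sketch. This is not the authors' implementation: it only assumes that each patented compound is stored under its InChIKey in a relational table, so that checking a generated molecule reduces to an indexed equality lookup. The table name `patented_compounds`, the function names, and the patent identifiers are hypothetical, and SQLite stands in for whichever relational database system the study used. InChIKeys are assumed to be precomputed by a cheminformatics toolkit.

```python
import sqlite3


def build_reference_db(records):
    """Create an in-memory reference table from (inchikey, patent_id) pairs.

    The schema is an assumption made for illustration; the abstract does not
    describe the authors' actual database layout.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patented_compounds ("
        " inchikey TEXT NOT NULL,"
        " patent_id TEXT NOT NULL,"
        " PRIMARY KEY (inchikey, patent_id))"
    )
    # The composite primary key is backed by an index whose leading column is
    # inchikey, so equality lookups on inchikey do not scan the whole table.
    # OR IGNORE silently skips duplicate (inchikey, patent_id) pairs.
    conn.executemany(
        "INSERT OR IGNORE INTO patented_compounds VALUES (?, ?)", records
    )
    conn.commit()
    return conn


def find_patents(conn, inchikey):
    """Return the sorted patent identifiers recorded for an exact InChIKey."""
    rows = conn.execute(
        "SELECT patent_id FROM patented_compounds"
        " WHERE inchikey = ? ORDER BY patent_id",
        (inchikey,),
    ).fetchall()
    return [row[0] for row in rows]


def is_patented(conn, inchikey):
    """True if the InChIKey appears in the reference table at least once."""
    return bool(find_patents(conn, inchikey))


if __name__ == "__main__":
    aspirin = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"   # InChIKey of aspirin
    caffeine = "RYYVLZVUVIJVGH-UHFFFAOYSA-N"  # InChIKey of caffeine
    # Patent identifiers below are placeholders, not real patent numbers.
    db = build_reference_db([(aspirin, "EXAMPLE-0001")])
    print(is_patented(db, aspirin), is_patented(db, caffeine))
```

Because an InChIKey is a fixed-length hash of the structure, this kind of lookup finds only exact matches; it does not detect similar structures or compounds covered by a generic (Markush) claim.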

Updated: 2023-12-13
