Information Retrieval and Text Mining Technologies for Chemistry
Chemical Reviews (IF 51.4), Pub Date: 2017-05-05, DOI: 10.1021/acs.chemrev.6b00851
Martin Krallinger 1 , Obdulia Rabal 2 , Anália Lourenço 3, 4, 5 , Julen Oyarzabal 2 , Alfonso Valencia 6, 7, 8

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

Updated: 2017-05-05