Automatic spatiotemporal and semantic information extraction from unstructured geoscience reports using text mining techniques
Earth Science Informatics (IF 2.7), Pub Date: 2020-09-19, DOI: 10.1007/s12145-020-00527-9
Qinjun Qiu , Zhong Xie , Liang Wu , Liufeng Tao

A large amount of georeferenced quantitative data from rock and geoscience surveys is buried in geological documents and remains unused. Data analytics and information extraction offer opportunities to use these data to improve our understanding of ore-forming processes and to extend geoscience knowledge. Extracting spatiotemporal and semantic information from a collection of geological documents enables a rich representation of the geoscience knowledge recorded in unstructured Chinese-language text. This paper presents a workflow for spatiotemporal and semantic information extraction: a geological document analysis approach that uses automated techniques to browse and search relevant geological content. The workflow applies spatial and temporal gazetteer matching, pattern-based rules, and spatiotemporal relationship extraction to identify and label terms in geological text documents. It represents contextual information as a knowledge graph, extracts relevant tables and figures, and retrieves lists of relevant documents using geological topic information. Text mining techniques are used here to support the analysis of geological knowledge and to demonstrate how text analysis enables rapid assessment of large document collections. In addition, keyword suggestions generated automatically from extracted keyword associations are used to reduce document search effort. This research illustrates the usefulness and effectiveness of the developed information extraction workflow and demonstrates the potential of incorporating text mining and NLP techniques into geoscience.
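The early stages of such a workflow (gazetteer matching, pattern-based rules and relationship extraction) can be illustrated with a minimal Python sketch. It assumes toy spatial and temporal gazetteers, a hypothetical table of Chinese cue phrases (位于 "located in", 形成于 "formed in"), and hypothetical relation names `located_in` and `formed_in`. Forward maximum (longest-first) matching is a common technique for Chinese gazetteer lookup and is swapped in here for illustration; it is not necessarily the authors' implementation. The resulting (subject, relation, object) triples are the kind of edges a knowledge graph could be built from.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteers; a real workflow would load large lexicons
# of place names and geological time units.
SPATIAL_GAZETTEER = {"湖北省", "大冶市", "铜绿山"}
TEMPORAL_GAZETTEER = {"燕山期", "白垩纪", "早白垩世"}

# Hypothetical cue phrases: (left label, right label) -> (pattern, relation).
RELATION_RULES = {
    ("SPATIAL", "SPATIAL"): (re.compile(r"(?:位于|属于)"), "located_in"),
    ("SPATIAL", "TEMPORAL"): (re.compile(r"(?:形成于|成矿于)"), "formed_in"),
}


@dataclass(frozen=True)
class Mention:
    text: str
    label: str  # "SPATIAL" or "TEMPORAL"
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def gazetteer_match(text: str) -> list[Mention]:
    """Label gazetteer terms with forward maximum (longest-first) matching."""
    entries = {term: "SPATIAL" for term in SPATIAL_GAZETTEER}
    entries.update({term: "TEMPORAL" for term in TEMPORAL_GAZETTEER})
    max_len = max(len(term) for term in entries)
    mentions: list[Mention] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so longer terms win over their prefixes.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in entries:
                mentions.append(Mention(candidate, entries[candidate], i, i + size))
                i += size
                break
        else:
            i += 1  # no gazetteer term starts here; advance one character
    return mentions


def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Emit (subject, relation, object) triples for adjacent mention pairs
    whose intervening text exactly matches a cue phrase."""
    mentions = gazetteer_match(text)
    triples = []
    for left, right in zip(mentions, mentions[1:]):
        rule = RELATION_RULES.get((left.label, right.label))
        if rule is None:
            continue
        pattern, relation = rule
        if pattern.fullmatch(text[left.end:right.start]):
            triples.append((left.text, relation, right.text))
    return triples
```

Running `extract_triples("铜绿山位于大冶市,铜绿山形成于燕山期。")` yields `[("铜绿山", "located_in", "大冶市"), ("铜绿山", "formed_in", "燕山期")]`. Only adjacent mention pairs are checked, so a cue phrase separated from its arguments by other text is missed; a production system would need broader rules or a learned model.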

Updated: 2020-09-20