Looking Through Glass: Knowledge Discovery from Materials Science Literature using Natural Language Processing
arXiv - CS - Digital Libraries. Pub Date: 2021-01-05, DOI: arxiv-2101.01508
Vineeth Venugopal, Sourav Sahoo, Mohd Zaki, Manish Agarwal, Nitya Nand Gosvami, N. M. Anoop Krishnan

Most of the knowledge in the materials science literature exists as unstructured data such as text and images. Here, we present a framework employing natural language processing that automates text and image comprehension and precise knowledge extraction from the literature on inorganic glasses. The abstracts are automatically categorized using latent Dirichlet allocation (LDA), providing a way to classify and search semantically linked publications. Similarly, a comprehensive summary of images and plots is presented using the 'Caption Cluster Plot' (CCP), which provides direct access to the images buried in the papers. Finally, we combine the LDA and CCP with the chemical elements occurring in the manuscript to present an 'Elemental map', a topic-wise and image-wise distribution of chemical elements in the literature. Overall, the framework presented here can serve as a generic and powerful tool to extract and disseminate material-specific information on composition-structure-processing-property dataspaces, allowing insights into fundamental problems relevant to the materials science community and accelerating materials discovery.
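The 'Elemental map' idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each document already carries a topic label (for example from a separate LDA step), uses a hypothetical `ELEMENTS` subset of symbols common in oxide glasses, and detects elements only inside tokens that parse completely as chemical formulas. The names `elements_in_formula`, `elemental_map`, and the sample `docs` are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: a small, hand-picked subset of element symbols, not a full periodic table.
ELEMENTS = {"Si", "O", "Na", "B", "Al", "Ca", "K", "Li", "Mg", "P", "Ge", "Zn"}

TOKEN_RE = re.compile(r"\b[A-Z][A-Za-z0-9]*\b")   # candidate formula tokens
PART_RE = re.compile(r"([A-Z][a-z]?)(\d*)")        # one element symbol plus optional count


def elements_in_formula(token):
    """Return the element symbols in `token` if it parses fully as a formula, else []."""
    pos = 0
    symbols = []
    for match in PART_RE.finditer(token):
        # Every part must start where the previous one ended and be a known symbol.
        if match.start() != pos or match.group(1) not in ELEMENTS:
            return []
        symbols.append(match.group(1))
        pos = match.end()
    if pos != len(token):
        return []
    # Require a digit or at least two elements, so a lone "K" or "B" is ignored.
    if len(symbols) < 2 and not any(ch.isdigit() for ch in token):
        return []
    return symbols


def elemental_map(docs):
    """Map each topic label to a Counter of element -> number of documents mentioning it."""
    table = defaultdict(Counter)
    for topic, text in docs:
        found = set()
        for token in TOKEN_RE.findall(text):
            found.update(elements_in_formula(token))
        table[topic].update(found)  # each element counted once per document
    return table


# Hypothetical (topic label, text) pairs; the labels stand in for LDA output.
docs = [
    ("optical", "Refractive index of Na2O-SiO2 glasses"),
    ("optical", "Doping SiO2 fibers with GeO2"),
    ("mechanical", "Hardness of B2O3 and Al2O3 containing glass"),
]
result = elemental_map(docs)
```

Counting each element once per document keeps a single formula-heavy abstract from dominating a topic. The strict whole-token parse trades recall for precision: ordinary words are skipped, but so are formulas with variable subscripts such as `SiOx`.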

Updated: 2021-01-06