Looking Through Glass: Knowledge Discovery from Materials Science Literature using Natural Language Processing
arXiv - CS - Digital Libraries, Pub Date: 2021-01-05, DOI: arxiv-2101.01508
Vineeth Venugopal, Sourav Sahoo, Mohd Zaki, Manish Agarwal, Nitya Nand Gosvami, N. M. Anoop Krishnan
Most of the knowledge in the materials science literature is in the form of
unstructured data such as text and images. Here, we present a framework
employing natural language processing, which automates text and image
comprehension and precise knowledge extraction from the literature on
inorganic glasses. The abstracts are automatically categorized using latent
Dirichlet allocation (LDA), providing a way to classify and search
semantically linked publications. Similarly, a comprehensive summary of
images and plots is presented using the 'Caption Cluster Plot' (CCP), which
provides direct access to the images buried in the papers. Finally, we
combine the LDA and CCP with the chemical elements occurring in the
manuscripts to present an 'Elemental map', a topical and image-wise
distribution of chemical elements in the literature. Overall, the framework
presented here can be a generic and powerful tool to extract and disseminate
material-specific information on composition-structure-processing-property
dataspaces, enabling insights into fundamental problems relevant to the
materials science community and accelerating materials discovery.
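The abstract names LDA for grouping abstracts and an 'Elemental map' that ties chemical elements to topics, but it gives no implementation details. The following is a minimal, standard-library-only Python sketch under stated assumptions, not the authors' code: it fits LDA with collapsed Gibbs sampling (one common inference method; the paper's choice is not stated in the abstract), assigns each abstract to its dominant topic, and counts element symbols parsed from formula-like tokens such as `Na2O` per topic. The function names (`lda_gibbs`, `elements_in`, `elemental_map`), the hyperparameters, the stopword list, and the small `ELEMENTS` subset are all hypothetical. The image-based CCP component is not modelled.

```python
import random
import re
from collections import Counter

# HYPOTHETICAL subset of element symbols; a real pipeline would use the full
# periodic table (which makes acronyms such as "CCP" ambiguous).
ELEMENTS = {"Si", "O", "Na", "Ca", "B", "Al", "P", "Li", "K", "Mg"}
STOPWORDS = {"the", "and", "of", "with", "a", "in", "we", "on", "for"}  # hypothetical
FORMULA_RE = re.compile(r"(?:[A-Z][a-z]?\d*)+")  # shape of a token like "Na2O"
SYMBOL_RE = re.compile(r"[A-Z][a-z]?")           # one element symbol


def tokenize(text):
    """Lowercased word tokens (formulas such as 'sio2' kept whole), minus stopwords."""
    return [t for t in re.findall(r"[a-z][a-z0-9]*", text.lower()) if t not in STOPWORDS]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic distributions."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # tokens assigned to each topic
    z = []                                    # current topic of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], vocab[w]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed document-topic proportions; each row sums to 1.
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def elements_in(text):
    """Element symbols found in formula-like tokens, e.g. 'Na2O-SiO2' -> {Na, O, Si}."""
    found = set()
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if FORMULA_RE.fullmatch(token):
            symbols = SYMBOL_RE.findall(token)
            if all(s in ELEMENTS for s in symbols):
                found.update(symbols)
    return found


def elemental_map(abstracts, n_topics=2, n_iter=200, seed=0):
    """Topic x element counts: each abstract adds its elements to its dominant topic."""
    theta = lda_gibbs([tokenize(a) for a in abstracts], n_topics, n_iter=n_iter, seed=seed)
    table = [Counter() for _ in range(n_topics)]
    for text, dist in zip(abstracts, theta):
        table[max(range(n_topics), key=dist.__getitem__)].update(elements_in(text))
    return theta, table
```

Calling `elemental_map(abstracts, n_topics=2)` on a list of abstract strings returns the per-abstract topic mixtures and one `Counter` of element symbols per topic. Restricting `ELEMENTS` to a few oxide-glass constituents keeps acronyms such as "LDA" from being read as formulas; with the full periodic table a token like "CCP" would parse as C, C, P, so a real pipeline would need additional filtering.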
Updated: 2021-01-06