Advances in scientific literature mining for interpreting materials characterization
Machine Learning: Science and Technology (IF 6.013), Pub Date: 2021-07-13, DOI: 10.1088/2632-2153/abf751
Gilchan Park, Line Pouchard

Using synchrotron light sources, such as the National Synchrotron Light Source II at Brookhaven National Laboratory, scientists in fields as diverse as physics, biology, and materials science identify the atomic structure, chemical composition, or other important properties of varied specimens. X-ray spectroscopy from light sources is particularly valuable for materials research, and the scientific literature holds a vast amount of information about reference spectra. However, because the technique is applicable to many science domains, searching for information about select x-ray spectroscopy spectra is impeded by the sheer number of publications. Moreover, useful information about the context of an experiment or the figures presented in a paper can be buried among the details and takes time to assess. This work presents a scientific literature mining system that supports data acquisition, information extraction, and user interaction for referencing x-ray spectra identification and spectral interpretation. The goal is to give researchers who may spend only a few days at a synchrotron light source efficient access to useful spectral data. With this system, users browse a classification tree of papers arranged according to x-ray spectroscopic methods, chemical elements, and x-ray absorption spectroscopy edges. Relevant figures are extracted together with the sentences from the paper that explain them, known as ‘figure explanatory text.’ Notably, this system focuses on semantic aspects (logical analysis) to find figure explanatory text using deep contextualized word embedding techniques, and it contains an interface for obtaining labeled data from domain experts, which is used to evaluate and improve the model.




Updated: 2021-07-13