Math-word embedding in math search and semantic extraction
Scientometrics (IF 3.9), Pub Date: 2020-06-09, DOI: 10.1007/s11192-020-03502-9
André Greiner-Petter, Abdou Youssef, Terry Ruas, Bruce R. Miller, Moritz Schubotz, Akiko Aizawa, Bela Gipp

Word embedding, which represents individual words with semantically fixed-length vectors, has made it possible to successfully apply deep learning to natural language processing tasks such as semantic role-modeling, question answering, and machine translation. As math text consists of natural text, as well as math expressions that similarly exhibit linear correlation and contextual characteristics, word embedding techniques can also be applied to math documents. However, while mathematics is a precise and accurate science, it is usually expressed through imprecise and less accurate descriptions, contributing to the relative dearth of machine learning applications for information retrieval in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in word embedding, it is worthwhile to explore their use and effectiveness in math information retrieval tasks, such as math language processing and semantic knowledge extraction. In this paper, we explore math embedding by testing it on several different scenarios, namely, (1) math-term similarity, (2) analogy, (3) numerical concept-modeling based on the centroid of the keywords that characterize a concept, (4) math search using query expansions, and (5) semantic extraction, i.e., extracting descriptive phrases for math expressions. Due to the lack of benchmarks, our investigations were performed using the arXiv collection of STEM documents and carefully selected illustrations on the Digital Library of Mathematical Functions (DLMF: NIST digital library of mathematical functions. Release 1.0.20 of 2018-09-1, 2018). Our results show that math embedding holds much promise for similarity, analogy, and search tasks. However, we also observed the need for more robust math embedding approaches. Moreover, we explore and discuss fundamental issues that we believe thwart the progress in mathematical information retrieval in the direction of machine learning.
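To make scenarios (1), (3), and (4) more concrete, the following is a minimal sketch of term similarity, centroid-based concept modeling, and query expansion over word vectors. It is not the authors' implementation: the three-dimensional vectors in `TOY_VECTORS` are invented for illustration (real embeddings would be trained on a corpus such as arXiv), and the helper names `cosine_similarity`, `centroid`, `nearest_terms`, and `expand_query` are hypothetical.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
# Real math-word embeddings are learned from a large corpus.
TOY_VECTORS = {
    "sine": [0.9, 0.1, 0.0],
    "cosine": [0.8, 0.2, 0.1],
    "tangent": [0.7, 0.3, 0.0],
    "matrix": [0.0, 0.9, 0.4],
    "determinant": [0.1, 0.8, 0.5],
    "prime": [0.1, 0.0, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (scenario 1: term similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(terms, vectors):
    """Component-wise mean of the vectors of the given keywords (scenario 3)."""
    rows = [vectors[t] for t in terms]
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_terms(query_vec, vectors, exclude=(), top_k=2):
    """Return the top_k vocabulary terms most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), term)
        for term, vec in vectors.items()
        if term not in exclude
    ]
    scored.sort(reverse=True)
    return [term for _, term in scored[:top_k]]


def expand_query(terms, vectors, top_k=2):
    """Append the nearest neighbours of the query centroid (scenario 4)."""
    neighbours = nearest_terms(
        centroid(terms, vectors), vectors, exclude=terms, top_k=top_k
    )
    return list(terms) + neighbours


if __name__ == "__main__":
    print(expand_query(["sine"], TOY_VECTORS))
    print(nearest_terms(centroid(["matrix", "determinant"], TOY_VECTORS),
                        TOY_VECTORS, exclude=("matrix", "determinant"), top_k=1))
```

With a trained model, the same pattern would apply with the toy dictionary replaced by the learned vocabulary; the ranking quality then depends entirely on the embedding, which is the robustness concern the abstract raises.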

Updated: 2020-06-09