ArGoT: A Glossary of Terms extracted from the arXiv
arXiv - CS - Digital Libraries. Pub Date: 2021-09-07, DOI: arxiv-2109.02801
Luis Berlioz, University of Pittsburgh

We introduce ArGoT, a data set of mathematical terms extracted from the articles hosted on the arXiv website. A term is any mathematical concept defined in an article. Using labels in the articles' source code and examples from other popular math websites, we mine all the terms in the arXiv data and compile a comprehensive vocabulary of mathematical terms. Each term can then be organized in a dependency graph using the term's definitions and the arXiv metadata. Using both hyperbolic and standard word embeddings, we demonstrate how this structure is reflected in the vector representations of the text and how these embeddings capture relations of entailment between mathematical concepts. This data set is part of an ongoing effort to align natural mathematical text with existing Interactive Theorem Prover (ITP) libraries of formally verified statements.
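The abstract describes two steps: mining defined terms from the articles' LaTeX source and organizing them into a dependency graph based on their definitions. The exact labels and matching rules are not given here, so the following is only a minimal Python sketch under stated assumptions: a definition sits in a `definition` environment, the defined term is marked with `\emph` or `\textbf`, and a term depends on every other known term mentioned in its definition. The function names (`extract_definitions`, `build_dependency_graph`, `learning_order`) are hypothetical, and this is not ArGoT's actual pipeline.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Assumption (not from the paper): a definition lives in a
# \begin{definition} ... \end{definition} environment, and the term it
# defines is wrapped in \emph{...} or \textbf{...}.
DEF_ENV = re.compile(r"\\begin\{definition\}(.*?)\\end\{definition\}", re.DOTALL)
DEFINED_TERM = re.compile(r"\\(?:emph|textbf)\{([^{}]+)\}")


def extract_definitions(latex_source):
    """Map each defined term (lowercased) to the text of its definition."""
    definitions = {}
    for body in DEF_ENV.findall(latex_source):
        for term in DEFINED_TERM.findall(body):
            definitions[term.strip().lower()] = body
    return definitions


def build_dependency_graph(definitions):
    """Map each term to the set of other known terms mentioned in its definition.

    Naive whole-word matching: a term that occurs inside a longer term
    (e.g. "group" inside "abelian group") also counts as a mention.
    """
    graph = {term: set() for term in definitions}
    for term, body in definitions.items():
        text = body.lower()
        for other in definitions:
            if other != term and re.search(r"\b" + re.escape(other) + r"\b", text):
                graph[term].add(other)
    return graph


def learning_order(graph):
    """Order terms so each appears after the terms its definition uses.

    Raises graphlib.CycleError if definitions refer to each other circularly.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    source = r"""
\begin{definition}
A \emph{group} is a set with an associative binary operation,
an identity element, and inverses.
\end{definition}
\begin{definition}
An \emph{abelian group} is a group whose operation is commutative.
\end{definition}
"""
    defs = extract_definitions(source)
    graph = build_dependency_graph(defs)
    print(graph)                  # {'group': set(), 'abelian group': {'group'}}
    print(learning_order(graph))  # ['group', 'abelian group']
```

The topological order places prerequisite concepts before the concepts defined in terms of them. Plain string matching is a deliberate simplification; the sketch ignores the arXiv metadata, the examples from other math websites, and the embedding analysis that the abstract also mentions.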

Updated: 2021-09-08