MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics,arXiv - CS - Formal Languages and Automata Theory


arXiv - CS - Formal Languages and Automata Theory, Pub Date: 2021-08-31, DOI: arxiv-2109.00110
Kunhao Zheng, Jesse Michael Han, Stanislas Polu

We present miniF2F, a dataset of formal Olympiad-level mathematics problem statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, and Isabelle and consists of 488 problem statements drawn from the AIME, the AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses. We report baseline results using GPT-f, a neural theorem prover based on GPT-3, and provide an analysis of its performance. We intend for miniF2F to be a community-driven effort and hope that our benchmark will help spur advances in neural theorem proving.
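
To make "formal problem statement" concrete, the sketch below shows roughly what a competition-style statement looks like once written for a proof assistant. It is a hypothetical illustration only: the theorem name and the problem are invented here and are not taken from the miniF2F dataset, and it assumes Lean 4 with Mathlib, which may differ from the Lean version the benchmark itself targets.

```lean
import Mathlib

-- Hypothetical example, NOT an actual miniF2F entry.
-- Informal problem: "If 2x + 3 = 11, what is x?"
-- The formal statement fixes the answer (x = 4) as the goal; a theorem
-- prover is then asked to supply the proof after `:= by`.
theorem illustrative_linear_equation (x : ℝ) (h : 2 * x + 3 = 11) : x = 4 := by
  linarith
```

In a benchmark of this kind, only the statement (everything before `:= by`) is given to the prover; the `linarith` call here stands in for whatever proof the system finds.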


Updated: 2021-09-02
