Translating math formula images to LaTeX sequences using deep neural networks with sequence-level training, International Journal on Document Analysis and Recognition

International Journal on Document Analysis and Recognition (IF 1.8), Pub Date: 2020-11-08, DOI: 10.1007/s10032-020-00360-2
Zelun Wang, Jyh-Charn Liu

In this paper, we propose a deep neural network model with an encoder–decoder architecture that translates images of math formulas into their LaTeX markup sequences. The encoder is a convolutional neural network that transforms images into a group of feature maps. To better capture the spatial relationships of math symbols, the feature maps are augmented with 2D positional encoding before being unfolded into a vector. The decoder is a stacked bidirectional long short-term memory model integrated with a soft attention mechanism, which works as a language model to translate the encoder output into a sequence of LaTeX tokens. The neural network is trained in two steps. The first step is token-level training with maximum likelihood estimation as the objective function. Once token-level training is complete, a sequence-level training objective is used to optimize the overall model with the policy gradient algorithm from reinforcement learning. Our design also overcomes the exposure bias problem by closing the feedback loop in the decoder during sequence-level training, i.e., feeding in the predicted token instead of the ground-truth token at every time step. The model is trained and evaluated on the IM2LATEX-100K dataset and shows state-of-the-art performance on both sequence-based and image-based evaluation metrics.
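To make the positional-encoding step concrete, here is a minimal sketch in plain Python. The abstract does not give the formula, so everything specific here is an assumption for illustration: a Transformer-style sinusoidal scheme, half of the channels encoding the row index and the other half the column index, and element-wise addition as the way the feature maps are "augmented". The function names `positional_encoding_2d` and `add_and_unfold` are hypothetical and not taken from the paper.

```python
import math


def positional_encoding_2d(height, width, channels):
    """Return a height x width x channels nested list of sinusoidal encodings.

    Assumption (not stated in the abstract): the first half of the channels
    encodes the row index y and the second half encodes the column index x,
    each using interleaved sin/cos at geometrically spaced frequencies.
    """
    if channels % 4 != 0:
        raise ValueError("channels must be divisible by 4")
    half = channels // 2

    def encode_1d(pos):
        # Build `half` values: (sin, cos) pairs at decreasing frequencies.
        vec = []
        for i in range(0, half, 2):
            angle = pos / (10000 ** (i / half))
            vec.extend([math.sin(angle), math.cos(angle)])
        return vec

    return [[encode_1d(y) + encode_1d(x) for x in range(width)]
            for y in range(height)]


def add_and_unfold(feature_map):
    """Add the 2D encoding to an H x W x C feature map, then flatten it
    row by row into a sequence of H*W vectors of length C.

    Assumption: "augmented" is read here as element-wise addition.
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    pe = positional_encoding_2d(height, width, channels)
    return [[f + p for f, p in zip(feature_map[y][x], pe[y][x])]
            for y in range(height) for x in range(width)]


if __name__ == "__main__":
    # A toy 2 x 3 feature map with 8 channels, all zeros.
    fmap = [[[0.0] * 8 for _ in range(3)] for _ in range(2)]
    sequence = add_and_unfold(fmap)
    print(len(sequence), len(sequence[0]))
```

Because every grid cell receives a distinct vector, an attention-based decoder reading the flattened sequence can still tell whether a symbol sat above, below, or beside another one, which is the spatial information that unfolding alone would discard.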

Updated: 2020-11-09
