Are 2D-LSTM really dead for offline text recognition?
International Journal on Document Analysis and Recognition (IF 2.3) Pub Date: 2019-06-06, DOI: 10.1007/s10032-019-00325-0
Bastien Moysset, Ronaldo Messina

There is a recent trend in handwritten text recognition with deep neural networks to replace 2D recurrent layers with 1D ones, and in some cases to remove the recurrent layers entirely in favor of simple feed-forward, convolutional-only architectures. The most widely used type of recurrent layer is the long short-term memory (LSTM). The motivations for this shift are many: there are few open-source implementations of 2D-LSTM, and even fewer with GPU support (currently cuDNN only implements 1D-LSTM); 2D recurrences reduce the amount of computation that can be parallelized and thus potentially increase training/inference time; and recurrences create global dependencies with respect to the input, which is not always desirable. Yet many recent competitions were won by systems whose networks use 2D-LSTM layers. Most previous works comparing 1D or purely feed-forward architectures to 2D recurrent models either did so on simple datasets or did not fully optimize the “baseline” 2D model, while the challenger model was duly optimized. In this work, we aim at a fair comparison between 2D and competing models, and we evaluate them extensively on more complex datasets that are more representative of challenging “real-world” data than “academic” datasets, which are more restricted in their complexity. We seek to determine when and why 1D and 2D recurrent models produce different results. We also compare results obtained with a language model to assess whether linguistic constraints level out the performance of the different networks. Our results show that, on challenging datasets, 2D-LSTM networks still appear to provide the highest performance, and we propose a visualization strategy to explain it.
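To make the architectural trade-off concrete, the minimal PyTorch sketch below (not the authors' models; the layer sizes and the height-collapse step are illustrative assumptions) contrasts the two non-2D designs the abstract mentions: a CNN followed by a bidirectional 1D-LSTM over image columns, and a purely feed-forward convolutional recognizer. Both emit per-column character logits of the kind typically trained with CTC.

```python
import torch
import torch.nn as nn


class Conv1DLSTMRecognizer(nn.Module):
    """CNN -> collapse height -> bidirectional 1D-LSTM over columns."""

    def __init__(self, num_classes: int, hidden: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # The horizontal recurrence below is the pattern cuDNN accelerates.
        self.lstm = nn.LSTM(64, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                     # x: (batch, 1, H, W)
        f = self.cnn(x)                       # (batch, 64, H/4, W/4)
        f = f.mean(dim=2).transpose(1, 2)     # column sequence: (batch, W/4, 64)
        f, _ = self.lstm(f)
        return self.head(f)                   # per-column logits


class ConvOnlyRecognizer(nn.Module):
    """Feed-forward alternative: context comes only from the receptive field."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv1d(128, num_classes, kernel_size=1)

    def forward(self, x):
        f = self.cnn(x).mean(dim=2)           # (batch, 128, W/4)
        return self.head(f).transpose(1, 2)   # (batch, W/4, num_classes)


if __name__ == "__main__":
    line = torch.randn(2, 1, 32, 256)         # two 32x256 grayscale text lines
    for model in (Conv1DLSTMRecognizer(80), ConvOnlyRecognizer(80)):
        print(type(model).__name__, model(line).shape)  # -> (2, 64, 80)
```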
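The parallelization argument can also be seen directly in code. The deliberately naive 2D-LSTM scan below (a sketch of the recurrence pattern, not an optimized implementation; the simplified gate layout with separate left/top forget gates is an assumption) shows why 2D recurrence resists GPU acceleration: each position depends on its left and top neighbours, so the double loop cannot be fused into the single batched matrix multiplications that make 1D-LSTM fast.

```python
import torch
import torch.nn as nn


class Naive2DLSTM(nn.Module):
    """Toy 2D-LSTM layer: the state at (i, j) depends on (i-1, j) and
    (i, j-1), giving recurrence over both image axes."""

    def __init__(self, in_ch: int, hidden: int):
        super().__init__()
        self.hidden = hidden
        # One joint projection producing input/left-forget/top-forget/output
        # gates plus the candidate cell (a simplified gate layout).
        self.gates = nn.Linear(in_ch + 2 * hidden, 5 * hidden)

    def forward(self, x):                      # x: (batch, C, H, W)
        b, _, H, W = x.shape
        zeros = x.new_zeros(b, self.hidden)
        h_above = [zeros] * (W + 1)            # states of the previous row
        c_above = [zeros] * (W + 1)
        rows = []
        for i in range(H):                     # sequential over rows...
            h_row, c_row = [zeros], [zeros]    # index 0 = left border
            for j in range(W):                 # ...and over columns: these two
                # loops cannot be flattened into one batched matmul as in 1D.
                inp = torch.cat([x[:, :, i, j], h_row[-1], h_above[j + 1]], 1)
                ig, fl, ft, og, cc = self.gates(inp).chunk(5, dim=1)
                c_new = (torch.sigmoid(fl) * c_row[-1]         # from the left
                         + torch.sigmoid(ft) * c_above[j + 1]  # from the top
                         + torch.sigmoid(ig) * torch.tanh(cc))
                c_row.append(c_new)
                h_row.append(torch.sigmoid(og) * torch.tanh(c_new))
            h_above, c_above = h_row, c_row
            rows.append(torch.stack(h_row[1:], dim=1))
        return torch.stack(rows, dim=1)        # (batch, H, W, hidden)


if __name__ == "__main__":
    layer = Naive2DLSTM(in_ch=8, hidden=16)
    print(layer(torch.randn(2, 8, 6, 10)).shape)  # -> (2, 6, 10, 16)
```

In practice, 2D-LSTM implementations parallelize over anti-diagonals of the grid, but a sequential dependency chain of length H + W remains, whereas a 1D-LSTM over columns has a chain of length W only.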
