Hamming OCR: A Locality Sensitive Hashing Neural Network for Scene Text Recognition
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2020-09-23 , DOI: arxiv-2009.10874
Bingcong Li, Xin Tang, Xianbiao Qi, Yihao Chen, Rong Xiao

Recently, inspired by the Transformer, self-attention-based scene text recognition approaches have achieved outstanding performance. However, we find that model size expands rapidly as the lexicon grows: the numbers of parameters in the softmax classification layer and the output embedding layer are proportional to the vocabulary size. This hinders the development of lightweight text recognition models, especially for Chinese and multilingual settings. We therefore propose a lightweight scene text recognition model named Hamming OCR. In this model, a novel Hamming classifier, which adopts a locality sensitive hashing (LSH) algorithm to encode each character, replaces softmax regression, and the generated LSH code is directly employed in place of the output embedding. We also present a simplified Transformer decoder that reduces the number of parameters by removing the feed-forward network and sharing parameters across layers. Compared with traditional methods, the number of parameters in both the classification and embedding layers is independent of the vocabulary size, which significantly reduces the storage requirement without loss of accuracy. Experimental results on several datasets, including four public benchmarks and a Chinese text dataset synthesized with SynthText covering more than 20,000 characters, show that Hamming OCR achieves competitive results.
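The core idea of the Hamming classifier can be illustrated with a minimal sketch: each character is mapped to a short binary code via random-hyperplane LSH, and a predicted code is decoded by nearest-neighbor search under Hamming distance. The embedding dimension, code length, toy vocabulary, and random-hyperplane scheme below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical sizes): each character has a feature vector,
# and random hyperplanes hash it into a short binary LSH code.
VOCAB = list("abcde")       # toy vocabulary; the paper targets 20,000+ characters
EMB_DIM, CODE_BITS = 16, 8  # illustrative dimensions

char_emb = rng.normal(size=(len(VOCAB), EMB_DIM))    # per-character features
hyperplanes = rng.normal(size=(CODE_BITS, EMB_DIM))  # random LSH projections

def lsh_code(vec: np.ndarray) -> np.ndarray:
    """One bit per hyperplane: the sign of the random projection."""
    return (hyperplanes @ vec > 0).astype(np.uint8)

# Codebook: the fixed LSH code of every character in the vocabulary.
codes = np.array([lsh_code(e) for e in char_emb])

def decode(pred_code: np.ndarray) -> str:
    """Return the character whose code is nearest in Hamming distance."""
    dists = (codes != pred_code).sum(axis=1)  # per-character Hamming distance
    return VOCAB[int(np.argmin(dists))]
```

Because the codebook stores only `CODE_BITS` bits per character, its size grows with the code length rather than with a vocabulary-sized weight matrix, which is the storage saving the abstract describes.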

Updated: 2020-09-24