Precise Detection of Chinese Characters in Historical Documents with Deep Reinforcement Learning
Pattern Recognition (IF 8), Pub Date: 2020-11-01, DOI: 10.1016/j.patcog.2020.107503
Wu Sihang, Wang Jiapeng, Ma Weihong, Jin Lianwen

Abstract: Deep reinforcement learning has demonstrated strong decision-making ability in a variety of fields. Here, we apply it to precise character detection, producing tight bounding boxes around the Chinese characters in historical documents. An agent is trained to learn a control policy that fine-tunes a bounding box step by step, formulated as a Markov Decision Process. We introduce a novel fully convolutional network with position-sensitive Region-of-Interest (RoI) pooling (FCPN). The network accepts character patches of arbitrary size as input and fuses position information into the action features. In addition, we propose a dense reward function (DRF) that assigns rewards according to the action taken and the environment state, improving the decision-making ability of the agent. Our approach is designed as a universal method that can be applied to the output of any character-level or word-level text detector to obtain more precise detection results. Experiments on the Tripitaka Koreana in Han (TKH) and Multiple Tripitaka in Han (MTH) datasets confirm the promising performance of this method. In particular, our approach yields a significant improvement at a high Intersection over Union (IoU) threshold of 0.8. Experiments on the scene text datasets ICDAR2013 and ICDAR2015 further demonstrate its robustness and generality.
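The step-by-step refinement described in the abstract can be illustrated with a minimal sketch. The abstract does not give the action set, the exact form of the dense reward, or the network, so everything below is an assumption made for illustration: the eight one-pixel edge moves in `ACTIONS`, the IoU-gain reward in `iou_gain_reward`, and the ground-truth-driven `greedy_refine` loop, which stands in for the learned policy and is not the authors' agent.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical action set: move one edge inward or outward by one step.
ACTIONS: Dict[str, Tuple[int, int, int, int]] = {
    "left_in": (1, 0, 0, 0), "left_out": (-1, 0, 0, 0),
    "top_in": (0, 1, 0, 0), "top_out": (0, -1, 0, 0),
    "right_in": (0, 0, -1, 0), "right_out": (0, 0, 1, 0),
    "bottom_in": (0, 0, 0, -1), "bottom_out": (0, 0, 0, 1),
}


def apply_action(box: Box, action: str, step: float = 1.0) -> Box:
    """State transition: shift one edge of the box by `step`."""
    dx1, dy1, dx2, dy2 = ACTIONS[action]
    return (box[0] + step * dx1, box[1] + step * dy1,
            box[2] + step * dx2, box[3] + step * dy2)


def iou_gain_reward(old: Box, new: Box, gt: Box) -> float:
    """Assumed dense reward: change in IoU with the ground truth after a step."""
    return iou(new, gt) - iou(old, gt)


def greedy_refine(box: Box, gt: Box, max_steps: int = 50) -> Box:
    """Oracle stand-in for a trained agent: take the best-reward action each step."""
    for _ in range(max_steps):
        best = max(ACTIONS, key=lambda a: iou_gain_reward(box, apply_action(box, a), gt))
        candidate = apply_action(box, best)
        if iou_gain_reward(box, candidate, gt) <= 0:
            break  # no action improves the box; treat this as the terminal state
        box = candidate
    return box


if __name__ == "__main__":
    coarse = (0.0, 0.0, 12.0, 12.0)   # loose box from some upstream detector
    target = (2.0, 2.0, 10.0, 10.0)   # tight ground-truth box
    refined = greedy_refine(coarse, target)
    print(iou(coarse, target), iou(refined, target))
```

A reward defined at every step, rather than only at the end of an episode, is what makes it "dense". In a trained system the policy would choose actions from image features instead of consulting the ground truth as this oracle loop does.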

Updated: 2020-11-01