Multi-task learning for simultaneous script identification and keyword spotting in document images
Pattern Recognition ( IF 8 ) Pub Date : 2021-01-18 , DOI: 10.1016/j.patcog.2021.107832
Ahmed Cheikhrouhou , Yousri Kessentini , Slim Kanoun

In this paper, an end-to-end multi-task deep neural network is proposed for simultaneous script identification and keyword spotting (KWS) in multilingual handwritten and printed document images. We introduce a unified approach that addresses both challenges jointly by designing a novel CNN-BLSTM architecture. The script identification stage extracts both local and global features so that the network covers more relevant information. Unlike traditional feature fusion approaches, which build a linear concatenation of features, we employ compact bilinear pooling to capture pairwise correlations between these features. The script identification result is then injected into the KWS module to eliminate characters of irrelevant scripts, so that the decoding stage runs in a single-script mode. All network parameters are trained end-to-end using multi-task learning that jointly minimizes the negative log-likelihood (NLL) loss for script identification and the connectionist temporal classification (CTC) loss for KWS. Our approach was evaluated on a variety of public datasets covering different languages and writing types. Experiments demonstrated the efficacy of our deep multi-task representation learning compared to state-of-the-art systems for both keyword spotting and script identification.
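
The single-script decoding step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy alphabet, the script labels, the frame posteriors, and the function names `mask_to_script` and `greedy_ctc_decode` are all hypothetical, and greedy best-path decoding is assumed only for simplicity. The sketch shows how a predicted script can be used to zero out the characters of other scripts before CTC decoding.

```python
# Hypothetical toy alphabet; index 0 is the CTC blank, shared by all scripts.
ALPHABET = ["<blank>", "a", "b", "c", "α", "β", "γ"]
SCRIPT_OF = ["any", "latin", "latin", "latin", "greek", "greek", "greek"]


def mask_to_script(frame_probs, script):
    """Zero out characters of other scripts, then renormalize each frame."""
    masked = []
    for frame in frame_probs:
        kept = [p if SCRIPT_OF[i] in ("any", script) else 0.0
                for i, p in enumerate(frame)]
        total = sum(kept)
        if total == 0.0:
            raise ValueError("no probability mass left after masking")
        masked.append([p / total for p in kept])
    return masked


def greedy_ctc_decode(frame_probs):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    chars, prev = [], None
    for frame in frame_probs:
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != prev and best != 0:
            chars.append(ALPHABET[best])
        prev = best
    return "".join(chars)


# Made-up per-frame character posteriors (each row sums to 1).
FRAMES = [
    [0.10, 0.30, 0.05, 0.05, 0.40, 0.05, 0.05],
    [0.60, 0.10, 0.10, 0.10, 0.05, 0.03, 0.02],
    [0.10, 0.10, 0.35, 0.05, 0.05, 0.30, 0.05],
]

mixed = greedy_ctc_decode(FRAMES)                             # no script constraint
latin = greedy_ctc_decode(mask_to_script(FRAMES, "latin"))    # Latin-only decoding
greek = greedy_ctc_decode(mask_to_script(FRAMES, "greek"))    # Greek-only decoding
```

With these toy posteriors, unconstrained decoding mixes characters from both scripts, whereas masking by the identified script forces a single-script output. In the paper's setting the script label would come from the script identification branch of the network rather than being passed in by hand.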



Updated: 2021-02-02