Real-Time Lexicon-Free Scene Text Retrieval
Pattern Recognition (IF 7.5), Pub Date: 2021-02-01, DOI: 10.1016/j.patcog.2020.107656
Andrés Mafla, Rubèn Tito, Sounak Dey, Lluís Gómez, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

Abstract: In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single-shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. The problem can thus be modeled as a nearest-neighbor search of the textual representation of a query over the CNN outputs collected from the entire image database. Our experiments demonstrate that the proposed model outperforms the previous state of the art while offering a significant increase in processing speed and unmatched expressiveness on samples never seen at training time. Several experiments assessing the generalization capability of the model are conducted on a multilingual dataset, and an application to real-time text spotting in videos is also presented.
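
The retrieval step described in the abstract (a nearest-neighbor search of the query representation over per-image word descriptors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the character-histogram `embed` function is a hypothetical stand-in for the compact word representation that the CNN would predict, and the names `embed`, `cosine`, and `rank_images` are invented for this example.

```python
import math
from collections import Counter

# Assumed alphabet for the stand-in descriptor (not taken from the paper).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def embed(word):
    """Hypothetical stand-in descriptor: L2-normalized character histogram.

    In the described system, descriptors for database images would come from
    the CNN; only the text query would be embedded directly like this.
    """
    counts = Counter(c for c in word.lower() if c in ALPHABET)
    vec = [float(counts[c]) for c in ALPHABET]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two already L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_images(query, database):
    """Rank image ids by the similarity of their best-matching word.

    database maps image id -> list of word descriptors found in that image.
    Images with no detected words receive a score of 0.0.
    """
    q = embed(query)
    scores = {
        image_id: max((cosine(q, d) for d in descriptors), default=0.0)
        for image_id, descriptors in database.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

For instance, with a toy database such as `{"img_a": [embed("hotel"), embed("exit")], "img_b": [embed("coffee")], "img_c": []}`, calling `rank_images("coffee", database)` places `img_b` first. A real system would replace the exhaustive scan with an approximate nearest-neighbor index when the database is large.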

Last updated: 2021-02-01