Detection and recognition of cursive text from video frames
EURASIP Journal on Image and Video Processing (IF 2.4), Pub Date: 2020-08-28, DOI: 10.1186/s13640-020-00523-5
Ali Mirza, Ossama Zeshan, Muhammad Atif, Imran Siddiqi

Textual content appearing in videos is a useful index for semantic retrieval of videos (from archives), generation of alerts (live streams), and high-level applications such as opinion mining and content summarization. The key components of such systems are detection and recognition of textual content, which are the subject of this study. This paper presents a comprehensive framework for detection and recognition of textual content in video frames. More specifically, we target cursive scripts, taking Urdu text as a case study. Textual regions in video frames are detected by fine-tuning deep neural network based object detectors for the specific case of text detection. The script of the detected textual content is identified using convolutional neural networks (CNNs), while for recognition we propose UrduNet, a combination of CNNs and long short-term memory (LSTM) networks. A benchmark dataset containing cursive text, with more than 13,000 video frames, is also developed. A comprehensive series of experiments reports an F-measure of 88.3% for detection and a recognition rate of 87%.
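
For readers unfamiliar with the detection metric, the F-measure is conventionally the harmonic mean of precision and recall. The short Python sketch below illustrates that standard definition only. The function name `f_measure` and the sample inputs are illustrative assumptions, not values from the paper, and the paper may apply its own criteria for matching detected regions to ground truth.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 score)."""
    # Guard against division by zero when both inputs are zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs for illustration only; not results reported in the paper.
example = f_measure(0.9, 0.8)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F-measure by maximizing precision or recall alone.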

Last updated: 2020-08-28