Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition, The Visual Computer



The Visual Computer (IF 3.0), Pub Date: 2021-03-26, DOI: 10.1007/s00371-021-02094-6
Mridul Ghosh, Sayan Saha Roy, Himadri Mukherjee, Sk Md Obaidullah, K. C. Santosh, Kaushik Roy

Graphic-rich text is common in posters. A movie poster carries information such as the movie title, tag lines, and the names of the actors, director, and production house. Graphic-rich text in a movie title conveys not only sentiment but also genre. Understanding a poster therefore requires graphic-rich text recognition, which in turn requires text localization so that background and foreground graffiti can be well segmented. In this paper, we propose a transfer learning-based approach for graphic-rich text localization, tuned by introducing reverse augmentation and a rotated/inclined rectangle drawing technique. A convolutional neural network-based model is then applied to identify the corresponding scripts. In our experiments on a newly developed dataset (available upon request) of 1154 movie poster images covering multiple scripts, we achieved an average accuracy of 99.30%. Our results outperformed previously developed tools that rely on handcrafted features.
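To make the idea of a rotated/inclined rectangle concrete, the following is a minimal sketch of how the four corners of such a box can be computed from a centre, a size, and an angle. It is an illustration only, not the authors' implementation: the function name `rotated_rect_corners` and its parameters are hypothetical, and the abstract does not state how the boxes are parameterized.

```python
import math


def rotated_rect_corners(cx, cy, width, height, angle_deg):
    """Return the four corners of a rectangle centred at (cx, cy).

    Hypothetical helper for illustration. The rectangle is rotated by
    angle_deg about its centre using the standard 2D rotation matrix.
    With a y-axis pointing up the rotation is counter-clockwise; in image
    coordinates (y pointing down) the same angle appears clockwise.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0

    # Corner offsets of the axis-aligned box, relative to the centre.
    offsets = [
        (-half_w, -half_h),
        (half_w, -half_h),
        (half_w, half_h),
        (-half_w, half_h),
    ]

    corners = []
    for dx, dy in offsets:
        # Rotate the offset, then translate back to the centre.
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


if __name__ == "__main__":
    # An inclined box around a slanted title, as an assumed example.
    for corner in rotated_rect_corners(100.0, 50.0, 80.0, 20.0, 15.0):
        print(corner)
```

The returned corner list can be passed to any polygon-drawing routine; an axis-aligned box is the special case `angle_deg = 0`.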


Updated: 2021-03-27
