CNN-based segmentation of speech balloons and narrative text boxes from comic book page images
International Journal on Document Analysis and Recognition (IF 2.3), Pub Date: 2021-04-21, DOI: 10.1007/s10032-021-00366-4
Arpita Dutta, Samit Biswas, Amit Kumar Das

Driven by advances in technology, most recent research on comic document images has focused on the digital reading and distribution of comics. This work presents the extraction of narrative text boxes and speech balloons, which carry the conversations among comic characters along with their feelings. Because drawing styles vary widely, speech balloons take complex shapes and are difficult to extract. We present a shape-aware dual-stream convolutional neural network for segmenting narrative text boxes and speech balloons of various shapes. In this dual-stream architecture, an added shape module processes edge information of the speech balloons and narrative text boxes alongside the main module. The outputs of the two modules are then concatenated to produce a more accurate segmentation of speech balloons and narrative text boxes. The proposed method achieves significant performance improvements in both region accuracy (mIOU) and boundary accuracy (F-measure and Hausdorff distance) over other state-of-the-art methods on several publicly available comic datasets in different languages (namely eBDtheque, DCM and a subset of Manga 109). In addition, we have developed a new dataset (BCBId) for comics in Bangla, the eighth most spoken language in the world, and propose a semi-automatic method for producing ground-truth images.
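
The abstract names a region metric (mIOU) and a boundary metric (Hausdorff distance). The following is a minimal sketch of how such metrics are commonly computed on binary segmentation masks. It is not the authors' evaluation code: the function names (`iou`, `mean_iou`, `boundary_points`, `hausdorff`), the two-class setup (0 = background, 1 = balloon or text box), and the 4-neighbour boundary definition are all assumptions made for illustration.

```python
from math import hypot


def iou(pred, truth):
    """Intersection over union of two boolean masks given as lists of rows."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention (an assumption).
    return inter / union if union else 1.0


def mean_iou(pred, truth, classes=(0, 1)):
    """Average the per-class IoU over the given class labels."""
    scores = []
    for c in classes:
        p = [[v == c for v in row] for row in pred]
        t = [[v == c for v in row] for row in truth]
        scores.append(iou(p, t))
    return sum(scores) / len(scores)


def boundary_points(mask):
    """Foreground pixels with a 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    pts.append((y, x))
                    break
    return pts


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        return max(min(hypot(sy - dy, sx - dx) for dy, dx in dst)
                   for sy, sx in src)
    return max(directed(a, b), directed(b, a))


# Toy 4x4 example: the prediction overshoots the ground truth by one column.
truth = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 1, 1, 1],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]

miou = mean_iou(pred, truth)
hd = hausdorff(boundary_points(pred), boundary_points(truth))
```

In this toy case the foreground IoU is 4/6 and the background IoU is 10/12, so `miou` is 0.75, and the one-pixel overshoot gives a Hausdorff distance of 1.0. The brute-force distance search is quadratic in the number of boundary pixels, which is fine for a sketch but slow on full-page masks.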




Updated: 2021-04-22