Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation, ACM Transactions on Multimedia Computing, Communications, and Applications



ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.2), Pub Date: 2021-05-30, DOI: 10.1145/3440053
Xin Yang₁, Zongliang Ma₁, Letian Yu₁, Ying Cao₂, Baocai Yin₁, Xiaopeng Wei₁, Qiang Zhang₁, Rynson W. H. Lau₂


In this article, we propose a fully automatic system for generating comic books from videos without any human intervention. Given an input video along with its subtitles, our approach first extracts informative keyframes by analyzing the subtitles and stylizes the keyframes into comic-style images. Then, we propose a novel automatic multi-page layout framework that allocates the images across multiple pages and synthesizes visually interesting layouts based on the rich semantics of the images (e.g., importance and inter-image relation). Finally, rather than using a single type of balloon as in previous works, we propose an emotion-aware balloon generation method that creates different types of word balloons by analyzing the emotion of the subtitles and audio. Our method varies balloon shapes and word sizes in response to different emotions, leading to a richer reading experience. Once the balloons are generated, they are placed adjacent to their corresponding speakers via speaker detection. Our results show that our method, without requiring any user input, can generate high-quality comic pages with visually rich layouts and balloons. Our user studies also demonstrate that users prefer our generated results over those produced by state-of-the-art comic generation systems.
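The emotion-aware balloon step can be pictured with a minimal sketch. The abstract does not give the emotion categories, the balloon shapes, or the scaling rule, so everything named here (`BALLOON_SHAPES`, `BalloonStyle`, `choose_balloon`, the 1.5x cap) is a hypothetical illustration of the idea of varying shape and word size with emotion, not the authors' method.

```python
from dataclasses import dataclass

# Hypothetical emotion labels and balloon shapes; the abstract lists neither.
BALLOON_SHAPES = {
    "neutral": "ellipse",
    "angry": "spiky",
    "sad": "wavy",
    "happy": "cloud",
}


@dataclass(frozen=True)
class BalloonStyle:
    shape: str         # outline used to draw the balloon
    font_scale: float  # multiplier applied to the base word size


def choose_balloon(emotion: str, intensity: float) -> BalloonStyle:
    """Map an emotion label and an intensity in [0, 1] to a balloon style."""
    # Unknown labels fall back to the neutral style.
    if emotion not in BALLOON_SHAPES:
        emotion = "neutral"
    # Clamp the intensity so out-of-range scores cannot distort the text size.
    clamped = min(max(intensity, 0.0), 1.0)
    # Assumed rule: neutral text keeps the base size, other emotions grow up to 1.5x.
    font_scale = 1.0 if emotion == "neutral" else 1.0 + 0.5 * clamped
    return BalloonStyle(BALLOON_SHAPES[emotion], font_scale)
```

In a full pipeline, the emotion label and intensity would presumably come from the subtitle and audio analysis described above, and the resulting style would be drawn next to the detected speaker.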


Updated: 2021-05-30
