Face Mask Extraction in Video Sequence
International Journal of Computer Vision (IF 19.5). Pub Date: 2018-11-16. DOI: 10.1007/s11263-018-1130-2
Yujiang Wang , Bingnan Luo , Jie Shen , Maja Pantic

Inspired by the recent development of deep network-based methods in semantic image segmentation, we introduce an end-to-end trainable model for face mask extraction in video sequences. Compared to landmark-based sparse face shape representations, our method produces segmentation masks of individual facial components, which better reflect their detailed shape variations. By integrating the convolutional LSTM (ConvLSTM) algorithm with fully convolutional networks (FCN), our new ConvLSTM-FCN model works on a per-sequence basis and takes advantage of the temporal correlation in video clips. In addition, we propose a novel loss function, called segmentation loss, to directly optimise the intersection-over-union (IoU) performance. In practice, to further increase segmentation accuracy, one primary model and two additional models were trained to focus on the face, eyes, and mouth regions, respectively. Our experiments show that the proposed method achieves a 16.99% relative improvement (from 54.50% to 63.76% mean IoU) over the baseline FCN model on the 300 Videos in the Wild (300VW) dataset.
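The idea of a loss that directly optimises IoU can be illustrated with a generic "soft IoU" formulation, in which the hard intersection and union counts are replaced by sums over predicted foreground probabilities so the loss stays differentiable. This is a minimal NumPy sketch of that general technique, not necessarily the exact segmentation loss defined in the paper:

```python
import numpy as np

def soft_iou_loss(pred, target, eps=1e-6):
    """Soft-IoU loss (1 - IoU) for a single binary mask.

    pred:   predicted foreground probabilities in [0, 1]
    target: binary ground-truth mask with values in {0, 1}

    Replacing hard counts with sums of probabilities keeps the
    loss differentiable, so it can be minimised by gradient descent.
    """
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    return 1.0 - intersection / (union + eps)

# A perfect prediction gives a loss near 0; a fully wrong one gives 1.
mask = np.array([[0.0, 1.0], [1.0, 1.0]])
print(round(soft_iou_loss(mask, mask), 6))  # → 0.0
```

In a training loop, `pred` would be the sigmoid or softmax output of the network for one facial-component channel, and the same formulation extends to multiple components by averaging the per-channel losses.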

Updated: 2018-11-16