Two-Stream Transformer Networks for Video-Based Face Alignment
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6). Pub Date: 2017-08-01. DOI: 10.1109/tpami.2017.2734779
Hao Liu, Jiwen Lu, Jianjiang Feng, Jie Zhou

In this paper, we propose a Two-Stream Transformer Networks (TSTN) approach for video-based face alignment. Unlike conventional image-based face alignment approaches, which cannot explicitly model the temporal dependency in videos, and motivated by the fact that facial landmarks usually move consistently across consecutive frames, our TSTN aims to capture the complementary information of the spatial appearance on still frames and the temporal consistency across frames. To this end, we develop a two-stream architecture that decomposes video-based face alignment into a spatial stream and a temporal stream. Specifically, the spatial stream transforms a facial image into landmark positions while preserving the holistic facial shape structure. Meanwhile, the temporal stream encodes the video input as active appearance codes, capturing the temporal consistency across frames to help refine the shape. Experimental results on benchmark video-based face alignment datasets show that our method performs very competitively against the state of the art.
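The abstract does not come with an implementation, so the following is only a minimal PyTorch sketch of the two-stream decomposition it describes: a spatial stream that regresses a coarse per-frame shape, and a temporal stream that encodes a short clip into an appearance code used to refine that shape. All layer choices, tensor shapes, and the additive fusion of the two streams are illustrative assumptions, and the class names SpatialStream, TemporalStream, and TwoStreamAligner are hypothetical, not the authors' TSTN architecture.

# Illustrative two-stream layout for video-based face alignment.
# Layer sizes, the clip encoder, and the fusion step are assumptions,
# not the TSTN authors' code.
import torch
import torch.nn as nn

class SpatialStream(nn.Module):
    """Maps a single frame to 2D landmark coordinates."""
    def __init__(self, num_landmarks=68):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.regressor = nn.Linear(64 * 4 * 4, num_landmarks * 2)

    def forward(self, frame):                      # (B, 3, H, W)
        x = self.features(frame).flatten(1)
        return self.regressor(x)                   # (B, num_landmarks * 2)

class TemporalStream(nn.Module):
    """Encodes a clip of consecutive frames into an appearance code and
    predicts a per-landmark refinement offset from it."""
    def __init__(self, num_frames=3, code_dim=128, num_landmarks=68):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 * num_frames, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.to_code = nn.Linear(128, code_dim)    # appearance code
        self.offset = nn.Linear(code_dim, num_landmarks * 2)

    def forward(self, clip):                       # (B, T, 3, H, W)
        b, t, c, h, w = clip.shape
        x = self.encoder(clip.reshape(b, t * c, h, w)).flatten(1)
        return self.offset(self.to_code(x))

class TwoStreamAligner(nn.Module):
    """Fuses the spatial estimate with the temporal refinement."""
    def __init__(self, num_frames=3, num_landmarks=68):
        super().__init__()
        self.spatial = SpatialStream(num_landmarks)
        self.temporal = TemporalStream(num_frames, 128, num_landmarks)

    def forward(self, clip):
        coarse = self.spatial(clip[:, -1])         # align the latest frame
        return coarse + self.temporal(clip)        # temporally refined shape

if __name__ == "__main__":
    model = TwoStreamAligner()
    clip = torch.randn(2, 3, 3, 64, 64)            # batch of 3-frame clips
    print(model(clip).shape)                       # torch.Size([2, 136])

The sketch mirrors the abstract's division of labor: the spatial stream preserves the holistic shape of a still frame, while the temporal stream turns the surrounding frames into a code whose predicted offset supplies the shape refinement.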

Updated: 2018-10-03