A deep attention-based ensemble network for real-time face hallucination, Journal of Real-Time Image Processing


Journal of Real-Time Image Processing (IF 2.9), Pub Date: 2020-08-17, DOI: 10.1007/s11554-020-01009-3
Dongdong Liu, Jincai Chen, Zhenxing Huang, Ni Zeng, Ping Lu, Lin Yang, Haofeng Wang, Jinqiao Kou, Min Wu

Face hallucination (FH) aims to reconstruct high-resolution faces from low-resolution face inputs, which makes it important for other face-related tasks. Unlike general super-resolution, FH often requires facial priors in addition to generally extracted features, which leads to the fusion of more than one kind of feature. Existing CNN-based FH methods often fuse different features indiscriminately, which may introduce noise, and they give little consideration to potentially useful latent relations among different features. To address these issues, we propose an end-to-end deep ensemble network that aggregates three feature-extraction sub-nets in an attention-based manner. In our ensemble strategy, both the relations among different features and the inter-dependencies among different channels are captured by exploiting spatial attention and channel attention. To diversify the extracted features, we aggregate three different sub-nets: a basic sub-net for basic features, an auto-encoder sub-net for facial shape priors, and a dense residual attention sub-net for fine-grained texture features. Ablation studies and experimental results show that our method is effective not only in terms of the PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index) metrics but, more importantly, in producing clearer details both in key facial areas and across the whole face. The results also show that our method hallucinates faces in real time, generating one image in 0.0237 s.
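The abstract does not give the fusion equations. As a rough illustration only, the following NumPy sketch shows one common way to combine channel attention (a squeeze-and-excitation style gate) with spatial attention so that the outputs of three sub-nets are weighted per pixel rather than fused indiscriminately. The function names, the shared gate weights `w_down`/`w_up`, the reduction ratio `r`, and the simplified per-pixel score built from two scalars `a` and `b` (standing in for the small convolution usually applied to channel-wise mean and max maps) are all hypothetical assumptions, not the authors' architecture.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w_down, w_up):
    """SE-style channel gate on a (C, H, W) feature map.

    w_down: (C // r, C) and w_up: (C, C // r) are hypothetical learned weights.
    """
    squeeze = feat.mean(axis=(1, 2))             # (C,) global average pooling
    hidden = np.maximum(w_down @ squeeze, 0.0)   # (C // r,) ReLU bottleneck
    scale = sigmoid(w_up @ hidden)               # (C,) per-channel weights in (0, 1)
    return feat * scale[:, None, None]           # rescale every channel


def spatial_logits(feat, a, b):
    """Per-pixel score from channel-wise mean and max maps.

    Hypothetical simplification: a 1x1 linear combination (a, b) instead of a conv.
    """
    return a * feat.mean(axis=0) + b * feat.max(axis=0)  # (H, W)


def attention_fusion(branches, w_down, w_up, a, b):
    """Fuse K same-shaped (C, H, W) sub-net outputs with channel + spatial attention."""
    # 1) Recalibrate the channels of each branch (gate weights shared here for brevity).
    recal = [channel_attention(f, w_down, w_up) for f in branches]
    # 2) Score each branch per pixel, then softmax across branches (numerically stable).
    logits = np.stack([spatial_logits(f, a, b) for f in recal])  # (K, H, W)
    logits -= logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    # 3) Per-pixel weighted sum of the recalibrated branches.
    fused = (weights[:, None, :, :] * np.stack(recal)).sum(axis=0)  # (C, H, W)
    return fused, weights


# Usage with random stand-ins for the basic, auto-encoder and dense residual attention outputs.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
branches = [rng.standard_normal((C, H, W)) for _ in range(3)]
w_down = rng.standard_normal((C // r, C))
w_up = rng.standard_normal((C, C // r))
fused, weights = attention_fusion(branches, w_down, w_up, a=1.0, b=0.5)
print(fused.shape)  # (8, 4, 4)
```

Because the spatial weights are a softmax over branches, each output pixel is a convex combination of the recalibrated sub-net features, so one branch can dominate where its features score highest (for example, texture details versus shape priors) instead of all branches being summed with equal weight.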


Updated: 2020-08-17
