Multistage Model for Robust Face Alignment Using Deep Neural Networks

Wang, Huabin; Cheng, Rui; Zhou, Jian; Tao, Liang; Kwan, Hon Keung

doi:10.1007/s12559-021-09846-5

Published: 07 March 2021

Cognitive Computation, Volume 14, pages 1123–1139 (2022)

Huabin Wang¹ (ORCID: orcid.org/0000-0001-5938-5409), Rui Cheng¹, Jian Zhou¹, Liang Tao¹ & Hon Keung Kwan²

Abstract

The ability to generalize to unconstrained conditions such as severe occlusions and large pose variations remains a challenging goal in face alignment. In this paper, a multistage model based on deep neural networks is proposed, which takes advantage of spatial transformer networks, hourglass networks and exemplar-based shape constraints. First, a spatial transformer-generative adversarial network, which consists of convolutional layers and residual units, is used to solve the initialization issues caused by face detectors, such as rotation and scale variations, and to obtain improved face bounding boxes for face alignment. Then, a stacked hourglass network is employed to obtain preliminary locations of landmarks as well as their corresponding scores. In addition, an exemplar-based shape dictionary is designed to determine landmarks with low scores based on those with high scores. By incorporating face shape constraints, landmarks misaligned because of occlusions or cluttered backgrounds can be considerably improved. Extensive experiments on challenging benchmark datasets are performed to demonstrate the superior performance of the proposed method over other state-of-the-art methods.
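The second and third stages described in the abstract (reading landmarks and scores from heatmaps, then re-estimating low-score landmarks from high-score ones using exemplar shapes) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the score threshold of 0.5, the affine least-squares registration of exemplars, and the averaging of the k best-fitting exemplars are all assumptions made for this example, since the abstract does not specify them. The spatial transformer stage is omitted.

```python
import numpy as np


def heatmaps_to_landmarks(heatmaps):
    """Read one (x, y) landmark and one score from each heatmap.

    heatmaps: array of shape (n_landmarks, height, width).
    Assumption: the peak position is the landmark and the peak value is its score.
    """
    n, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, -1)
    ys, xs = np.unravel_index(flat.argmax(axis=1), (h, w))
    points = np.stack([xs, ys], axis=1).astype(float)
    scores = flat.max(axis=1)
    return points, scores


def correct_low_score_landmarks(points, scores, dictionary, threshold=0.5, k=3):
    """Replace low-score landmarks using exemplar shapes (hypothetical scheme).

    points: (n, 2) predicted landmarks; scores: (n,) confidences.
    dictionary: iterable of (n, 2) exemplar shapes with the same landmark order.
    """
    points = np.asarray(points, dtype=float)
    reliable = np.asarray(scores) >= threshold
    # Nothing to fix, or too few reliable points to fit an affine map.
    if reliable.all() or reliable.sum() < 3:
        return points.copy()

    candidates = []
    for shape in dictionary:
        shape = np.asarray(shape, dtype=float)
        # Fit an affine map from the exemplar's reliable points to the prediction.
        src = np.hstack([shape[reliable], np.ones((reliable.sum(), 1))])
        affine, *_ = np.linalg.lstsq(src, points[reliable], rcond=None)
        error = np.linalg.norm(src @ affine - points[reliable])
        # Map the whole exemplar into the image frame with the fitted transform.
        full = np.hstack([shape, np.ones((len(shape), 1))]) @ affine
        candidates.append((error, full))

    # Average the k exemplars that best explain the reliable landmarks.
    candidates.sort(key=lambda item: item[0])
    consensus = np.mean([full for _, full in candidates[:k]], axis=0)

    corrected = points.copy()
    corrected[~reliable] = consensus[~reliable]
    return corrected
```

In this sketch the high-score landmarks are never moved; only landmarks below the threshold are overwritten by the exemplar consensus, which mirrors the idea of determining low-score landmarks from high-score ones.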

Notes

This work is built on top of [14] with four major contributions as listed at the end of Section I.

References

1. Deng Y, Li H, Wang Q, Du Q. Nuclear norm-based matrix regression preserving embedding for face recognition. Neurocomputing. 2018;311:279–90.
2. Li X, Yang J, Wang Q. Nonrigid points alignment with soft-weighted selection. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. 2018. pp. 800–806.
3. Cao C, Hou Q, Zhou K. Displaced dynamic expression regression for real-time facial tracking and animation. ACM Trans Graph. 2014;33(4):43.
4. Jourabloo A, Liu X. Large-pose face alignment via CNN-based dense 3d model fitting. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2016. pp. 4188–4196.
5. Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39(4):640–51.
6. Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. In: Proceedings of European Conference on Computer Vision, vol. 9905. Springer 2016. pp. 483–499.
7. Burgos-Artizzu XP, Perona P, Dollár P. Robust face landmark estimation under occlusion. In: Proceedings of the IEEE International Conference on Computer Vision. 2013. pp. 1513–1520.
8. Wu Y, Ji Q. Robust facial landmark detection under significant head poses and occlusion. In: Proceedings of the IEEE International Conference on Computer Vision. 2015. pp. 3658–3666.
9. Xing J, Niu Z, Huang J, Hu W, Zhou X, Yan S. Towards robust and accurate multi-view and partially-occluded face alignment. IEEE Trans Pattern Anal Mach Intell. 2018;40(4):987–1001.
10. Liu Q, Deng J, Yang J, Liu G, Tao D. Adaptive cascade regression model for robust face alignment. IEEE Trans Image Process. 2017;26(2):797–807.
11. Ren S, Cao X, Wei Y, Sun J. Face alignment via regressing local binary features. IEEE Trans Image Process. 2016;25(3):1233–45.
12. Lv J, Shao X, Xing J, Cheng C, Zhou X. A deep regression architecture with two-stage re-initialization for high performance facial landmark detection. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2017. pp. 3317–3326.
13. Yang J, Liu Q, Zhang K. Stacked hourglass network for robust facial landmark localisation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017. pp. 79–87.
14. Yan X, Wang H, Wang Q, Song J, Tao L. Score-guided face alignment network under occlusions. In: Chinese Conference on Pattern Recognition and Computer Vision. Springer 2018. pp. 195–206.
15. Jaderberg M, Simonyan K, Zisserman A, et al. Spatial transformer networks. In: Adv Neural Inf Proces Syst. 2015. pp. 2017–2025.
16. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Adv Neural Inf Proces Syst. 2014. pp. 2672–2680.
17. Cootes TF, Edwards GJ, Taylor CJ. Active appearance models. IEEE Trans Pattern Anal Mach Intell. 2001;23(6):681–5.
18. Cootes TF, Taylor CJ, Cooper DH, Graham J. Active shape models-their training and application. Comput Vis Image Underst. 1995;61(1):38–59.
19. Cristinacce D, Cootes TF. Feature detection and tracking with constrained local models. In: Proceedings of British Machine Vision Conference. Citeseer 2006. pp. 1–10.
20. Tzimiropoulos G, Pantic M. Gauss-newton deformable part models for face alignment in-the-wild. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2014. pp. 1851–1858.
21. Cootes TF, Ionita MC, Lindner C, Sauer P. Robust and accurate shape model fitting using random forest regression voting. In: Proceedings of European Conference on Computer Vision, vol. 7578. Springer Heidelberg 2012. pp. 278–291.
22. Xiong X, De la Torre F. Supervised descent method and its applications to face alignment. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2013. pp. 532–539.
23. Fan X, Liu R, Luo Z, Li Y, Feng Y. Explicit shape regression with characteristic number for facial landmark localization. IEEE Trans Multimedia. 2018;20(3):567–79.
24. Yan J, Lei Z, Yi D, Li S. Learn to combine multiple hypotheses for accurate face alignment. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013. pp. 392–396.
25. Sagonas C, Tzimiropoulos G, Zafeiriou S, Pantic M. 300 faces in-the-wild challenge: The first facial landmark localization challenge. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013. pp. 397–403.
26. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2016. pp. 770–778.
27. Sun Y, Wang X, Tang X. Deep convolutional network cascade for facial point detection. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2013. pp. 3476–3483.
28. Zhang Z, Luo P, Loy CC, Tang X. Learning deep representation for face alignment with auxiliary attributes. IEEE Trans Pattern Anal Mach Intell. 2016;38(5):918–30.
29. Xiao S, Feng J, Xing J, Lai H, Yan S, Kassim A. Robust facial landmark detection via recurrent attentive-refinement networks. In: Proceedings of European Conference on Computer Vision. Springer 2016. pp. 57–72.
30. Kowalski M, Naruniec J, Trzcinski T. Deep alignment network: A convolutional neural network for robust face alignment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017. pp. 88–97.
31. Bulat A, Tzimiropoulos G. Binarized convolutional landmark localizers for human pose estimation and face alignment with limited resources. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. pp. 3706–3714.
32. Bulat A, Tzimiropoulos G. How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks). In: Proceedings of the International Conference on Computer Vision. 2017. pp. 1021–1030.
33. Deng J, Trigeorgis G, Zhou Y, Zafeiriou S. Joint multi-view face alignment in the wild. IEEE Trans Image Process. 2019;28(7):3636–48.
34. Wu W, Qian C, Yang S, Wang Q, Cai Y, Zhou Q. Look at boundary: A boundary-aware face alignment algorithm. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2018. pp. 2129–2138.
35. Valle R, Buenaposada JM, Valdes A, Baumela L. A deeply-initialized coarse-to-fine ensemble of regression trees for face alignment. In: Proceedings of the European Conference on Computer Vision. 2018. pp. 585–601.
36. Weng R, Lu J, Tan YP, Zhou J. Learning cascaded deep auto-encoder networks for face alignment. IEEE Trans Multimedia. 2016;18(10):2066–78.
37. Deng L. The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process Mag. 2012;29(6):141–2.
38. Chen D, Hua G, Wen F, Sun J. Supervised transformer network for efficient face detection. In: Proceedings of European Conference on Computer Vision. 2016. pp. 122–138.
39. Lin CH, Yumer E, Wang O, Shechtman E, Lucey S. St-gan: Spatial transformer generative adversarial networks for image compositing. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2018. pp. 9455–9464.
40. Zeiler MD, Krishnan D, Taylor GW, Fergus R. Deconvolutional networks. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2010. pp. 2528–2535.
41. Liu Q, Deng J, Tao D. Dual sparse constrained cascade regression for robust face alignment. IEEE Trans Image Process. 2016;25(2):700–12.
42. Ramanan D, Zhu X. Face detection, pose estimation, and landmark localization in the wild. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2012. pp. 2879–2886.
43. Belhumeur PN, Jacobs DW, Kriegman DJ, Kumar N. Localizing parts of faces using a consensus of exemplars. IEEE Trans Pattern Anal Mach Intell. 2013;35(12):2930–40.
44. Le V, Brandt J, Lin Z, Bourdev L, Huang TS. Interactive facial feature localization. In: Proceedings of European Conference on Computer Vision. 2012. pp. 679–692.
45. Sagonas C, Antonakos E, Tzimiropoulos G, Zafeiriou S, Pantic M. 300 faces in-the-wild challenge: Database and results. Image Vis Comput. 2016;47:3–18.
46. Ghiasi G, Fowlkes CC. Occlusion coherence: Localizing occluded faces with a hierarchical deformable part model. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2014. pp. 2385–2392.
47. Zafeiriou S, Trigeorgis G, Chrysos G, Deng J, Shen J. The menpo facial landmark localisation challenge: A step towards the solution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017. pp. 170–179.
48. Zhang Z, Luo P, Loy CC, Tang X. Facial landmark detection by deep multi-task learning. In: Proceedings of European Conference on Computer Vision. Springer 2014. pp. 94–108.
49. Zhu S, Li C, Change Loy C, Tang X. Face alignment by coarse-to-fine shape searching. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2015. pp. 4998–5006.
50. Trigeorgis G, Snape P, Nicolaou MA, Antonakos E, Zafeiriou S. Mnemonic descent method: A recurrent process applied for end-to-end face alignment. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2016. pp. 4177–4187.
51. Valle R, Buenaposada JM, Valdés A, Baumela L. Face alignment using a 3d deeply-initialized ensemble of regression trees. arXiv preprint 2019. arXiv:1902.01831
52. Kumar A, Chellappa R. Disentangling 3d pose in a dendritic cnn for unconstrained 2d face alignment. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2018. pp. 430–439.
53. Dong X, Yan Y, Ouyang W, Yang Y. Style aggregated network for facial landmark detection. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2018. pp. 379–388.
54. Zhu M, Shi D, Zheng M, Sadiq M. Robust facial landmark detection via occlusion-adaptive deep networks. In: Proc IEEE Conf Comput Vis Pattern Recognit. 2019. pp. 3486–3496.
55. Dapogny A, Bailly K, Cord M. Decafa: Deep convolutional cascade for face alignment in the wild. In: Proceedings of IEEE International Conference on Computer Vision. 2019. pp. 6893–6901.
56. Liu X, Wang H, Zhou J, Tao L. Attention-guided coarse-to-fine network for 2D face alignment in the wild. IEEE Access. 2019;7:97196–207.
57. Fan H, Zhou E. Approaching human level facial landmark localization by deep learning. Image Vis Comput. 2016;47:27–35.
58. Zhou E, Fan H, Cao Z, Jiang Y, Yin Q. Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In: Proceedings of International Conference on Computer Vision Workshops. 2013. pp. 386–391.
59. Deng J, Liu Q, Yang J, Tao D. M3 csr: Multi-view, multi-scale and multi-component cascade shape regression. Image Vis Comput. 2016;47:19–26.
60. Ghiasi G, Fowlkes CC, Irvine C. Using segmentation to predict the absence of occluded parts. In: Proceedings of British Machine Vision Conference. 2015. pp. 1–12.
61. Wu W, Yang S. Leveraging intra and inter-dataset variations for robust face alignment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017. pp. 150–159.
62. Kwan HK. Multiplierless designs for artificial neural network. Neural Networks and Systolic Array Design (Machine Perception and Artificial Intelligence). 2002;49:301–25.
63. Kwan HK. Simple sigmoid-like activation function suitable for digital hardware implementation. Electron Lett. 1992;28(15):1379–80. https://doi.org/10.1049/el:19920877.
64. Kwan HK, Tang CZ. Multiplierless multilayer feedforward neural network design using quantised neurons. Electron Lett. 2002;38(13):645–6. https://doi.org/10.1049/el:20020465.
65. Tang CZ, Kwan HK. Multilayer feedforward neural networks with single powers-of-two weights. IEEE Trans Signal Process. 1993;41(8):2724–7. https://doi.org/10.1109/78.229903.
66. Kwan HK. One-layer feedforward neural network for fast maximum/minimum determination. Electron Lett. 1992;28(17):1583–5. https://doi.org/10.1049/el:19921008.
67. Kwan HK, Tang CZ. Designing multilayer feedforward neural networks using simplified sigmoid activation functions and one-powers-of-two weights. Electron Lett. 1992;28(25):2343–5. https://doi.org/10.1049/el:19921510.
68. Kwan HK, Tang CZ. Multiplierless multilayer feedforward neural network design suitable for continuous input-output mapping. Electron Lett. 1993;29(14):1259–60. https://doi.org/10.1049/el:19930841.
69. Tang CZ, Kwan HK. Parameter effects on convergence speed and generalization capability of backpropagation algorithm. Int J Electron. 1993;74(1):35–46.

Funding

This work was funded in part by the National Natural Science Foundation of China under Grant 61372137, in part by the Natural Science Foundation of Anhui Province under Grant 1908085MF209 and Grant 1708085MF151, and in part by the Natural Science Foundation for the Higher Education Institutions of Anhui Province under Grant KJ2019A0036.

Author information

Authors and Affiliations

Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, Anhui, 230601, China
Huabin Wang, Rui Cheng, Jian Zhou & Liang Tao
Department of Electrical and Computer Engineering, University of Windsor, Windsor, Ontario, N9B 3P4, Canada
Hon Keung Kwan

Corresponding author

Correspondence to Huabin Wang.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Wang, H., Cheng, R., Zhou, J. et al. Multistage Model for Robust Face Alignment Using Deep Neural Networks. Cogn Comput 14, 1123–1139 (2022). https://doi.org/10.1007/s12559-021-09846-5

Received: 28 September 2020
Accepted: 03 February 2021
Published: 07 March 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s12559-021-09846-5
