Dynamic Facial Expression Recognition Using Sparse Reserved Projection Algorithm for Low Illumination Images

Li, Hui

doi:https://doi.org/10.1155/2021/2658471

Scientific Programming

Special Issue

Scientific Programming for Smart Internet of Things


Research Article | Open Access

Volume 2021 | Article ID 2658471 | https://doi.org/10.1155/2021/2658471

Hui Li¹

Academic Editor: Mian Ahmad Jan

Received 17 Jun 2021

Revised 27 Jul 2021

Accepted 07 Aug 2021

Published 06 Sept 2021

Abstract

In this paper, a novel approach for facial expression recognition based on sparse reserved projection is proposed. The locality preserving projection (LPP) algorithm is used to reduce the dimension of face image data while preserving the local nearest-neighbor relationships among face images. The sparse representation method is used to handle partial occlusion of the face and uneven illumination. Sparse reconstruction retains both the sparse reconstruction information and the local neighborhood information of an expression, so that more effective and discriminative intrinsic features can be extracted from the original expression data, and the resulting projection is relatively stable. Recognition results on the CK+ expression database show that this method can effectively improve the facial expression recognition rate.

1. Introduction

As an important branch of pattern recognition research, face recognition technology has become a research hotspot. Face recognition is realized by analyzing a face image and extracting its characteristic data, in particular the features that carry the most discriminative information for identification [1]. Current face recognition methods are based on either a single image or an image set. In single-image recognition, each image represents one sample, whereas in image-set recognition, multiple images of a person together constitute one sample. Face recognition draws on many fields, such as cognitive psychology, image processing, pattern recognition, computer vision, and physiology, so a large number of researchers are involved and the technology has developed rapidly [2].

Although face recognition technology has been applied in real life, it has not yet fully matured and still faces many difficulties and challenges. In practical applications, the face may be affected by natural conditions such as uneven lighting, partial occlusion of the face, facial expression changes, pose changes, age, and many other sources of noise, all of which can reduce recognition accuracy. Therefore, current research should focus on solving these key problems that hinder the development of face recognition technology. With the attention of researchers worldwide, face recognition technology has made great achievements. In China, the team led by researcher Shan Shiguang, as well as the team led by Professor Gao Wen and Professor Chen Xilin, made significant contributions to face recognition and detection, establishing a large-scale domestic face database and proposing many face recognition algorithms [3]. The team led by Professors Xu Guangyou and Zhang Changshui has made significant achievements through an in-depth study of face pose changes in the recognition process [4]. The team led by Professor Yang Jingyu has been engaged in face recognition research for a long time; its main research direction is to analyze the algebraic features of face images, and it has proposed a variety of new face recognition algorithms [5]. In addition to the teams listed above, many other research teams have long been engaged in face recognition-related research and have made corresponding progress [6, 7].

The experimental face library used by the authors in [8] contained sample images of 16 people, each with 9 face images taken at different shooting angles, lighting conditions, and sizes. None of the images of a given person showed a significant change in expression. The recognition rate can reach up to 96% under different lighting conditions, 64% for face images of different scales, and 85% for face images of different angles. The authors in [9] proposed a method based on elastic graph matching with a relatively high recognition rate, which extracts feature information using a dynamic link structure. They represented the face image as a graph whose nodes were key points in the face image (such as the nose, mouth, and eyes). The links between these key points form the edges of the graph, and each node holds several pieces of key information that are used to form the template of the face image. In [10], the authors used a neural network approach for face recognition. They first extracted 50 key features from face images, then mapped them to a five-dimensional space through the neural network, and finally recognized and classified the face images with a multilayer perceptron (MLP). Through the continuous improvement and innovation of researchers, a variety of neural network structures have been proposed successively, and the recognition rate of face images has been improved to varying degrees.

In this paper, the challenges encountered in the process of face recognition are studied and analyzed. The locality preserving projection (LPP) algorithm, used to reduce the dimension of face images, preserves the local nearest-neighbor relationships of the face image data, so that the neighborhood relationships between image samples are maintained before and after dimensionality reduction. In addition, sparse representation handles partial occlusion and uneven illumination well [11] and is strongly robust to these problems. Sparse representation is therefore used to classify the data after dimensionality reduction. By combining LPP with sparse representation classification, the proposed method effectively mitigates the impact of these problems on the recognition process and achieves a good recognition rate.

The rest of this paper is organized as follows. In Section 2, the local retained projection algorithm is discussed, which is based on the sparse classification. In Section 3, the proposed face dynamic expression recognition method for low illumination image is discussed. In Section 4, the experimental results are provided. Finally, the paper is concluded and future research directions are provided in Section 5.

2. Local Retained Projection Algorithm Based on Sparse Classification

In this section, first the LPP algorithm used by the proposed scheme is discussed. Next, the sparse representation classification is discussed. Both these approaches are the main approaches used by dynamic facial expression recognition techniques for low illumination images.

2.1. Locality Preserving Projection (LPP) Algorithm

Locality preserving projection (LPP) algorithm [12] is an unsupervised dimensionality reduction algorithm, which mainly aims at dimensionality reduction of nonlinear data. Its characteristic is that it can maintain the original topological structure relationship of the data points of the sample. It projects the data from the high-dimensional manifold space to the low-dimensional space by finding a projection matrix, aiming at finding the low-dimensional linear space embedded in the high-dimensional manifold space. The LPP algorithm ensures that the original local neighbor relationship can still be maintained after dimensionality reduction by constructing the neighbor graph.

The general process of linear dimensionality reduction can be described as follows. For a given sample set $X = \{x_1, x_2, \ldots, x_k\}$ of k data points in the high-dimensional space $R^n$, we need to find a projection matrix A that maps all the data to a low-dimensional space $R^l$, where $l \ll n$, and obtain the sample point set $Y = \{y_1, y_2, \ldots, y_k\}$ corresponding to all k samples in the low-dimensional space, with $y_i = A^T x_i$. In this way, for each $x_i$, a $y_i$ can be found that gives a good low-dimensional representation of it, so as to achieve the purpose of dimensionality reduction. The LPP algorithm assumes that the high-dimensional data are embedded on a manifold M in the high-dimensional space.

2.1.1. Building the Nearest Neighbor Graph

The first step of the LPP algorithm is to build a neighbor graph G from the sample set X. Each sample point is a vertex of G, giving k vertices in total. Two vertices $x_i$ and $x_j$ are connected by an edge if they are close neighbors; otherwise there is no edge between them. Whether $x_i$ and $x_j$ are neighbors is judged by one of the following two methods:
(a) ε-neighborhood method: set a threshold ε and, for each data sample $x_i$, calculate the Euclidean distance between it and all other samples. If $\|x_i - x_j\| < \varepsilon$, the sample points $x_i$ and $x_j$ are considered to be close neighbors and are linked by an edge; otherwise, they are not considered close neighbors.
(b) K-nearest neighbor method: for each data sample $x_i$, calculate the Euclidean distance between it and all other samples. Select the K sample points with the smallest Euclidean distance to $x_i$ as its nearest neighbors and link each of them to $x_i$ with an edge.
Figure 1 illustrates the two methods of finding the nearest sample points.

[Figure 1: The two methods of finding the nearest sample points, panels (a) and (b).]
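The two neighbor-selection rules above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the author's implementation: the function names, the column-per-sample layout of `X`, and the symmetrization of the K-nearest-neighbor graph are assumptions made for the example.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between the columns of X (shape n x k)."""
    diff = X[:, :, None] - X[:, None, :]
    return np.sqrt((diff ** 2).sum(axis=0))

def epsilon_graph(X, eps):
    """Boolean adjacency: x_i and x_j are neighbors if their distance is below eps."""
    adj = pairwise_distances(X) < eps
    np.fill_diagonal(adj, False)  # a sample is not its own neighbor
    return adj

def knn_graph(X, n_neighbors):
    """Boolean adjacency: link each x_i to its n_neighbors closest samples."""
    dist = pairwise_distances(X)
    np.fill_diagonal(dist, np.inf)  # exclude the sample itself
    k = X.shape[1]
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    adj = np.zeros((k, k), dtype=bool)
    rows = np.repeat(np.arange(k), n_neighbors)
    adj[rows, nearest.ravel()] = True
    # Assumption: an edge exists if either endpoint selects the other.
    return adj | adj.T
```

For four one-dimensional samples at 0, 1, 5, and 6, both rules (with `eps=1.5` or `n_neighbors=1`) link the pairs (0, 1) and (5, 6) and leave 1 and 5 unconnected.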

2.1.2. Determining the Weights of a Graph

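After the nearest neighbor graph has been constructed in the first step, a weight must be assigned to each edge of the graph to construct the weight matrix W. The weight reflects the degree of similarity between neighboring samples: the greater the weight, the higher the similarity. There are two methods for weight calculation:
(a) Heat kernel-based method:

$$W_{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^2 / t\right), & \text{if } x_i \text{ and } x_j \text{ are neighbors}, \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$

where t is the heat kernel parameter.
(b) Method based on the 0-1 principle:

$$W_{ij} = \begin{cases} 1, & \text{if } x_i \text{ and } x_j \text{ are neighbors}, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$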
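As a hedged sketch of the two weighting schemes, the following assumes the usual heat kernel form $W_{ij} = \exp(-\|x_i - x_j\|^2 / t)$ for connected samples and a Boolean adjacency matrix `adj` produced by some neighbor-graph step. The function names and the parameter `t` are illustrative choices, not taken from the paper.

```python
import numpy as np

def heat_kernel_weights(X, adj, t):
    """W_ij = exp(-||x_i - x_j||^2 / t) where adj is True, 0 elsewhere.

    X holds one sample per column; adj is a k x k Boolean adjacency matrix.
    """
    diff = X[:, :, None] - X[:, None, :]
    sq_dist = (diff ** 2).sum(axis=0)
    return np.where(adj, np.exp(-sq_dist / t), 0.0)

def binary_weights(adj):
    """0-1 weights: 1 for connected samples, 0 otherwise."""
    return adj.astype(float)
```

The heat kernel gives closer neighbors a larger weight, while the 0-1 scheme treats all neighbors equally and needs no extra parameter.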

2.1.3. The Objective Function

The goal of the LPP algorithm is to find a projection matrix A that projects data from the high-dimensional space to the low-dimensional space. The LPP algorithm obtains the projection matrix A by minimizing the following objective function:

$$\min \sum_{i,j} \|y_i - y_j\|^2 W_{ij}, \tag{3}$$

where $y_i$ and $y_j$ are the data sample points after dimensionality reduction. Substituting $y_i = A^T x_i$ and rearranging gives

$$\frac{1}{2}\sum_{i,j} \|y_i - y_j\|^2 W_{ij} = \mathrm{tr}\left(A^T X (D - W) X^T A\right) = \mathrm{tr}\left(A^T X L X^T A\right), \tag{4}$$

where $X = [x_1, x_2, \ldots, x_k]$, $D$ is the diagonal matrix with $D_{ii} = \sum_j W_{ij}$, and $L = D - W$ is known as the Laplacian matrix. To remove the arbitrary scaling of the above minimization problem, a constraint is added:

$$A^T X D X^T A = I. \tag{5}$$

With this constraint added, the problem becomes the following constrained minimization problem:

$$\min_A \mathrm{tr}\left(A^T X L X^T A\right) \quad \text{s.t.} \quad A^T X D X^T A = I. \tag{6}$$

The solution of this objective function can be obtained by solving the generalized eigenvalue problem

$$X L X^T a = \lambda X D X^T a. \tag{7}$$

Calculate all eigenvalues and eigenvectors of formula (7) and sort the eigenvalues from small to large. K eigenvalues are obtained, $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_K$, with corresponding eigenvectors $a_1, a_2, \ldots, a_K$. Take the first l eigenvectors to form the projection matrix $A = [a_1, a_2, \ldots, a_l]$; then

$$y_i = A^T x_i, \tag{8}$$

where $y_i$ is an l-dimensional vector and A is an $n \times l$ matrix.
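The eigenvector step can be illustrated with a short NumPy sketch. It assumes the standard LPP generalized eigenproblem $X L X^T a = \lambda X D X^T a$ with $L = D - W$, and reduces it to an ordinary symmetric eigenproblem through a Cholesky factorization. The function name `lpp_projection` and the small ridge term `reg` (added so that the factorization is always defined) are assumptions for this example, not part of the paper.

```python
import numpy as np

def lpp_projection(X, W, n_components, reg=1e-8):
    """Solve X L X^T a = lambda X D X^T a for the smallest eigenvalues.

    X: n x k data matrix (one sample per column); W: k x k symmetric weights.
    Returns the n x n_components projection matrix A and its eigenvalues.
    """
    D = np.diag(W.sum(axis=1))
    L = D - W                                    # graph Laplacian
    M = X @ L @ X.T
    B = X @ D @ X.T + reg * np.eye(X.shape[0])   # ridge keeps B positive definite
    C = np.linalg.cholesky(B)                    # B = C C^T
    C_inv = np.linalg.inv(C)
    S = C_inv @ M @ C_inv.T                      # equivalent standard problem
    S = (S + S.T) / 2.0                          # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    A = C_inv.T @ eigvecs[:, :n_components]      # map back: a = C^{-T} u
    return A, eigvals[:n_components]

def project(A, X):
    """Low-dimensional representation y_i = A^T x_i for every column of X."""
    return A.T @ X
```

Because the eigenvectors of the reduced problem are orthonormal, the returned `A` satisfies the scaling constraint $A^T X D X^T A = I$ up to the ridge term.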

2.2. Sparse Representation Classification

The classification method is based on sparsity. All training samples form a data dictionary in which each column represents a face image. A new test sample can then be represented linearly by all training samples, and its sparse coefficient vector is obtained with respect to the data dictionary.

2.2.1. Sparse Representation of Test Samples

Firstly, assume that there are $n_i$ face images among the training samples of class i, and that each face image is represented by an m-dimensional column vector $v \in R^m$. Then, all face samples of class i can be expressed as $A_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,n_i}] \in R^{m \times n_i}$, where $n_i$ is the number of face samples contained in class i and $v_{i,j}$ is the jth face sample of the ith individual. A new test sample y of class i can then be represented linearly by all the training face samples of class i, so y can be expressed as the following linear combination:

$$y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \cdots + \alpha_{i,n_i} v_{i,n_i}. \tag{9}$$

Here, $\alpha_{i,j}$ is the coefficient of the linear combination. The expression above assumes that y belongs to class i, but in the actual classification and recognition process the class of y is unknown. Therefore, the training samples of all k categories (class i contains $n_i$ training samples), n in total, are assembled into one large dictionary matrix:

$$A = [A_1, A_2, \ldots, A_k] = [v_{1,1}, v_{1,2}, \ldots, v_{k,n_k}] \in R^{m \times n}. \tag{10}$$

Then, any test sample y can be represented linearly by all n training samples, that is, as the linear combination

$$y = Ax, \tag{11}$$

where x is a sparse coefficient vector. If y is a sample of class i, then under ideal circumstances the sparse coefficient vector $x = [0, \ldots, 0, \alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n_i}, 0, \ldots, 0]^T$ is obtained. In this coefficient vector, only the coefficients corresponding to samples of the same class as y are nonzero, and those of all other samples are zero. The sparse coefficient vector x therefore retains class information, so the category of a test sample y can be determined from it.

(i) Solving Sparse Solution. In the process of sparse representation, the most important problem is to solve for the sparse coefficient vector x. In practical applications, we hope to find a coefficient vector x that is sparse enough and can fully represent the test sample y. One option is to solve for x using the $\ell_2$ norm:

$$\hat{x}_2 = \arg\min \|x\|_2 \quad \text{s.t.} \quad Ax = y. \tag{12}$$

The $\ell_2$ norm problem is easy to solve by taking the pseudoinverse of the matrix A and then solving for $\hat{x}_2$. However, the vector obtained with the $\ell_2$ norm is not well suited to classifying test samples, because the solutions are usually relatively dense; that is, the nonzero terms in $\hat{x}_2$ correspond to a large number of categories, which makes classification more difficult. To avoid dense solutions, the coefficient vector can be restricted more strictly by using the $\ell_0$ norm:

$$\hat{x}_0 = \arg\min \|x\|_0 \quad \text{s.t.} \quad Ax = y. \tag{13}$$

A relatively sparse coefficient vector can be obtained by solving the $\ell_0$ norm problem above. However, $\ell_0$ norm optimization is an NP-hard problem and its solution may not be unique, so it is not as simple to solve as the $\ell_2$ norm problem and is also not a good choice. Finally, a compromise is adopted: the problem is solved with the $\ell_1$ norm, which yields relatively sparse coefficient vectors and, in general, a unique solution:

$$\hat{x}_1 = \arg\min \|x\|_1 \quad \text{s.t.} \quad Ax = y. \tag{14}$$

The desired sparse representation can be obtained by solving the $\ell_1$ norm optimization problem, and the subsequent sample classification and recognition can be carried out using this coefficient vector.
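The contrast between the dense $\ell_2$ solution and the sparse $\ell_1$ solution can be seen on a tiny dictionary. The sketch below is illustrative only. The $\ell_2$ solution uses the pseudoinverse, as the text describes. For the $\ell_1$ case it swaps in a different, commonly used formulation: iterative soft thresholding (ISTA) applied to the penalized problem $\min_x \frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1$, which only approximates the equality-constrained problem. The function names, `lam`, and `n_iter` are assumptions.

```python
import numpy as np

def l2_min_norm(A, y):
    """Minimum l2-norm solution of Ax = y via the pseudoinverse."""
    return np.linalg.pinv(A) @ y

def l1_ista(A, y, lam=0.01, n_iter=2000):
    """Approximate sparse solution: ISTA on 0.5*||Ax - y||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))   # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x

# Toy dictionary: the third column alone reproduces y.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
y = np.array([1.0, 1.0])
x_dense = l2_min_norm(A, y)   # spreads weight over all three columns
x_sparse = l1_ista(A, y)      # concentrates weight on the third column
```

In this toy case the pseudoinverse solution has three nonzero coefficients, while the soft-thresholded solution keeps only the coefficient of the column that matches `y`, slightly shrunk by the penalty.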

(ii) Classification Based on Sparse Representation. In an ideal case, the nonzero term in the solved sparse coefficient vector is the coefficient corresponding to the samples of the same class of the test sample, and the coefficient term corresponding to other samples that do not belong to the same class of the test sample is zero. Since the coefficient reflects the similarity between samples, the classification and recognition of test samples can be carried out according to the sparse coefficient vector. Samples corresponding to the term with the largest coefficient in the vector can be selected and then the test samples can be classified into the same category. In this way, classification and recognition can be realized. However, in practical applications, due to the existence of noise and model error, the situation is often not so ideal, and the nonzero term in the coefficient vector obtained is not necessarily the same category as the test sample. Therefore, the method with the largest coefficient cannot be simply selected for the classification of test samples.

For more accurate classification and recognition, the concept of residual error (reconstruction error) is introduced for sparse classification. The category to which y belongs is judged by calculating the reconstruction error of the test sample on each class, and the class with the smallest reconstruction error is regarded as the category of y. For this purpose, the following classification criterion function is used:

$$\min_i r_i(y) = \|y - A\,\delta_i(\hat{x}_1)\|_2^2, \tag{15}$$

where $r_i(y)$ is the reconstruction error of test sample y on class i and $\delta_i(\hat{x}_1)$ is a new sparse coefficient vector derived from the solved coefficient vector $\hat{x}_1$: it keeps only the coefficients corresponding to the class i training samples and sets all other coefficients to zero. $A\,\delta_i(\hat{x}_1)$ is then the reconstruction of test sample y on class i. The reconstruction error on class i is the sum of the squared differences between the test sample y and its reconstruction on class i.
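The residual rule can be sketched as follows, assuming a dictionary `A` with one training sample per column, an integer class label for each column, and a coefficient vector `x` that has already been solved for. The function name and argument layout are illustrative assumptions.

```python
import numpy as np

def classify_by_residual(A, labels, x, y):
    """Pick the class whose own coefficients reconstruct y with least error.

    A: m x n dictionary; labels: length-n integer class labels of the columns;
    x: length-n coefficient vector; y: length-m test sample.
    Returns (best_class, {class: squared reconstruction error}).
    """
    labels = np.asarray(labels)
    residuals = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, x, 0.0)   # keep only class-c coefficients
        residuals[int(c)] = float(np.sum((y - A @ delta) ** 2))
    best = min(residuals, key=residuals.get)
    return best, residuals
```

With a dictionary whose first two columns belong to class 0 and whose third column belongs to class 1, a coefficient vector concentrated on the third column yields a near-zero residual for class 1 and a large residual for class 0, so the sample is assigned to class 1.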

3. Face Dynamic Expression Recognition Method for Low Illumination Image

First the face detection approach using AdaBoost algorithm is discussed in Section 3.1 followed by expression feature extraction based on sparse reserved projection algorithm in Section 3.2.

3.1. Face Detection Based on AdaBoost Algorithm

The face detection algorithm based on AdaBoost algorithm consists of three parts: Haar-like feature representation, strong classifier construction, and cascade structure processing [13]. Its detection flow chart is shown in Figure 2.

3.1.1. Haar-Like Features

There are four basic feature templates of Haar-like features, as shown in Figure 3. Each template is composed of black and white rectangles, and its feature value is the difference between the pixel sums of the white and black rectangles. A template represents a subwindow used to extract local features of the face.

[Figure 3: The four basic Haar-like feature templates, panels (a) to (d).]

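In Figure 3, for the features of classes (a), (b), and (d), the feature value V is

$$V = \mathrm{Sum}_{\text{white}} - \mathrm{Sum}_{\text{black}}, \tag{16}$$

where $\mathrm{Sum}_{\text{white}}$ and $\mathrm{Sum}_{\text{black}}$ are the pixel sums of the white and black rectangles.

For the features of class (c), the black rectangle is weighted so that the black and white regions count the same number of pixels, and the feature value is calculated as follows:

$$V = \mathrm{Sum}_{\text{white}} - 2\,\mathrm{Sum}_{\text{black}}. \tag{17}$$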

To speed up the calculation, the face detection method based on the AdaBoost algorithm calculates Haar-like feature values using the integral image [14], which stores the pixel sums of rectangles in an array to avoid repeated operations. The integral image II(x, y) of image I at point (x, y) is defined as

$$II(x, y) = \sum_{x' \le x,\; y' \le y} I(x', y'), \tag{18}$$

where I(x′, y′) is the pixel value of image I at point (x′, y′). The calculation of the integral image is shown in Figure 4.

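As shown in Figure 4, the integral image values at the corner points 1, 2, 3, and 4 are as follows:

$$II(1) = \mathrm{Sum}(A), \quad II(2) = \mathrm{Sum}(A) + \mathrm{Sum}(B), \quad II(3) = \mathrm{Sum}(A) + \mathrm{Sum}(C), \quad II(4) = \mathrm{Sum}(A) + \mathrm{Sum}(B) + \mathrm{Sum}(C) + \mathrm{Sum}(D). \tag{19}$$

Then, the sum of pixels of rectangle D is

$$\mathrm{Sum}(D) = II(4) + II(1) - II(2) - II(3). \tag{20}$$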

It can be seen that a Haar-like feature value requires only the integral image values at the rectangle corner points, which greatly reduces the amount of calculation and improves the efficiency of feature extraction.
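A minimal pure-Python sketch of the integral image and a two-rectangle Haar-like feature follows. It is an illustration under stated assumptions: the image is a list of rows, the integral image is padded with a leading row and column of zeros so that corner lookups need no boundary checks, and the left half of the feature is taken as the white rectangle. None of these conventions come from the paper.

```python
def integral_image(img):
    """Padded integral image: ii[y][x] is the sum of img rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Pixel sum of a rectangle from four corner lookups of the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] + ii[top][left] - ii[top][right] - ii[bottom][left]

def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle feature: white (left half) sum minus black (right half) sum."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black
```

Once the integral image is built in a single pass, every rectangle sum costs four lookups regardless of the rectangle size, which is what makes evaluating many Haar-like features per subwindow affordable.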

3.1.2. Strong Classifier Construction

AdaBoost is an adaptive iterative algorithm [15]. Its basic principle is to combine multiple weak classifiers into a strong classifier. The weak classifier is constructed as follows:

$$h_j(x) = \begin{cases} 1, & p_j f_j(x) < p_j \theta_j, \\ 0, & \text{otherwise}. \end{cases} \tag{21}$$

In equation (21), $f_j(x)$ represents the feature value of the subwindow x to be tested in the image for feature j, $\theta_j$ represents the threshold of the weak classifier, and $p_j$ represents the direction of the inequality sign. If the positive samples are classified below $\theta_j$, then $p_j = 1$; otherwise, $p_j = -1$.

The steps to build the strong classifier are as follows:
Step 1: given the training sample set $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i = 1$ represents a positive sample and $y_i = 0$ represents a negative sample.
Step 2: initialize the sample weights according to equation (22):

$$w_{1,i} = \begin{cases} \dfrac{1}{2l}, & y_i = 1, \\ \dfrac{1}{2m}, & y_i = 0. \end{cases} \tag{22}$$

In equation (22), l is the number of positive samples and m is the number of negative samples.
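The two pieces described so far, the thresholded weak classifier and the initial sample weights, can be sketched as below. The sketch assumes the standard Viola-Jones forms: a weak classifier that outputs 1 when $p f < p \theta$, labels of 1 for positive and 0 for negative samples, and initial weights of $1/(2l)$ and $1/(2m)$. The function names are hypothetical, and the later boosting rounds are not shown.

```python
def weak_classify(feature_value, threshold, polarity):
    """Return 1 if polarity * feature_value < polarity * threshold, else 0.

    polarity is +1 when positive samples lie below the threshold and -1
    when they lie above it.
    """
    return 1 if polarity * feature_value < polarity * threshold else 0

def init_weights(labels):
    """Initial weights: 1/(2l) for positives (label 1), 1/(2m) for negatives (label 0)."""
    l = sum(1 for y in labels if y == 1)   # number of positive samples
    m = len(labels) - l                    # number of negative samples
    return [1.0 / (2 * l) if y == 1 else 1.0 / (2 * m) for y in labels]
```

This initialization gives the positive and negative groups the same total weight of one half each, so an imbalance in the number of samples does not bias the first round.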

3.1.3. Facial Detection Based on AdaBoost Algorithm

The last step of the method is to arrange the generated strong classifiers in a cascade structure, as shown in Figure 5.

The cascade structure of strong classifiers reflects two advantages of the AdaBoost algorithm:
(1) The classifiers at the front end of the cascade can quickly exclude subwindows that do not belong to the facial region. Once an input image subwindow is rejected, it does not enter the next level of classifier, which reduces the amount of computation.
(2) The results are more accurate because the input image subwindow is filtered layer by layer: only a subwindow that every classifier in the cascade judges to belong to the face region is finally detected as a face.
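The early-rejection behavior of the cascade can be shown with a very small sketch. Each stage is modeled as a plain callable that returns True when the subwindow passes; the function name and this representation of a stage are assumptions made for illustration.

```python
def cascade_detect(window, stages):
    """Return True only if every stage accepts the window.

    stages is an ordered list of callables; evaluation stops at the first
    stage that rejects the window, so later stages are never run for it.
    """
    for stage in stages:
        if not stage(window):
            return False   # rejected early
    return True

# Toy usage: count how many stage evaluations a rejected window costs.
calls = []

def cheap_stage(w):
    calls.append("cheap")
    return w > 0

def costly_stage(w):
    calls.append("costly")
    return w % 2 == 0

accepted = cascade_detect(4, [cheap_stage, costly_stage])    # passes both stages
rejected = cascade_detect(-2, [cheap_stage, costly_stage])   # stops at the first stage
```

In the toy run, the rejected window triggers only the cheap stage, which mirrors how most non-face subwindows are discarded before the more expensive classifiers are evaluated.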

3.2. Expression Feature Extraction Based on Sparse Reserved Projection Algorithm

In the traditional facial expression recognition system, the expression feature extraction module is the core module, and the effective feature extraction directly affects the final recognition result. Facial expressions are rich in features, so there are many methods for feature extraction. This paper mainly uses LBP feature extraction. Local binary pattern (LBP) is a texture feature descriptor commonly used in image processing. Its remarkable feature is that it not only extracts image features effectively but also ensures a small amount of computation. The standard LBP operator contains the gray value of 9 pixels, and its structure is shown in Figure 6.

In Figure 6, $g_c$ is the gray value of the center pixel of the window, which is defined as the threshold value. LBP operator feature extraction can be divided into three steps.

Step 1. Compare the 8 gray values $g_0$, $g_1$, ..., $g_7$ adjacent to the center pixel with the threshold in turn. If $g_i$ is greater than the threshold, the position of the corresponding pixel is marked as 1; if $g_i$ is less than the threshold, it is marked as 0. Thus, a window generates 8 binary digits.

Step 2. Arrange the 8 binary codes generated in Step 1 in sequence to obtain a string of 8-bit binary numbers, namely, the LBP binary code of the pixel in the center of the window.

Step 3. Convert the LBP binary code obtained in Step 2 into decimal number to obtain the LBP value of the pixel in the center of the window.
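The three steps can be sketched for a single 3 × 3 window as follows. Two details are assumptions of this example rather than statements from the paper: the neighbors are read clockwise starting from the top-left pixel, and a neighbor equal to the center value is marked as 1.

```python
def lbp_3x3(window):
    """Standard LBP code of the center pixel of a 3 x 3 window.

    Returns the 8-bit binary string and its decimal value.
    """
    center = window[1][1]
    # Clockwise from the top-left neighbor (ordering is an assumption).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    # Step 1: threshold each neighbor against the center (ties count as 1).
    # Step 2: concatenate the bits in order.
    bits = "".join("1" if window[r][c] >= center else "0" for r, c in offsets)
    # Step 3: convert the binary code to a decimal number.
    return bits, int(bits, 2)

bits, value = lbp_3x3([[6, 5, 2],
                       [7, 6, 1],
                       [9, 8, 7]])
```

For this window the center value is 6, the resulting code is `10001111`, and the LBP value is 143. A different starting neighbor or direction would give a different number for the same texture, so the ordering must be fixed consistently across an experiment.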

Although the standard LBP operator can extract image texture features effectively, its coverage is small, which limits it. The extended LBP operator and the uniform mode LBP operator evolved from the standard LBP operator, and these two operators have stronger expressive ability.

The extended LBP operator is an enhancement of the standard LBP operator that expands the window neighborhood to any scale. Its structure is shown in Figure 7. $LBP_{P,R}$ denotes the LBP value of the center pixel in a circular neighborhood, where P is the number of sampling points and R is the radius of the circular neighborhood. The texture information of the three circular neighborhoods in Figure 7 is thus represented as $LBP_{4,1}$, $LBP_{8,1}$, and $LBP_{8,2}$, respectively.

Let the coordinate of the central pixel B be $(x_c, y_c)$ and the coordinate of the pth sampling point in the circular neighborhood be $(x_p, y_p)$; then

$$x_p = x_c + R\cos\left(\frac{2\pi p}{P}\right), \quad y_p = y_c - R\sin\left(\frac{2\pi p}{P}\right), \quad p = 0, 1, \ldots, P-1. \tag{23}$$

In order to describe the texture information of the circular neighborhood conveniently, an auxiliary function is defined as follows:

$$s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0. \end{cases} \tag{24}$$

By referring to the calculation method of the standard LBP operator, the LBP value of B is

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p. \tag{25}$$

In equation (25), $g_c$ is the gray value of B and $g_p$ represents the gray values of the P sampling points around B.

[Figure 7: Structure of the extended LBP operator for three circular neighborhoods, panels (a) to (c).]
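A hedged sketch of the extended (circular) LBP operator is given below. It assumes the commonly used sampling convention $x_p = x_c + R\cos(2\pi p/P)$, $y_p = y_c - R\sin(2\pi p/P)$ and the code $\sum_p s(g_p - g_c)2^p$ with $s(x) = 1$ for $x \ge 0$. To stay short, it samples the nearest pixel instead of interpolating, which is a simplification chosen for this example; the function name is also hypothetical.

```python
import math

def circular_lbp(img, yc, xc, P, R):
    """LBP_{P,R} value of pixel (yc, xc) using nearest-pixel sampling.

    img is a list of rows. The caller must keep the whole circular
    neighborhood inside the image, since no bounds checking is done.
    """
    center = img[yc][xc]
    value = 0
    for p in range(P):
        angle = 2.0 * math.pi * p / P
        xp = xc + R * math.cos(angle)
        yp = yc - R * math.sin(angle)
        # Nearest-pixel sampling (assumption; interpolation is also common).
        g = img[int(round(yp))][int(round(xp))]
        if g >= center:          # auxiliary function s(g_p - g_c)
            value += 1 << p      # weight 2^p
    return value

img = [[6, 5, 2],
       [7, 6, 1],
       [9, 8, 7]]
lbp_4_1 = circular_lbp(img, 1, 1, 4, 1)
lbp_8_1 = circular_lbp(img, 1, 1, 8, 1)
```

On this 3 × 3 example the four-point operator gives 12 and the eight-point operator gives 248. Larger radii such as R = 2 need an image of at least 5 × 5 pixels around the center.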

4. Experiment Results

In this section, first the parameter settings are discussed followed by results and analysis.

4.1. Parameter Settings

The parameters of the pretrained VGG-19 model were transferred to the facial expression recognition network by transfer learning, and the network was then trained with the locality preserving projection (LPP) algorithm. The parameters of the first 12 convolution layers were frozen, and the last 4 convolution layers, the 2 fully connected layers, and the new Softmax classification layer were trained and optimized. The batch size was set to 48, the initial learning rate to 0.005, and the number of training epochs to 100.

4.2. Results and Analysis

To verify the effectiveness of the proposed method, comparative experiments were carried out between the expression recognition method based on the sparse reserved projection algorithm and the algorithms of Yang et al. [5] and Zhuang and Ding [6]. For ease of reference, these are hereafter called reference [5] and reference [6], respectively.

Figure 8 shows the changes in the loss function values of the three models on the CK+ expression dataset. During model learning, the networks of the algorithms in reference [5] and reference [6] converge slowly, and their loss function values drop to 0.8 and 0.9, respectively. This indicates that the learning ability of the proposed model is stronger.

[Figure 8: Loss function values of the models on the CK+ expression dataset, panels (a) to (c).]

Table 1 shows the confusion matrices of the three models on the CK+ expression dataset. The recognition rates of the algorithms in reference [5] and reference [6] are 91.17% and 89.25%, respectively, while the recognition rate of the proposed method is 94.12%. The sparse reserved projection algorithm thus achieves a higher recognition rate in this experiment than the other deep learning methods compared, which supports the rationality and effectiveness of the proposed model.

5. Conclusion

In this paper, the locality preserving projection (LPP) algorithm and the sparse representation method are studied and combined to carry out face recognition. By using the LPP algorithm to reduce the dimensionality of the high-dimensional face data, the method keeps the local neighbor relationships between samples consistent before and after dimensionality reduction and effectively preserves the relationships among the high-dimensional data when they are projected to the low-dimensional space. After dimensionality reduction, the sparse representation method is used to classify and match the feature data to achieve recognition. Experimental results show that the proposed method achieves a high recognition rate on the CK+ expression dataset.

Data Availability

The datasets used and/or analyzed during the current study are available from the author on reasonable request.

Conflicts of Interest

The author declares that he has no conflicts of interest.

Acknowledgments

This project was supported by the Natural Science Foundation of Guangxi, Project no. 2018GXNSFAA294085 (Circle formation control for multi-agent systems under the coupling of event-triggered and quantized communication).

References

1. Z. Zhang, C. Lai, H. Liu, and Y.-F. Li, “Infrared facial expression recognition via Gaussian-based label distribution learning in the dark illumination environment for human emotion detection,” Neurocomputing, vol. 409, pp. 341–350, 2020.
2. M. K. Lee, D. Y. Choi, D. H. Kim, and B. C. Song, “Visual scene-aware hybrid neural network architecture for video-based facial expression recognition,” in Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), pp. 1–8, IEEE, Lille, France, May 2019.
3. Y. Zhu, X. Ni, H. Wang, and Y. Yao, “Face recognition under low illumination based on convolutional neural network,” International Journal of Autonomous and Adaptive Communications Systems, vol. 13, no. 3, pp. 260–272, 2020.
4. H. Liu, Y. Chen, W. Zhao, S. Zhang, and Z. Zhang, “Human pose recognition via adaptive distribution encoding for action perception in the self-regulated learning process,” Infrared Physics & Technology, vol. 114, Article ID 103660, 2021.
5. Y. Yang, H. Zhang, D. Yuan et al., “Hierarchical extreme learning machine based image denoising network for visual Internet of things,” Applied Soft Computing, vol. 74, pp. 747–759, 2019.
6. P. Zhuang and X. Ding, “Divide-and-conquer framework for image restoration and enhancement,” Engineering Applications of Artificial Intelligence, vol. 85, pp. 830–844, 2019.
7. N. Wang, Y. Wang, and M. J. Er, “Review on deep learning techniques for marine object recognition: architectures and algorithms,” Control Engineering Practice, vol. 23, Article ID 104458, 2020.
8. H. F. García, M. A. Álvarez, and Á. A. Orozco, “Dynamic facial landmarking selection for emotion recognition using Gaussian processes,” Journal on Multimodal User Interfaces, vol. 11, no. 4, pp. 327–340, 2017.
9. R. Siddiqui, F. Shaikh, P. Sammulal, and A. Lakshmi, An Improved Method for Face Recognition with Incremental Approach in Illumination Invariant Conditions, Springer, Singapore, 2021.
10. J. W. Soh, J. S. Park, and N. I. Cho, “Joint high dynamic range imaging and super-resolution from a single image,” IEEE Access, vol. 7, pp. 177427–177437, 2019.
11. P. M. Kumar, B. Poornima, H. Nagendraswamy, M. C, and B. Rangaswamy, “A refined structure preserving image abstraction framework as a pre-processing technique for desire focusing on prominent structure and artistic stylization,” Vietnam Journal of Computer Science, vol. 10, pp. 1–55, 2021.
12. S. Madhavan and N. Kumar, “Incremental methods in face recognition: a survey,” Artificial Intelligence Review, vol. 54, no. 1, pp. 253–303, 2021.
13. N. Nourbakhsh Kaashki and R. Safabakhsh, “RGB-D face recognition under various conditions via 3D constrained local model,” Journal of Visual Communication and Image Representation, vol. 52, pp. 66–85, 2018.
14. D. Li, D. Jiang, R. Bao, L. Chen, and M. K. Kerns, “Crack detection and recognition model of parts based on machine vision,” Journal of Engineering Science & Technology Review, vol. 12, no. 5, 2019.
15. Y. Wang, W. Song, G. Fortino, L.-Z. Qi, W. Zhang, and A. Liotta, “An experimental-based review of image enhancement and image restoration methods for underwater imaging,” IEEE Access, vol. 7, pp. 140233–140251, 2019.

Copyright

Copyright © 2021 Hui Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
