Dual subspace discriminative projection learning
Introduction
Feature extraction plays a vital role in a large spectrum of real-world classification tasks [1]. Its many applications, such as medical diagnosis, face recognition, object detection, and action recognition [2], share a common challenge: input data of high dimensionality containing redundant information and noise. Feature extraction under these settings transforms the original high-dimensional data to a low-dimensional subspace [3].
Early feature extraction methods include principal component analysis (PCA) [4], which learns a projection matrix that preserves the main energy of the original data. Various extensions to PCA offer less sensitivity to outliers, such as locality preserving projections (LPP) [5] and neighborhood preserving embedding (NPE) [6].
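As a concrete illustration (not the method proposed in this paper), the PCA projection matrix can be obtained from the top eigenvectors of the data covariance matrix; a minimal NumPy sketch:

```python
import numpy as np

def pca_projection(X, k):
    """Learn a k-dimensional PCA projection from data X (n_samples x n_features).

    The projection matrix P holds the top-k eigenvectors of the data
    covariance, i.e. the directions preserving most of the data's energy.
    """
    X_centered = X - X.mean(axis=0)
    cov = X_centered.T @ X_centered / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :k]              # top-k eigenvectors (descending)
    return P, X_centered @ P                 # projection matrix, embedded data

# Example: project 100 five-dimensional points onto 2 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
P, Z = pca_projection(X, 2)
print(P.shape, Z.shape)  # (5, 2) (100, 2)
```

Because the eigenvectors are orthonormal, `P` is an orthogonal projection; reconstructing `Z @ P.T` recovers the data up to the energy discarded in the dropped dimensions.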
Linear discriminant analysis (LDA) [7] is a well-established method. By incorporating label information, LDA learns a projection matrix that maximizes both the inter-class separability and the intra-class compactness of the data in a low-dimensional subspace. Many variants of LDA have been proposed [8], [9], such as locality sensitive discriminant analysis (LSDA) [7] and marginal Fisher analysis (MFA) [10]. In general, conventional LDA methods offer discriminative capabilities by harnessing the power of label information; however, they tend to perform poorly in noisy, high-dimensional settings because they are incapable of removing redundant and irrelevant information. This problem is addressed by introducing sparse constraints. For example, sparse discriminant analysis [11] extends LDA to the high-dimensional setting by imposing an l1-norm penalty. Advances have also been made with the l2,1 norm [12], [13], including robust sparse linear discriminant analysis (RSLDA) [3], where the l2,1 norm and l1 norm are included for improved feature selection and removal of sparse noise, respectively. RSLDA also addresses the loss of discriminative information when projecting to a low-dimensional subspace by coupling a PCA-based constraint with the projection learning.
Feature extraction methods based on sparse coding or representation [14], [15], [16], locality-constrained linear coding [17], canonical correlation analysis [18], bag-of-visual-terms (BOV) models [19], [20], [21], and low-rank representation (LRR) [22], [23] have also drawn considerable attention. Early LRR methods capture the global structure of data but overlook local structure and salient features. This is addressed with latent low-rank representation (LatLRR) [24], where two low-rank matrices are proposed to recover the row and column space information of the original data. One of these two matrices then serves as a projection matrix for extracting more salient and discriminative features. However, LatLRR is restricted by fixed feature dimensionality and independent learning of the two low-rank matrices. These problems are addressed with approximate low-rank projection learning (ALPL) [25] and extended approximate low-rank projection learning (EALPL), respectively. The authors also demonstrate improved robustness by extending ALPL to a supervised learning scenario, coupling projection learning with the class labels of the training set via ridge regression. Regression-based methods have also been proposed [26], [27]. For example, strategies based on elastic-net regularized linear regression [28] increase the margins between different classes while enhancing the compactness of the projection matrix. However, the aforementioned feature extraction methods are based on the assumption that discriminative features share a common subspace. An alternative LRR-based approach [29] instead learns three components from training data: class-shared information, class-specific information, and sparse noise [30], where all three components are required for classification.
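For concreteness, the LatLRR model referenced above jointly recovers column-space and row-space information, together with a sparse error term, by solving the standard formulation of [24]:

```latex
\min_{Z,\,L,\,E}\; \|Z\|_{*} + \|L\|_{*} + \lambda \|E\|_{1}
\quad \text{s.t.} \quad X = XZ + LX + E,
```

where $\|\cdot\|_{*}$ denotes the nuclear norm, $E$ captures sparse noise, and the row-space matrix $L$ then acts as the projection for extracting salient features via $LX$.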
Deep neural networks offer multi-layer and multi-scale feature extraction capabilities. Motivated by the success of the convolutional neural network (CNN) for image classification, various extensions have been proposed [31], [32], including deep LDA [33] and the powerful residual network (ResNet) [34]. However, aside from the advent of parallel computing systems, one of the primary reasons for the success of deep neural networks is the emergence of large-scale annotated training data [35]. For instance, ResNet achieved state-of-the-art performance with a training set of 1.28 million images on the ImageNet 2012 classification dataset [36]. Conversely, it is well recognized that with small training sets alone, deep neural networks are prone to overfitting [31] and can perform worse than conventional supervised learning approaches [3]. While this can be addressed with data augmentation, more recent success has been demonstrated with transfer learning, whose objective is to apply knowledge from a source domain to a new target domain [37]. However, performance can still suffer when the source domain differs substantially from the target domain. Indeed, for certain medical tasks, with fundamentally different data sizes, features, and task specifications between domains, transfer learning has shown little benefit [38]. As such, classification with small training sample sets remains a challenge.
Motivated by the above factors, we propose a dual subspace discriminative projection learning (DSDPL) framework. Our approach reflects the established notion that training data is composed of class-shared information, class-specific information, and sparse noise. Unlike existing subspace learning methods, DSDPL decomposes the original high-dimensional training data, via learned projection matrices, into class-shared and class-specific subspaces. During training, both the class-shared and sparse noise components are required to accurately learn each class-specific projection matrix; neither, however, is required for classification. The learned projections are jointly constrained with l2,1 sparse norm and LDA terms to select the most important, yet discriminative, information. Projections are also coupled with regression-based terms to more accurately learn the most distinctive class-specific features for better classification. In addition, the proposed reconstructive properties between subspaces provide more freedom to capture the main energy of the original data, leading to better reconstruction, reduced information loss, and a surprisingly effective means of classification from low-dimensional class-specific subspaces.
The remainder of this paper is organized as follows. Section 2 introduces some related works. The proposed DSDPL method and its implementation are described in Section 3. Experimental results on five public databases are reported in Section 4. Finally, the paper concludes in Section 5.
Background
In this section, we briefly introduce the background work related to our method. Since our approach is related to linear regression and LDA, we mainly focus on these techniques.
Proposed dual subspace discriminative projection learning (DSDPL)
The merits of subspace learning via a projection matrix are well established for feature extraction. However, learning the projection matrix carries an underlying assumption that discriminative features reside within a single shared subspace; the potential of class-specific discriminative information across disjoint subspaces is thereby overlooked. In this paper, we propose a novel dual subspace discriminative projection learning (DSDPL) framework to solve this problem.
Experimental evaluation
In this section, we evaluate the effectiveness of DSDPL on five benchmark databases: the AR face database [44], Extended Yale B face database [45], PubFig83 database [46], Caltech 101 database [47], and 15-Scene database [48]. Our approach is compared with current benchmark methods, including discriminative elastic-net regularized linear regression (DENRLR) [28], supervised approximate low-rank projection learning (SALPL) [25], robust sparse linear discriminant analysis (RSLDA) [3], and marginal Fisher analysis (MFA) [10].
Conclusion
In this paper, we propose a novel feature extraction algorithm called dual subspace discriminative projection learning (DSDPL) to address the problem of multi-class image classification with small training sample sizes. Unlike traditional projection learning frameworks that assume discriminative features share a common subspace, DSDPL instead decomposes original high-dimensional data, via learned projection matrices, into class-shared and class-specific subspaces. The learned projections are jointly constrained with l2,1 sparse norm and LDA terms to select the most important, yet discriminative, information.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Gregg Belous received the B.Eng. degree (Hons.) in electronic and computer engineering from Griffith University, Australia, in 2012. He is currently pursuing the Ph.D. degree in biomedical engineering. His research interests include pattern recognition, machine learning, computer vision, and medical imaging.
References (55)
- Learning structures of interval-based Bayesian networks in probabilistic generative model for human complex activity recognition, Pattern Recognit. (2018)
- A classification-oriented dictionary learning model: explicitly learning the particularity and commonality across categories, Pattern Recognit. (2014)
- ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (2012)
- Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (2006)
- Input feature selection for classification problems, IEEE Trans. Neural Netw. (2002)
- Robust sparse linear discriminant analysis, IEEE Trans. Circuits Syst. Video Technol. (2018)
- Application of the Karhunen-Loeve procedure for the characterization of human faces, IEEE Trans. Pattern Anal. Mach. Intell. (1990)
- Locality preserving projections, Advances in Neural Information Processing Systems (2004)
- Neighborhood preserving embedding, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 (2005)
- Eigenfaces vs. Fisherfaces: recognition using class specific linear projection
- Null space versus orthogonal linear discriminant analysis, Proceedings of the 23rd International Conference on Machine Learning
- Feature reduction via generalized uncorrelated linear discriminant analysis, IEEE Trans. Knowl. Data Eng.
- Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell.
- Sparse discriminant analysis, Technometrics
- l2,1 regularized correntropy for robust feature selection, 2012 IEEE Conference on Computer Vision and Pattern Recognition
- Adaptive semi-supervised feature selection for cross-modal retrieval, IEEE Trans. Multimed.
- Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell.
- Sparse representation or collaborative representation: which helps face recognition?, 2011 IEEE International Conference on Computer Vision (ICCV)
- Linear spatial pyramid matching using sparse coding for image classification, 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Locality-constrained linear coding for image classification, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
- Discriminative extended canonical correlation analysis for pattern set matching, Mach. Learn.
- Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)
- Max-margin dictionary learning for multiclass image categorization, European Conference on Computer Vision
- Kernel codebooks for scene categorization, European Conference on Computer Vision
- Learning structured low-rank representations for image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Low-rank matrix recovery with structural incoherence for robust face recognition, 2012 IEEE Conference on Computer Vision and Pattern Recognition
- Latent low-rank representation for subspace segmentation and feature extraction, 2011 International Conference on Computer Vision
Cited by (21)
- Relaxed multi-view discriminant analysis, Engineering Applications of Artificial Intelligence (2024)
- Self-adaptive subspace representation from a geometric intuition, Pattern Recognition (2024)
- Noise-related face image recognition based on double dictionary transform learning, Information Sciences (2023)
- A dual-model semi-supervised self-organizing fuzzy inference system for data stream classification, Applied Soft Computing (2023)
- Unified feature extraction framework based on contrastive learning, Knowledge-Based Systems (2022)
- Feature extraction framework based on contrastive learning with adaptive positive and negative samples, Neural Networks (2022)
Andrew Busch received a double degree in electronic engineering and information technology and the Ph.D. degree in engineering from the Queensland University of Technology, Australia, in 1998 and 2004, respectively. He currently works as a senior lecturer with the School of Engineering at Griffith University, Australia. His research interests include texture classification, multiresolution signal analysis, document analysis, and the use of imagery for biometric authentication.
Yongsheng Gao received the B.Sc. and M.Sc. degrees in electronic engineering from Zhejiang University, China, in 1985 and 1988, respectively, and the Ph.D. degree in computer engineering from Nanyang Technological University, Singapore. He is currently a Professor with the School of Engineering, Griffith University, Australia. He has been the Leader of the Biosecurity Group, Queensland Research Laboratory, National ICT Australia (ARC Centre of Excellence), a consultant of Panasonic Singapore Laboratories, and an Assistant Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include smart farming, intelligent agriculture, environmental informatics, image retrieval, computer vision, pattern recognition, and medical imaging.