Elsevier

Pattern Recognition

Volume 111, March 2021, 107581

Dual subspace discriminative projection learning

https://doi.org/10.1016/j.patcog.2020.107581

Highlights

  • We propose a novel feature extraction algorithm called dual subspace discriminative projection learning (DSDPL) for multi-class image classification with low sample size training data.

  • Our approach serves to decompose original high dimensional data, via learned projection matrices, into class-shared and class-specific subspaces.

  • Comprehensive experimental analysis is performed across five publicly available databases for face, object, and scene classification.

  • Our experimental results demonstrate the effectiveness of DSDPL over current benchmark subspace learning methods and deep learning models.

Abstract

In this paper, we propose a dual subspace discriminative projection learning (DSDPL) framework for multi-category image classification. Our approach reflects the notion that images are composed of class-shared information, class-specific information, and sparse noise. Unlike traditional subspace learning methods, DSDPL decomposes the original high dimensional data, via learned projection matrices, into class-shared and class-specific subspaces. The learned projection matrices are jointly constrained with l2,1 sparse norm and LDA terms, while the reconstructive properties of DSDPL reduce information loss, leading to greater stability within the low dimensional subspaces. Regression-based terms are also included, enabling a more robust classification approach based on the extracted class-specific features. Our approach is examined on five different datasets for face, object, and scene classification. Experimental results demonstrate not only the superiority and versatility of DSDPL over current benchmark approaches, but also its robustness with low sample size training data.

Introduction

Feature extraction plays a vital role in a broad spectrum of real-world classification tasks [1]. Its many applications, such as medical diagnosis, face recognition, object detection, and action recognition [2], share input data that is invariably high dimensional, redundant, and noisy. Feature extraction in these settings transforms the original high dimensional data into a low dimensional subspace [3].
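
As a point of reference for the methods discussed below (a generic formulation, not specific to any single approach), linear feature extraction amounts to learning a projection matrix $P \in \mathbb{R}^{d \times k}$ with $k \ll d$ that maps a sample $x \in \mathbb{R}^{d}$ to a compact representation

$$y = P^{\top} x \in \mathbb{R}^{k},$$

and the methods surveyed next differ mainly in the criterion used to learn $P$.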

Early feature extraction methods include principal component analysis (PCA) [4], which learns a projection matrix that preserves the main energy of the original data. Various extensions offer less sensitivity to outliers, such as locality preserving projections (LPP) [5] and neighborhood preserving embedding (NPE) [6].
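
For concreteness, a textbook statement of the PCA projection (given here as background, not as the specific variant used in this paper) is

$$\max_{P}\; \operatorname{tr}\!\left(P^{\top} S_t P\right) \quad \text{s.t.} \quad P^{\top} P = I,$$

where $S_t = \sum_{i}(x_i - \bar{x})(x_i - \bar{x})^{\top}$ is the total scatter matrix; the columns of $P$ span the directions of maximum variance, i.e. the "main energy" of the data.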

Linear discriminant analysis (LDA) [7] is a well-established method. By including label information, LDA learns a projection matrix that maximizes the inter-class separability and the intra-class compactness of the data in a low dimensional subspace. Many variants of LDA have been proposed [8], [9], such as locality sensitive discriminant analysis (LSDA) [7] and marginal Fisher analysis (MFA) [10]. In general, conventional LDA methods offer discriminative capabilities by harnessing the power of label information; however, they tend to perform poorly in noisy and high dimensional settings because they are incapable of removing redundant and irrelevant information. This problem is addressed by introducing sparsity constraints. For example, sparse discriminant analysis [11] extends LDA to the high dimensional setting by imposing an l1 norm penalty. Advances have also been made with the l2,1 norm [12], [13], including robust sparse linear discriminant analysis (RSLDA), where the l2,1 norm and the l1 norm are used for improved feature selection and for the removal of sparse noise, respectively. RSLDA also addresses the loss of discriminative information when projecting to a low dimensional subspace by including a PCA-based constraint in the projection learning.
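
To make the two ingredients of this line of work explicit (standard definitions rather than the exact objectives of the cited methods), LDA seeks a projection that maximizes between-class scatter relative to within-class scatter,

$$\max_{P}\; \frac{\operatorname{tr}\!\left(P^{\top} S_b P\right)}{\operatorname{tr}\!\left(P^{\top} S_w P\right)},$$

where $S_b$ and $S_w$ are the between-class and within-class scatter matrices, while the row-sparsity penalty used by RSLDA-style methods is the l2,1 norm

$$\|P\|_{2,1} = \sum_{i=1}^{d} \Big(\sum_{j=1}^{k} P_{ij}^{2}\Big)^{1/2},$$

which drives entire rows of $P$, and hence entire original features, to zero.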

Feature extraction methods based on sparse coding or representation [14], [15], [16], locality-constrained linear coding [17], canonical correlation analysis [18], bag-of-visual-terms (BOV) models [19], [20], [21], and low-rank representation (LRR) [22], [23] have also drawn considerable attention. Early LRR methods capture the global structure of data but overlook local structure and salient features. This is addressed with latent low-rank representation (LatLRR) [24], where two low-rank matrices are learned to recover the row and column space information of the original data; one of the two matrices then serves as a projection matrix for extracting more salient and discriminative features. However, LatLRR is restricted by a fixed feature dimensionality and by the independent learning of the two low-rank matrices. These problems are addressed with approximate low-rank projection learning (ALPL) [25] and extended approximate low-rank projection learning (EALPL), respectively. The authors also demonstrate improved robustness by extending ALPL to a supervised learning scenario, coupling projection learning with the class labels of the training set via ridge regression. Regression-based methods have also been proposed [26], [27]. For example, strategies based on elastic-net regularized linear regression [28] increase the margins between different classes while enhancing the compactness of the projection matrix. However, the aforementioned feature extraction methods rest on the assumption that discriminative features share a common subspace. An alternative LRR-based approach [29] instead learns three components from the training data: class-shared information, class-specific information, and sparse noise [30], where all three components are required for classification.
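
As an illustration of the regression-based family mentioned above (a generic elastic-net formulation; the discriminative terms of [28] are omitted here), a projection $W$ can be learned by regressing one-hot class indicator vectors $Y$ onto the training data $X$,

$$\min_{W}\; \|Y - W^{\top} X\|_F^{2} + \lambda_1 \|W\|_1 + \lambda_2 \|W\|_F^{2},$$

where the l1 term promotes sparsity, the Frobenius-norm term stabilizes the solution, and the learned $W$ doubles as a discriminative projection.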

Deep neural networks offer multi-layer and multi-scale feature extraction capabilities. Motivated by the success of the convolutional neural network (CNN) for image classification, various extensions have been proposed [31], [32], including deep LDA [33] and the powerful residual network (ResNet) [34]. However, aside from the advent of parallel computing systems, one of the primary reasons for the success of deep neural networks is the emergence of large-scale annotated training data [35]. For instance, ResNet achieved state-of-the-art results with a training set of 1.28 million images on the ImageNet 2012 classification dataset [36]. On the other hand, with low sample size training data alone, it is well recognized that deep neural networks are prone to overfitting [31] and can perform worse than conventional supervised learning approaches [3]. While this can be addressed with data augmentation, more recent success has been demonstrated with transfer learning, whose objective is to transfer knowledge from a source domain to a new target domain [37]. However, performance can still suffer when the source domain is very different from the target domain; indeed, for certain medical tasks, where data sizes, features, and task specifications differ fundamentally between domains, transfer learning has shown little benefit [38]. As such, classification with small sample size training sets remains a challenge.

Motivated by the above factors, we propose a dual subspace discriminative projection learning (DSDPL) framework. Our approach reflects the established notion that training data is composed of class-shared information, class-specific information, and sparse noise. Unlike existing subspace learning methods, DSDPL decomposes the original high dimensional training data, via learned projection matrices, into class-shared and class-specific subspaces. During training, both the class-shared and sparse noise components are required to accurately learn each class-specific projection matrix; however, neither is required for classification. The learned projections are jointly constrained with l2,1 sparse norm and LDA terms to select the most important and discriminative information. Projections are also coupled with regression-based terms to more accurately learn the most distinctive class-specific features for better classification. In addition, the proposed reconstructive properties between subspaces provide more freedom to capture the main energy of the original data, leading to better reconstruction, reduced information loss, and a surprisingly effective means of classification from low dimensional class-specific subspaces.
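
Schematically (an illustrative sketch of the modelling assumption only, not the precise objective developed in Section 3), a training sample $x$ from class $c$ is treated as

$$x \approx x_{\text{shared}} + x_{\text{specific}} + e,$$

where $x_{\text{shared}}$ carries information common to all classes, $x_{\text{specific}}$ carries information unique to class $c$, and $e$ is sparse noise; DSDPL learns a projection for the class-shared subspace and one projection per class for the class-specific subspaces, and at test time only the class-specific features are used for classification.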

The remainder of this paper is organized as follows. Section 2 introduces some related works. The proposed DSDPL method and its implementation are described in Section 3. Experimental results on five public databases are reported in Section 4. Finally, the paper concludes in Section 5.

Section snippets

Background

In this section, we briefly introduce the background work related to our method. Since our approach is related to linear regression and LDA, we mainly focus on these techniques.
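
For completeness, the regression building block referred to here can be stated in its common ridge form (a standard formulation, not necessarily the exact variant adopted in this paper): given training data $X \in \mathbb{R}^{d \times n}$ and a class indicator matrix $Y \in \mathbb{R}^{c \times n}$, solve

$$\min_{W}\; \|Y - W^{\top} X\|_F^{2} + \lambda \|W\|_F^{2},$$

which admits the closed-form solution $W = (X X^{\top} + \lambda I)^{-1} X Y^{\top}$.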

Proposed dual subspace discriminative projection learning (DSDPL)

The merits of subspace learning for feature extraction by means of a projection matrix are well established. However, there is an underlying assumption when learning the projection matrix that discriminative features reside within a single shared subspace. As such, the potential of class-specific discriminative information residing across disjoint subspaces is overlooked. In this paper, we propose a novel dual subspace discriminative projection learning (DSDPL) framework to solve this problem.

Experimental evaluation

In this section, we evaluate the effectiveness of DSDPL on five benchmark databases: the AR face database [44], the Extended Yale B face database [45], the PubFig83 database [46], the Caltech 101 database [47], and the 15-Scene database [48]. Our approach is compared with current benchmark methods including discriminative elastic-net regularized linear regression (DENRLR) [28], supervised approximate low-rank projection learning (SALPL) [25], robust sparse linear discriminant analysis (RSLDA) [3], marginal Fisher

Conclusion

In this paper, we propose a novel feature extraction algorithm called dual subspace discriminative projection learning (DSDPL) to address the problem of multi-class image classification with small sample size training data. Unlike traditional projection learning frameworks that assume discriminative features share a common subspace, DSDPL instead serves to decompose original high dimensional data, via learned projection matrices, into class-shared and class-specific subspaces. The learned

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References (55)

  • J. Ye et al., Null space versus orthogonal linear discriminant analysis, Proceedings of the 23rd International Conference on Machine Learning, 2006.
  • J. Ye et al., Feature reduction via generalized uncorrelated linear discriminant analysis, IEEE Trans. Knowl. Data Eng., 2006.
  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell., 2007.
  • L. Clemmensen et al., Sparse discriminant analysis, Technometrics, 2011.
  • R. He et al., l2,1 regularized correntropy for robust feature selection, 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
  • E. Yu et al., Adaptive semi-supervised feature selection for cross-modal retrieval, IEEE Trans. Multimed., 2019.
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell., 2009.
  • L. Zhang et al., Sparse representation or collaborative representation: which helps face recognition?, 2011 IEEE International Conference on Computer Vision (ICCV), 2011.
  • J. Yang et al., Linear spatial pyramid matching using sparse coding for image classification, 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • J. Wang et al., Locality-constrained linear coding for image classification, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
  • O. Arandjelović, Discriminative extended canonical correlation analysis for pattern set matching, Mach. Learn., 2014.
  • S. Lazebnik et al., Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
  • X.-C. Lian et al., Max-margin dictionary learning for multiclass image categorization, European Conference on Computer Vision (ECCV), 2010.
  • J.C. Van Gemert et al., Kernel codebooks for scene categorization, European Conference on Computer Vision (ECCV), 2008.
  • Y. Zhang et al., Learning structured low-rank representations for image classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
  • Y.-C.F. Wang et al., Low-rank matrix recovery with structural incoherence for robust face recognition, 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
  • G. Liu et al., Latent low-rank representation for subspace segmentation and feature extraction, 2011 International Conference on Computer Vision (ICCV), 2011.

Gregg Belous received the B.Eng. degree (Hons.) in electronic and computer engineering from Griffith University, Australia, in 2012. He is currently pursuing the Ph.D. degree in biomedical engineering. His research interests include pattern recognition, machine learning, computer vision, and medical imaging.

Andrew Busch received a double degree in electronic engineering and information technology and the Ph.D. degree in engineering from the Queensland University of Technology, Australia, in 1998 and 2004, respectively. He currently works as a senior lecturer with the School of Engineering at Griffith University, Australia. His research interests include texture classification, multiresolution signal analysis, document analysis, and the use of imagery for biometric authentication.

Yongsheng Gao received the B.Sc. and M.Sc. degrees in electronic engineering from Zhejiang University, China, in 1985 and 1988, respectively, and the Ph.D. degree in computer engineering from Nanyang Technological University, Singapore. He is currently a Professor with the School of Engineering, Griffith University, Australia. He had been the Leader of the Biosecurity Group, Queensland Research Laboratory, National ICT Australia (ARC Centre of Excellence), a consultant of Panasonic Singapore Laboratories, and an Assistant Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include smart farming, intelligent agriculture, environmental informatics, image retrieval, computer vision, pattern recognition, and medical imaging.
