Role of the secondary visual cortex in HMAX model for object recognition

doi:10.1016/j.cogsys.2020.07.001

Cognitive Systems Research

Volume 64, December 2020, Pages 15-28

https://doi.org/10.1016/j.cogsys.2020.07.001

Abstract

Models inspired by the visual systems of living creatures (e.g., humans and other mammals) have been very successful in object recognition tasks. For example, the Hierarchical Model And X (HMAX) effectively recognizes different objects by modeling the V1, V4, and IT regions of the human visual system. Although HMAX is one of the strongest models in the field of object recognition, its use has been limited by several disadvantages: results are not repeatable under constant conditions, the extracted features are highly redundant, and the computational load and processing time are high. In this paper, we revise the HMAX approach by adding a model of the secondary region (V2) of the human visual system, which removes these drawbacks of standard HMAX. The added layer selects repeatable and more informative features, which increases the accuracy of the proposed method by avoiding the redundancy present in conventional approaches. Furthermore, this feature selection strategy considerably reduces the computational load. Another contribution of our model appears when only a small number of training images is available, a situation our model handles efficiently. We evaluate the proposed approach on the Caltech5 and GRAZ-02 databases, two well-known benchmarks for object recognition. The results are compared with those of standard HMAX and confirm the efficiency of the proposed method.

Introduction

Modeling systems on living creatures, a line of work associated with cognitive science, has received much attention in recent decades. In this field, the behavior and structure of living systems (e.g., humans and animals) are investigated so that their functionality can be mimicked in artificial systems. Because human organs have been shaped and optimized by evolution, they are regarded as effective and feasible templates for the many tasks that machines are expected to perform accurately and automatically. For example, models inspired by the human visual system (HVS) have been notably successful in computer vision tasks such as object recognition. Object recognition is hard for computers because the shape and appearance of objects depend on environmental factors. Light intensity, rotation, movement, size, and diversity are some of the factors that make object recognition a difficult and sensitive task. Despite all these challenges, the HVS recognizes objects easily and accurately even when environmental factors have changed their appearance. Biologically inspired models have likewise been very successful in addressing these challenges. Tang and Qiao (2014) improve invariant visual classification using a biologically inspired mechanism. To achieve rapid target detection, Gao et al. (2016) propose a progressive enhancement approach for SAR target detection that is inspired by the mechanisms of the visual cortex. Motivated by the architecture of the retina, Rajalakshmi and Prince (2016) enhance medical images for diagnostic purposes. On the other hand, mathematical models (Bay et al., 2008; Lowe, 2004; Dalal & Triggs, 2005) have been unable to cope with many of these challenges and therefore do not perform as well as the HVS. Moreno, Marín-Jiménez, Bernardino, Santos-Victor, and de la Blanca (2007) compare a model based on the HVS with a functional mathematical model, the Scale Invariant Feature Transform (SIFT), and show that the HVS-based model outperforms SIFT. These cases suggest that human-inspired models outperform mathematical models.

Research by Hubel and Wiesel (1968) established the fundamental framework for most cognitive-based models. Their studies demonstrated that the visual cortex consists of hierarchical layers composed of simple and complex cells. Simple cells operate as filters, while complex cells operate as maximum-finding operators. Processing in the cortex follows a feed-forward architecture in which the receptive field of the neurons grows from the first layers of vision to the upper layers. At the end of the processing path, the Inferior Temporal (IT) cortex operates as a classifier whose neurons are triggered only by a particular pattern. When an image is received by the retina, it is passed to the Lateral Geniculate Nucleus (LGN), which fuses the two images received from the left and right eyes (Ghodrati, Khaligh-Razavi, & Lehky, 2017). The result is sent to the visual cortex, where the main processing takes place. The visual cortex comprises two main processing paths: dorsal and ventral. The dorsal path, consisting of the V1, V2, V3, and V5 layers, processes the location of components and their spatial relations. The ventral path, consisting of the V1, V2, V4, and IT layers, is responsible for object recognition (Fig. 1). A comprehensive review of the ventral path and its functionality is presented in Serre et al. (2005) and Siegel and Sapru (2006). González-Casillas et al. (2018) propose a model of the ventral path. The relation between the dorsal and ventral paths is discussed in Cloutman (2013). Cichy, Khosla, Pantazis, Torralba, and Oliva (2016) compare deep neural networks with human visual object recognition.
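The division of labor between simple and complex cells described above can be illustrated with a minimal sketch. It is not taken from the paper: the 3x3 edge kernel, the toy image, and the function names `simple_cell` and `complex_cell` are assumptions chosen only to show "filter, then take the maximum" in pure Python.

```python
def simple_cell(image, kernel):
    """Filter stage: valid-mode 2D correlation of the image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def complex_cell(response, pool):
    """Max stage: maximum over non-overlapping pool x pool neighbourhoods."""
    rows = range(0, len(response) - pool + 1, pool)
    cols = range(0, len(response[0]) - pool + 1, pool)
    return [
        [
            max(response[r + i][c + j]
                for i in range(pool) for j in range(pool))
            for c in cols
        ]
        for r in rows
    ]


# Toy 6x6 image with a vertical edge between columns 2 and 3 (assumed data).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical vertical-edge kernel standing in for an oriented filter.
kernel = [[-1, 0, 1]] * 3

s = simple_cell(image, kernel)  # 4x4 map of filter responses
c = complex_cell(s, 2)          # 2x2 map after max pooling
```

The pooled map is smaller than the filter map and keeps only the strongest response in each neighbourhood, which is the sense in which the receptive field grows from one layer to the next.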

One of the best-known biologically inspired models is HMAX, initially proposed by Riesenhuber and Poggio (1999). An improved version was introduced as the standard HMAX, in which simple cells are modeled by Gabor filters and Radial Basis Functions (RBF) (Serre, Wolf, Bileschi, Riesenhuber, & Poggio, 2007) and complex cells are modeled by a max-pooling operation. In general, this model attempts to mimic the hierarchical feed-forward structure of the HVS. Because of the strong performance of HMAX, several ideas have been developed to improve the standard model. For example, Seifzadeh, Rezaei, and Farahbakhsh (2017) use an extreme learning machine with a feed-forward structure instead of the Support Vector Machine (SVM) used in the standard HMAX, and thereby achieve higher accuracy than the standard model. In Zhang, Lu, Kang, and Lim (2016), the features extracted from the images are represented by introducing local binary patterns, which makes the model faster than standard HMAX. To increase speed, Hu, Zhang, Li, and Zhang (2014) add a clustering step to standard HMAX, whose benefit is most apparent on high-resolution images. In Li, Wu, Zhang, and Li (2015), important parts of the image are extracted and then clustered; this work focuses on modeling the sixth layer of the cortex. Theriault et al. (2013) and Mishra and Jenkins (2010) improve HMAX by modifying the filter bank in the S1 layer, and their results show slight improvements in both accuracy and performance. Walther and Koch (2007) propose an approach that focuses only on some parts of the image, which improves the performance of the standard HMAX. In Sufikarimi and Mohammadi (2017), the similarity of the extracted features is measured and similar features are removed, which increases both the speed and the performance of the process without loss of accuracy. The model proposed in Jazlaeiyan and Shahhoseini (2016) modifies both the extraction and the selection of features. Ghodrati et al. (2012), Mutch and Lowe (2008), and Lu, Kang, Zhang, and Lim (2015) also study feature extraction and its impact on the performance of the standard model. The extensive research on feature extraction indicates its importance and its considerable effect on the performance of HMAX models. Yang et al. (2013) and Zhang et al. (2012) attempt to raise the accuracy of the standard HMAX by using color information. The computational load becomes three times larger than in the grayscale case, but the accuracy improvement is negligible.
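For readers unfamiliar with the building blocks named at the start of this paragraph, the sketch below shows a Gabor kernel, an RBF similarity, and a global maximum in pure Python. The functional forms follow the usual textbook definitions; the function names and all parameter values (`size`, `sigma`, `lam`, `gamma`, `beta`) are illustrative assumptions, not values reported in this excerpt.

```python
import math


def gabor_kernel(size, theta, sigma, lam, gamma):
    """Gabor filter sampled on an odd size x size grid centred at the origin."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by the filter orientation theta.
            x0 = x * math.cos(theta) + y * math.sin(theta)
            y0 = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x0 ** 2 + (gamma * y0) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * x0 / lam))
        kernel.append(row)
    return kernel


def rbf_response(patch, prototype, beta=1.0):
    """RBF similarity: 1.0 for identical vectors, decaying with distance."""
    dist2 = sum((a - b) ** 2 for a, b in zip(patch, prototype))
    return math.exp(-beta * dist2)


def global_max_feature(patches, prototype, beta=1.0):
    """Max pooling over all positions: best match of one stored prototype."""
    return max(rbf_response(p, prototype, beta) for p in patches)


# Illustrative parameter values only.
k = gabor_kernel(size=7, theta=0.0, sigma=2.8, lam=3.5, gamma=0.3)
best = global_max_feature([[0, 0], [1, 2], [5, 5]], prototype=[1, 2])
```

Each stored prototype yields one number per image through the global maximum, so the number of prototypes fixes the length of the feature vector passed to the classifier; this is why the choice of prototypes matters so much in the works reviewed above.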

In the HVS, low-level features are extracted in the primary visual cortex, V1. The secondary visual cortex, V2, then extracts high-level features by combining the low-level ones. The results of Hegdé and Van Essen (2000) indicate that the V2 region plays a major role in object recognition by extracting complex shape information. Furthermore, the experiments of Biederman (1987) indicate that high-level features are the dominant factors in object recognition by the HVS. Despite its critical role in the ventral path, the V2 region is not modeled in standard HMAX. Most existing versions of HMAX extract features at random, which leads to several weaknesses: (1) non-repeatability even under the same conditions, (2) redundancy in the stored features, (3) sensitivity to rotation, and (4) high computational load (slow processing). To cope with these problems, some previous studies have modeled the secondary visual cortex. In Lee, Ekanadham, and Ng (2008), V1 is modeled by a Gabor filter and V2 by a corner and junction detector. Wang and Deng (2016) emulate V2 neural responses by integrating multiple firing k-means into the HMAX model. Hu, Zhang, Qi, and Zhang (2014) model the V2 area using a hierarchical k-means. González-Casillas et al. (2018) describe feature extraction in V1 and V2 (such as lines, angles, and contours).

However, to the best of the authors' knowledge, the role of the V2 region in object recognition within biologically inspired models has not yet been addressed.

This paper aims to improve the conventional HMAX model by drawing on the HVS. To this end, a new layer that models the structure and functionality of the V2 layer in the HVS is added to the standard HMAX. This modification helps to find the most useful and informative features, which are less sensitive to changes in the object (e.g., scale, rotation, movement, and light). These features are called Non-Accidental Properties (NAP). Examples of NAP are the corners and edges of an image, which are robust to changes in scale, rotation, and light. The added layer, V2, is a feature extractor located after the C1 layer. By integrating the outputs of the S1 and C1 layers, the proposed approach generates a saliency map of the image. In this map, key-points such as corners and edges stand out clearly. These key-points and their surrounding pixels, known as patches, are extracted as features. This invariant feature extraction makes the proposed HMAX a repeatable process that is more reliable than the existing methods. In addition, the integration strategy in the V2 layer reduces the computational load without loss of accuracy. Color information can therefore be processed without an excessive computational cost.
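The pipeline described in this paragraph (merge the orientation maps, locate key-points, cut patches around them) can be sketched as follows. This is only one plausible reading: the excerpt does not give the actual integration rule or key-point detector, so summing absolute responses, the greedy top-k selection, and the names `saliency_map`, `key_points`, and `extract_patch` are all assumptions made for illustration.

```python
def saliency_map(orientation_maps):
    """Merge several orientation maps into one by summing absolute responses.

    Assumed integration rule; the paper's own rule is not given in this excerpt.
    """
    h, w = len(orientation_maps[0]), len(orientation_maps[0][0])
    return [
        [sum(abs(m[r][c]) for m in orientation_maps) for c in range(w)]
        for r in range(h)
    ]


def key_points(saliency, k, radius):
    """Up to k salient positions, chosen deterministically.

    Candidates are visited from strongest to weakest (ties broken by
    position). A candidate is skipped if it lies within `radius` of an
    accepted point, so no image region is picked twice. Positions with
    zero saliency (uniform regions) are never candidates.
    """
    h, w = len(saliency), len(saliency[0])
    candidates = sorted(
        (
            (r, c)
            for r in range(radius, h - radius)
            for c in range(radius, w - radius)
            if saliency[r][c] > 0
        ),
        key=lambda p: (-saliency[p[0]][p[1]], p),
    )
    chosen = []
    for r, c in candidates:
        if len(chosen) == k:
            break
        if all(max(abs(r - pr), abs(c - pc)) > radius for pr, pc in chosen):
            chosen.append((r, c))
    return chosen


def extract_patch(feature_map, point, radius):
    """Square patch of side 2*radius+1 centred on a key-point."""
    r, c = point
    return [row[c - radius:c + radius + 1]
            for row in feature_map[r - radius:r + radius + 1]]


# Two toy 7x7 orientation maps (assumed data).
m1 = [[0] * 7 for _ in range(7)]
m2 = [[0] * 7 for _ in range(7)]
m1[2][2] = 5
m2[2][3] = 4
m2[4][5] = -3

sal = saliency_map([m1, m2])
points = key_points(sal, k=2, radius=1)
patches = [extract_patch(sal, p, 1) for p in points]
```

Because the selection involves no random draw, running it twice on the same image returns the same key-points and patches, which is the repeatability property the paragraph emphasizes.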

In general, the proposed model tries to compensate for the weaknesses of standard HMAX. Our proposed approach provides several advantages compared to the standard approach:

• Repeatability and reliability: One of the most important advantages of the proposed approach is that it provides an acceptable level of reliability and repeatability. In a reliable approach, the results must be reproduced exactly under the same conditions. In our approach, repeatability is guaranteed by the repeatable feature extraction strategy inspired by the functionality of the V2 region in the HVS. In contrast, because standard HMAX selects features at random, there is some chance that certain features are selected multiple times. This weakness of standard HMAX becomes especially pronounced when only a small number of features or training images is available.

• Low computational load: Two mechanisms are employed to reduce the computational load. First, the integration in the V2 layer compresses the data generated in the V1 layer to a quarter of its size. Second, the high redundancy of standard HMAX is eliminated by the key-point extraction method. As the provided results confirm, our proposed approach avoids redundancy, whereas the standard approach suffers heavily from computational load. In HMAX, one part of the image may be extracted multiple times, or several features may be selected from uniform parts of the image. Such features contain no discriminative data, yet they increase the computational load. Our approach, on the other hand, focuses on salient points that contain the most discriminative information. Additionally, each candidate part of the image is extracted only once. Our approach therefore avoids redundancy in feature selection and, consequently, the associated computational load.

• High accuracy: The accuracy of a recognition process depends strongly on the quality of the extracted features that represent an image. Discriminative and informative features enable the classifier to recognize an image accurately. According to biological studies, borders, corners, and edges contain more information than ordinary points. We build our approach on this biological fact by focusing on the more informative features. In the proposed approach, the functionality of the secondary visual cortex is modeled by a new layer called V2, which results in more informative feature selection and, consequently, a more accurate version of HMAX.
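The repeatability and redundancy arguments in the list above can be made concrete with a small sketch of random position sampling. It is an illustration under assumptions, not the sampling code of any HMAX implementation: the function names, the 8x8 map size, and the seeds are hypothetical.

```python
import random


def random_patch_positions(height, width, n, rng):
    """Random sampling of n patch centres, drawn uniformly with replacement."""
    return [(rng.randrange(height), rng.randrange(width)) for _ in range(n)]


def count_duplicates(positions):
    """Number of sampled positions that repeat an earlier one."""
    return len(positions) - len(set(positions))


# Two runs on the same 8x8 map with different (assumed) seeds.
run_a = random_patch_positions(8, 8, 20, random.Random(1))
run_b = random_patch_positions(8, 8, 20, random.Random(2))

# Drawing more positions than the map has cells forces repeated positions.
crowded = random_patch_positions(8, 8, 100, random.Random(3))
repeats = count_duplicates(crowded)
```

Unless the seed is fixed, two runs select different positions, so the stored features and the downstream classification can differ between runs on identical data. Sampling with replacement can also return the same position more than once, which is the redundancy the second bullet refers to; a deterministic key-point selector avoids both effects by construction.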

Section 2 briefly reviews the standard HMAX. Section 3 explains the proposed method. Section 4 presents the numerical results that validate the efficiency of the proposed model. Section 5 concludes the paper.


Brief review of the standard HMAX

The first version of HMAX was presented in Riesenhuber and Poggio (1999). An upgraded version, closer to the HVS, was then presented in Serre et al. (2007) and is known as the standard HMAX. It consists of four consecutive layers and a classifier. The model has a hierarchical feed-forward structure similar to that of the HVS. This structure ensures that the speed and transfer of information are modeled correctly. According to biological research,

Proposed model

Recent studies have shown that the HVS, with the help of the secondary visual cortex (V2), focuses on key-points (i.e., edges and corners) more than on other parts of an image. For instance, if the corners of an object are available while the regular parts of the image are removed, the HVS can still recognize the object with high probability. However, if the corners are removed while the other parts remain, recognition becomes error-prone (Biederman, 1987). HMAX relatively mimics

Data-set

To verify the proposed model, we used two challenging object recognition databases: Caltech5 and GRAZ-02. Caltech5 consists of five object categories (airplane, leaf, car, face, and motorbike; Fig. 9). Caltech5 contains high inter-class diversity, which makes recognition very difficult. GRAZ-02 contains three object categories (bicycle, car, and human; Fig. 10) with cluttered and diverse backgrounds.

It should be noted that all images in these databases are in color, whereas the

Conclusion

In this paper, we model the secondary region of the visual cortex and add it to the standard HMAX model, which provides a more comprehensive model of the HVS. The main difference of our proposed approach is the use of corners, edges, and salient points as deterministic features. This modification avoids the non-repeatability of the results given by standard HMAX. Additionally, the proposed approach improves the performance of the standard approach when the number of training images is not a

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

We would like to thank Hazhar Sufi Karimi (Ph.D. candidate at Kansas State University, USA) for his helpful comments and for generously editing this manuscript.

References (45)

H. Bay et al. Speeded-up robust features (SURF). Computer Vision and Image Understanding (2008)
L.L. Cloutman. Interaction between dorsal and ventral processing streams: where, when and how? Brain and Language (2013)
M. Ghodrati et al. Towards building a more complex view of the lateral geniculate nucleus: recent advances in understanding its role. Progress in Neurobiology (2017)
A. González-Casillas et al. Towards a model of visual recognition based on neurosciences. Procedia Computer Science (2018)
X. Hu et al. Modeling response properties of V2 neurons using a hierarchical k-means model. Neurocomputing (2014)
Y. Li et al. Enhanced HMAX model with feedforward feature learning for multiclass categorization. Frontiers in Computational Neuroscience (2015)
Y.-F. Lu et al. Dominant orientation patch matching for HMAX. Neurocomputing (2016)
T. Rajalakshmi et al. Retinal model-based visual perception: Applied for medical image processing. Biologically Inspired Cognitive Architectures (2016)
T. Tang et al. Improving invariance in visual classification with biologically inspired mechanism. Neurocomputing (2014)
D.B. Walther et al. Attention in hierarchical models of object recognition. Progress in Brain Research (2007)
Y. Wang et al. Modeling object recognition in visual cortex using multiple firing k-means and non-negative sparse coding. Signal Processing (2016)
H.-Z. Zhang et al. B-HMAX: A fast binary biologically inspired model for object recognition. Neurocomputing (2016)
Al Maashri, A., DeBole, M., Yu, C.-L., Narayanan, V., Chakrabarti, C. (2011). A hardware architecture for accelerating...
I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review (1987)
M.N. Cherloo et al. An enhanced HMAX model in combination with SIFT algorithm for object recognition. Signal, Image and Video Processing (2019)
R.M. Cichy et al. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports (2016)
Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C. (2004). Visual categorization with bags of keypoints. In...
Dalal, N., Triggs, B. (2005). Histograms of oriented gradients for human detection. In: Computer Vision and Pattern...
F. Gao et al. Biologically inspired progressive enhancement target detection from heavy cluttered SAR images. Cognitive Computation (2016)
M. Ghodrati et al. How can selection of biologically inspired features improve the performance of a robust object recognition model? PLoS ONE (2012)
J. Hegdé et al. Selectivity for complex shapes in primate visual area V2. Journal of Neuroscience (2000)

