1 Introduction

The real-time implementation of computer vision on IoT-based surveillance systems is a pressing need for contemporary society. Pattern recognition is a groundbreaking technique that serves major applications such as biometric security, forensic investigation, Quick Response (QR) codes and smart door-locking systems [19]. The major challenges in developing a feature recognition system for IoT applications are computational efficiency, accuracy, power consumption and portability. Existing pattern recognition techniques developed by various researchers include the local binary pattern (LBP) and its variants, principal component analysis (PCA) and linear discriminant analysis (LDA). Among these, LBP is the most popular and most thoroughly investigated, owing to its tolerance to illumination changes, ease of implementation, computational simplicity and fast response [31]. LBP and its variants have been investigated for classification problems [2]. The face image is divided into 8 × 8 or 16 × 16 regions and the LBP feature distribution of each region is extracted. A histogram of these features is computed for each region, and the concatenation of the regional histograms is used as a global face descriptor; the performance of this method is evaluated under different challenges [14]. The EVBP descriptor is based on a Virtual Electric Field (VEF): its authors combined LBP with the VEF by modelling the neighbourhood of each pixel as a grid of electrostatically balanced virtual electric charges. The LBP concept is then applied to this neighbourhood to generate the EVBP representation of the face, which is computed for all four directions using the corresponding four electrical interactions [9].
A face feature extraction approach based on LBP and Two-Dimensional Locality Preserving Projections (2DLPP) has also been explored. It aims to enhance texture features without disturbing the spatial structure of a face image: LBP suppresses variations due to illumination and noise, which enhances the detailed texture of face images, and 2DLPP then retains the prominent features while reducing the feature size. In this mechanism, the Nearest Neighbour Classifier (NNC) is used to classify the faces [45]. Another approach, the Two-Directional Multi-level Threshold-LBP Fusion (2D-MTLBP-F), has been proposed for illumination-invariant face recognition by combining the Threshold Local Binary Pattern (TLBP) with the Discrete Cosine Transform (DCT). LBP with different thresholds and neighbourhoods generates complementary information that can be used to improve the recognition rate. In this method, face images are normalized using DCT normalization, the resulting images are transformed into 61 levels of TLBP with different thresholds, the normalized DCT image is fused into these TLBP layers, and face recognition is performed using a sparse representation-based classifier (SRC) [3]. The Weber Local Binary Image Cosine Transform (WLBI-CT) merges, in the frequency domain, the frequency components of images obtained through the Weber local descriptor and the local binary descriptor [15]; these frequency components are invariant to multi-scale and multi-orientation facial images with facial expressions. Selecting a significant and prominent feature set is key to highly accurate face recognition, texture classification [16, 18] and scene classification [35, 42]. Despite its attractive properties and wide applications, LBP produces features that are very sensitive to image noise: small variations in an image may drastically change its LBP features [22].
Moreover, the number of possible LBP codes grows rapidly with the number of neighbours (\( 2^P \) codes for P neighbours), so many histogram bins are sparsely populated. Infrequent codes cannot be estimated reliably from their histogram bins, and the resulting descriptor is long and far from compact.

Uniform LBP is therefore used for dimensionality reduction [12]. Binary codes that contain at most two bitwise transitions from 0 to 1 or from 1 to 0, when the bit string is traversed circularly, are called uniform patterns. Uniform patterns have been observed to account for slightly less than 90% of all patterns in the (8, 1) neighbourhood and about 70% in the (16, 2) neighbourhood, yet further reduction of the descriptor dimension remains a serious challenge. To achieve significant dimensionality reduction of the LBP descriptor, many subspace approaches have been reported in the literature [25]. Principal Component Analysis (PCA) has been used to remove co-occurring features [11]; however, PCA is highly sensitive to noise, is suited mainly to small datasets, and its recognition rate remains insufficient [4]. LDA is another useful method for reducing feature dimensions, but its computational complexity and lack of rotation invariance limit its use [28]. To reduce computational complexity and load, other prominent techniques have been reported, such as the Power Method [33], QR factorization [21] and subspace iteration methods [8]; however, these approaches converge slowly under conditions such as a low signal-to-noise ratio and an unknown subspace dimension.
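The uniform-pattern definition can be checked with a short sketch (Python is chosen here for illustration; the function names are hypothetical). It counts circular 0/1 transitions and builds a u2 lookup table that maps each raw 8-bit code either to one of the 58 uniform bins or to a single shared bin for all non-uniform codes, giving 59 bins per block.

```python
def transitions(code, p=8):
    """Number of 0/1 changes in the P-bit pattern `code`, read circularly."""
    count = 0
    for i in range(p):
        bit = (code >> i) & 1
        next_bit = (code >> ((i + 1) % p)) & 1
        if bit != next_bit:
            count += 1
    return count


def is_uniform(code, p=8):
    """A pattern is uniform if it has at most two circular transitions."""
    return transitions(code, p) <= 2


def uniform_lookup(p=8):
    """Return (table, n_bins): table[code] is the u2 bin of each raw code.

    Each uniform pattern gets its own bin; all non-uniform patterns share
    the last bin.
    """
    uniform = [c for c in range(1 << p) if is_uniform(c, p)]
    shared_bin = len(uniform)
    table = [shared_bin] * (1 << p)
    for b, c in enumerate(uniform):
        table[c] = b
    return table, len(uniform) + 1


table, n_bins = uniform_lookup(8)
print(n_bins)  # 58 uniform patterns + 1 shared bin
```

For P = 16 the same function gives 243 bins (242 uniform patterns plus one shared bin), which illustrates how quickly even the uniform descriptor grows with the number of neighbours.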

The algorithms and variants listed in Table 1 achieve high accuracy, but they are too complex for IoT-based surveillance systems in terms of computation time and run-time memory requirements. A multimodal biometric approach based on the fusion of voice and face recognition has been proposed for human verification, but its voice recognition module is difficult to implement in a surveillance system [1]. Reducing the effect of illumination-induced noise on face databases has been proposed for face recognition [6]. A hybrid feature extraction (HFE) technique has been proposed to overcome the effect of ageing on face recognition; its results were validated on different databases, but its training and testing complexity makes it unsuitable for fast IoT-based recognition systems [30]. A multi-feature fusion framework combining Gabor and deep features has been proposed for small-sample face recognition; its accuracy and performance are good, but its feature extraction process is lengthy and time-consuming [46]. In addition, various deep learning algorithms based on neural networks, such as the Convolutional Neural Network (CNN) and its variants [13, 29], have been explored in recent years. These algorithms are computationally expensive and require specialized hardware such as GPUs for development and deployment [32], which makes them unsuitable for real-time IoT applications with limited resources and power. There is therefore a need for an approach that addresses these issues efficiently and performs feature dimensionality reduction in a short time.

Table 1 Comparison of different algorithms with respect to database and real time implementation

Therefore, a low-power, computationally light method needs to be developed for IoT-based surveillance systems. Some authors have proposed fast subspace decomposition [37, 38] for feature dimensionality reduction, which is useful for extracting compact features; face and texture datasets can be used to validate such methods. Recently, many authors have used the Raspberry Pi board for IoT-based surveillance systems because it is affordable for prototyping. The major contributions of this paper are:

  1. An efficient framework is proposed for surveillance systems in smart cities using IoT devices. The framework works well for real-time applications such as employee identification in industry and surveillance in smart cities.

  2. An efficient fast subspace decomposition applied after a chi-square transformation is proposed. This combination yields better recognition rates across various datasets.

  3. The proposed technique achieves the lowest error rate on the AR, LFW, O2FN and Dynamic Texture (DynTex++) databases. It also achieves a higher TAR than existing recognition techniques for varying values of FAR.

The paper is divided into four sections. The first section has introduced the research problem and presented the literature survey. The second section describes the real-time face recognition algorithm, including face detection, face image enhancement, feature extraction, dimensionality reduction, face classification and other recognition applications. The third section presents the experimental setup, results and discussion, and the last section concludes the paper.

2 Methodology

The proposed algorithm is illustrated in Fig. 1. Face detection, alignment and enhancement are performed using standard algorithms available in the OpenCV library. As a preprocessing step, the face images are normalized so that the complete dataset lies in a common range; this normalization makes it possible to handle image datasets whose face images differ in size and format. Face identification is accomplished by extracting features with a modified LBP. The LBP features are further reduced using the proposed technique, and the reduced features are stored along with the face ID. The stored features are uploaded to the cloud or to an IoT device for real-time deployment. Once the system is deployed, the features of any face detected by the camera are extracted and reduced in real time for recognition. If the face is recognized from the database, the name and identity are displayed; otherwise, the face may be registered in the database for future recognition. The authors tested this system on a Raspberry Pi board, but it can be extended to other IoT devices.

Fig. 1 Block diagram of the real-time vision system on the Raspberry Pi board

Pseudo code of the proposed algorithm is as follows:

Training Phase:

  • Step 1: Normalize the dataset by subtracting the mean image and dividing by the variance. Divide each image into 8 × 8 blocks for the computation of local binary patterns.

  • Step 2: Compute the uniform LBP of each image.

  • Step 3: Compute the histogram of each block and concatenate the block histograms to obtain the global histogram.

  • Step 4: Apply the chi-square transformation to the resulting histogram to obtain an approximately Gaussian distribution.

  • Step 5: Store the transformed data in the required format in a CSV/XML file.

Testing/Deployment Phase:

  • Step 1: Read the stored CSV/XML file.

  • Step 2: Input an image from the camera or dataset for recognition.

  • Step 3: Apply LBP and the chi-square transformation to this single image, as in training.

  • Step 4: Compute the chi-square distance between the test image and every image in the dataset, and assign the test image to the class with the minimum distance.

Testing/deployment can be done on a local machine or on an IoT device. For an IoT device, the CSV/XML file and the trained model need to be deployed on the device.
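The training and testing steps above can be sketched as a minimal, dependency-free Python program (Python is chosen for illustration; the authors' implementation uses OpenCV in C). Several assumptions apply: images are grayscale lists of lists; the circular (8, R) neighbourhood is approximated by the square ring of pixels at distance R without interpolation; and the Step 1 normalization and Step 4 chi-square transformation are omitted. The names `lbp_codes`, `block_histogram`, `train` and `classify` are hypothetical.

```python
def uniform_table(p=8):
    """u2 lookup: one bin per uniform pattern (<= 2 circular transitions),
    plus one shared bin for all non-uniform patterns."""
    def transitions(c):
        return sum(((c >> i) & 1) != ((c >> ((i + 1) % p)) & 1) for i in range(p))
    uniform = [c for c in range(1 << p) if transitions(c) <= 2]
    table = [len(uniform)] * (1 << p)  # default: the shared non-uniform bin
    for b, c in enumerate(uniform):
        table[c] = b
    return table, len(uniform) + 1


TABLE, N_BINS = uniform_table(8)  # N_BINS == 59


def offsets(r):
    """Square-ring approximation of the circular 8-neighbourhood, in circular order."""
    return [(0, r), (-r, r), (-r, 0), (-r, -r), (0, -r), (r, -r), (r, 0), (r, r)]


def lbp_codes(img, r=1):
    """Uniform LBP bin (Eqs. (1)-(2)) for each pixel at least r from the border."""
    h, w = len(img), len(img[0])
    codes = []
    for y in range(r, h - r):
        row = []
        for x in range(r, w - r):
            code = 0
            for p, (dy, dx) in enumerate(offsets(r)):
                if img[y + dy][x + dx] - img[y][x] >= 0:  # S(i_p - i_c)
                    code |= 1 << p
            row.append(TABLE[code])
        codes.append(row)
    return codes


def block_histogram(codes, grid=8):
    """Block division (Step 1) and Step 3: concatenate grid x grid block histograms.
    Leftover rows/columns that do not fill a whole block are ignored."""
    bh, bw = len(codes) // grid, len(codes[0]) // grid
    hist = []
    for by in range(grid):
        for bx in range(grid):
            block = [0] * N_BINS
            for y in range(by * bh, (by + 1) * bh):
                for x in range(bx * bw, (bx + 1) * bw):
                    block[codes[y][x]] += 1
            hist.extend(block)
    return hist


def chi_square(a, b):
    """Chi-square distance, skipping bins that are empty in both histograms."""
    return sum((ai - bi) ** 2 / (ai + bi) for ai, bi in zip(a, b) if ai + bi > 0)


def train(labelled_images, grid=8, r=1):
    """Training phase: (face_id, descriptor) pairs, ready to store in CSV/XML."""
    return [(fid, block_histogram(lbp_codes(img, r), grid)) for fid, img in labelled_images]


def classify(img, gallery, grid=8, r=1):
    """Testing phase: return the gallery label with the minimum chi-square distance."""
    q = block_histogram(lbp_codes(img, r), grid)
    return min(gallery, key=lambda item: chi_square(q, item[1]))[0]
```

A typical call is `gallery = train([("A", img_a), ("B", img_b)])` followed by `classify(query, gallery)`. Because LBP compares each neighbour only with its centre pixel, adding a constant brightness offset to an image leaves its descriptor unchanged.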

2.1 Optimize computational efficiency

The proposed chi-square transformed fast subspace LBP algorithm works as follows. First, the uniform \( {LBP}_{8,2}^{u2} \) features of a query image are extracted, where the subscript 8,2 denotes eight neighbours sampled at a radius of 2, and the superscript u2 indicates that each uniform pattern receives its own code while all non-uniform patterns share a single code. For a central pixel \( \left({x}_c,{y}_c\right) \) with P sampling points on a circle of radius R, the LBP code is

$$ {LBP}_{P,R}\left({x}_c,{y}_c\right)={\sum}_{p=0}^{P-1}S\left({i}_p-{i}_c\right){2}^p $$
(1)

where \( {i}_c \) and \( {i}_p \) denote the gray-scale values of the central pixel and of the p-th neighbour, respectively [3]. The thresholding function S(a) is defined as

$$ S(a)=\begin{cases}1, & \text{if}\ a\ge 0\\ 0, & \text{otherwise}\end{cases} $$
(2)
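As a hypothetical numerical instance of Eqs. (1) and (2), using invented pixel values with the neighbours indexed p = 0, …, 7 in circular order, the code of a single pixel is computed as follows:

$$
\begin{aligned}
&{i}_c=50,\quad \left({i}_0,\dots, {i}_7\right)=\left(60,40,55,50,30,70,45,52\right)\\
&S\left({i}_p-{i}_c\right)=\left(1,0,1,1,0,1,0,1\right)\\
&{LBP}_{8,R}={2}^0+{2}^2+{2}^3+{2}^5+{2}^7=1+4+8+32+128=173
\end{aligned}
$$

Note that \( {i}_3={i}_c \) gives S(0) = 1. The circular bit string 1, 0, 1, 1, 0, 1, 0, 1 has six 0/1 transitions, so this pattern is non-uniform and, under the u2 mapping, falls into the single shared bin.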
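$$ {h}_{LBP}(k)=\sum \limits_a\delta \left({f}_{LBP}(a),k\right),\kern1em k=0,\dots, K-1 $$
(3)

where \( {f}_{LBP}(a) \) is the LBP code at pixel position a, δ(·,·) is the Kronecker delta and K is the number of histogram bins.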

Here \( {h}_{LBP} \) is the feature histogram computed by the standard LBP algorithm. A chi-square transformation is then applied so that the distribution of the LBP features becomes approximately Gaussian, which allows the extracted LBP features to be used more effectively.

The chi-square transformation takes two samples of LBP features, denoted a and b, and forms a new feature vector \( x=\left({x}_1,{x}_2,\dots, {x}_d\right) \) whose elements are

$$ {x}_i=\frac{a_i-{b}_i}{\sqrt{a_i+{b}_i}} $$
(4)

The squared norm of x is then the chi-square distance between a and b:

$$ {x}^Tx={\sum}_{i=1}^d\frac{{\left({a}_i-{b}_i\right)}^2}{a_i+{b}_i} $$
(5)
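Eqs. (4) and (5) can be checked numerically with a minimal sketch (Python, with hypothetical function names). It assumes non-negative histogram counts and skips bins where \( a_i + b_i = 0 \), a convention the text does not specify. Under that convention, the squared norm of the embedded vector equals the chi-square distance up to floating-point rounding.

```python
import math


def chi_square_embedding(a, b):
    """Eq. (4): x_i = (a_i - b_i) / sqrt(a_i + b_i); bins with a_i + b_i = 0 map to 0."""
    return [(ai - bi) / math.sqrt(ai + bi) if ai + bi > 0 else 0.0
            for ai, bi in zip(a, b)]


def chi_square_distance(a, b):
    """Eq. (5): sum_i (a_i - b_i)^2 / (a_i + b_i), skipping empty bins."""
    return sum((ai - bi) ** 2 / (ai + bi) for ai, bi in zip(a, b) if ai + bi > 0)


a = [4, 0, 3, 1]  # toy histograms, not data from the paper
b = [2, 0, 1, 1]
x = chi_square_embedding(a, b)
print(sum(v * v for v in x), chi_square_distance(a, b))  # both about 5/3
```

The embedding is useful because it turns the chi-square distance into an ordinary squared Euclidean norm. Linear tools such as the covariance analysis of Eqs. (6) to (8) can then operate on x directly.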

Next, fast subspace decomposition is applied to the LBP feature x for dimensionality reduction, modelling the observations as:

$$ x(t)=A\left(\theta \right)s(t)+n(t) $$
(6)

where s(t) is the LBP feature histogram, A(θ) spans the signal subspace of dimension d, n(t) is additive noise, and x(t) is the output observed at time t = 1, …, N. To remove unreliable features, the covariance matrix of x(t) is computed; assuming that the noise is white with variance \( {\sigma}^2 \) and uncorrelated with the signal,

$$ {W}_x=E\left\{x(t){x}^H(t)\right\}=A\left(\theta \right){W}_s{A}^H\left(\theta \right)+{\sigma}^2I $$
(7)

where \( {W}_s \) is the covariance matrix of the signal s(t) and I is the identity matrix. In practice, \( {W}_x \) is estimated from the N available observations as:

$$ {\hat{W}}_x=\frac{1}{N}\sum \limits_{t=1}^Nx(t){x}^H(t) $$
(8)

The signal subspace is spanned by the d principal eigenvectors \( \left\{{e}_1,\dots, {e}_d\right\} \) of \( {\hat{W}}_x \). The task is then to find the optimal value of d so that the non-redundant features of the LBP histogram are retained. Following [37], d is estimated from the number of distinct (non-repeated) eigenvalues of the covariance matrix, using the new statistic reported there. The extracted features of the training set are projected onto the signal subspace and stored in system memory; the same signal subspace vectors are then used to extract the optimal features of every test sample.
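The eigen-analysis behind Eqs. (7) and (8), followed by projection onto the signal subspace, can be sketched as below. This is a hedged illustration, not the fast subspace decomposition of [37]. It obtains the d leading eigenvectors by plain power iteration with deflation, assumes real-valued features (so \( x^H = x^T \)), follows Eq. (8) without mean removal, and takes d as an input rather than estimating it with the statistic of [37]. All function names are hypothetical.

```python
import math


def sample_covariance(X):
    """Eq. (8): W = (1/N) * sum_t x(t) x(t)^T for N real-valued feature vectors."""
    n, m = len(X), len(X[0])
    W = [[0.0] * m for _ in range(m)]
    for x in X:
        for i in range(m):
            for j in range(m):
                W[i][j] += x[i] * x[j] / n
    return W


def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0.0 else None


def top_eigenvectors(W, d, iters=500):
    """d leading eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    m = len(W)
    W = [row[:] for row in W]  # work on a copy
    values, vectors = [], []
    for _ in range(d):
        v = normalize([1.0 + 0.1 * i for i in range(m)])  # fixed start vector
        for _ in range(iters):
            w = normalize(matvec(W, v))
            if w is None:  # remaining matrix annihilates v: no signal energy left
                break
            v = w
        lam = sum(vi * wi for vi, wi in zip(v, matvec(W, v)))  # Rayleigh quotient
        values.append(lam)
        vectors.append(v)
        for i in range(m):  # deflation: W <- W - lam * v v^T
            for j in range(m):
                W[i][j] -= lam * v[i] * v[j]
    return values, vectors


def project(x, vectors):
    """Reduced feature: coordinates of x in the d-dimensional signal subspace."""
    return [sum(ei * xi for ei, xi in zip(e, x)) for e in vectors]
```

In this sketch, the training features X yield `vectors = top_eigenvectors(sample_covariance(X), d)[1]`, and every training or test feature is reduced with `project(x, vectors)`. A production version would use an optimized eigensolver instead of these pure-Python loops.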

2.2 Recognition

The proposed approach reduces the histogram features of the training set, which are stored in system memory. To recognize a query image, its reduced feature histogram is computed and matched to the stored feature with the minimum chi-square distance.

Several techniques for measuring the similarity between the feature histogram of the test image and the stored feature histograms of the training images were examined, including log-likelihood, Euclidean distance, histogram intersection and chi-square distance. In the proposed work, the chi-square distance is used for recognition. The authors further observe that weighting the distinctive features of the image gives better results in terms of accuracy and time complexity. The extracted feature image and the histogram vector are shown in Fig. 2.
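The four similarity measures named above, together with a block-weighted chi-square distance, can be sketched as follows (Python, hypothetical names). The weighted form multiplies each block's chi-square term by a per-block weight. It is one common way to emphasize distinctive regions, but it is only an assumption here, since the paper does not specify its weighting scheme, and the example weights are hypothetical. Log-likelihood, Euclidean and chi-square values are smaller for similar histograms, whereas histogram intersection is larger.

```python
import math


def log_likelihood(s, m, eps=1e-12):
    """-sum_i s_i * log(m_i); smaller means more similar. eps avoids log(0)."""
    return -sum(si * math.log(mi + eps) for si, mi in zip(s, m))


def euclidean(s, m):
    return math.sqrt(sum((si - mi) ** 2 for si, mi in zip(s, m)))


def histogram_intersection(s, m):
    """Similarity (not a distance): larger means more overlap."""
    return sum(min(si, mi) for si, mi in zip(s, m))


def chi_square(s, m):
    return sum((si - mi) ** 2 / (si + mi) for si, mi in zip(s, m) if si + mi > 0)


def weighted_chi_square(s, m, weights, n_bins):
    """Sum over blocks j of weights[j] * chi-square of block j (n_bins per block)."""
    total = 0.0
    for j, w in enumerate(weights):
        lo, hi = j * n_bins, (j + 1) * n_bins
        total += w * chi_square(s[lo:hi], m[lo:hi])
    return total


s = [0.5, 0.5, 0.0, 1.0]   # two toy blocks of two bins each
m = [0.25, 0.75, 0.5, 0.5]
print(chi_square(s, m), weighted_chi_square(s, m, [1.0, 2.0], n_bins=2))
```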

Fig. 2 a Extracted feature image, b Histogram vector

2.3 Proposed architecture for IoT applications

The proposed architecture is shown in Fig. 3. A high-end server stores the dataset and trains the model. The trained model and the computed features are then stored in a common dataset, which resides either on the cloud or in the IoT device memory. The IoT gadget deploys the model in real time and is connected to a CCTV/web camera for real-time input. The trained model inside the gadget works as the identification module for applications such as face-based employee identification, security surveillance in industries and security devices for vehicles. The computed decision can be communicated to a mobile device for further action.
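The common dataset exchanged between the server and the IoT gadget can be as simple as the CSV file from Step 5 of the training phase. The sketch below assumes a hypothetical layout of one row per face, with the face ID in the first column followed by the feature values, because the paper does not specify the file layout. It uses only the Python standard library, and `repr` on floats keeps the round trip exact.

```python
import csv
import io


def save_features(rows, f):
    """Write (face_id, feature vector) pairs: face_id first, then the features."""
    writer = csv.writer(f)
    for face_id, feats in rows:
        writer.writerow([face_id] + [repr(float(v)) for v in feats])


def load_features(f):
    """Read back (face_id, [float, ...]) pairs, skipping blank lines."""
    return [(r[0], [float(v) for v in r[1:]]) for r in csv.reader(f) if r]


buf = io.StringIO()  # stands in for a file opened with newline=""
save_features([("emp_001", [0.5, 1.25, 3.0])], buf)
buf.seek(0)
print(load_features(buf))
```

On a real device the same functions would receive a file opened with `open(path, "w", newline="")` on the server and `open(path, newline="")` on the gadget.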

Fig. 3 Proposed architecture

3 Results and discussions

The experiments are performed and validated on a desktop machine and a Raspberry Pi. The desktop machine has an octa-core 2.7 GHz i5 processor, 4 GB DDR3 RAM and the Linux (Ubuntu 16.04) operating system, and uses OpenCV (version 3.2.0). The Raspberry Pi board has a quad-core 1.2 GHz Broadcom BCM2837 64-bit CPU, 1 GB RAM and an 8 GB memory card. The proposed approach is validated on four different databases, and its performance is measured using cross-validation. The analysis and comparison of the proposed algorithm are given below for each dataset.

  (a) Analysis of the AR dataset for face recognition

The AR database [5] has 4000 images of 126 different people (56 female and 70 male); a few samples are shown in Fig. 4. The images are normalized to 150 × 130 pixels and divided into 8 × 8 blocks. With 59 uniform-LBP bins per block, the extracted LBP feature vector has 59 × 8 × 8 = 3776 dimensions. The proposed algorithm is applied to these features for further reduction. A comparison of the proposed algorithm with existing approaches is shown in Table 2. The proposed algorithm performs better than the existing approaches except PmSVM-Chi2 and PmSVM-HI, both of which are error-free on this dataset.

Fig. 4 AR dataset [5]

Table 2 Comparison of the existing approaches on different database
  (b) Analysis of the O2FN mobile dataset for face recognition

The O2FN mobile dataset [23] contains 2000 face images of 50 different people at 144 × 176 pixels; a few samples are shown in Fig. 5. This database is chosen to validate mobile face recognition. A comparison of the proposed method with existing approaches on this dataset is shown in Table 2, where the proposed algorithm performs better than the existing approaches.

Fig. 5 O2FN mobile dataset [23]

  (c) Analysis of the LFW dataset for face recognition

The LFW dataset [41] contains 13,233 face images of 5749 different people; eight different faces are shown in Fig. 6. All the face images in this dataset were collected from the internet and show variations in expression, pose and illumination. High-dimensional LBP features perform more robustly than baseline LBP and baseline HOG features [40]. A comparison of the proposed algorithm with various existing approaches in terms of percentage error rate is shown in Table 2. The proposed approach performs better than existing subspace approaches, although memory consumption and computation cost are quite high for this dataset.

Fig. 6 LFW dataset [41]

  (d) Analysis of the DynTex++ database for dynamic texture recognition

The DynTex++ database [20] contains 36 classes, each with 100 sequences of size 50 × 50 × 50. This dataset is widely used for dynamic texture recognition. Its dimension is large compared with face databases, so it consumes relatively more memory and incurs a higher computation cost. For validation, the test bench uses five-fold cross-validation: 80 sequences per class form the training set and the remaining 20 sequences form the test set. The experiment is repeated five times and the results are averaged. A comparative analysis of the proposed algorithm with existing approaches is shown in Table 2, where the proposed approach performs better than the existing approaches.
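The DynTex++ protocol can be sketched as per-class five-fold splits. This sketch assumes contiguous, disjoint folds of 20 sequences per class, because the text does not say whether the splits were random. The function names are hypothetical.

```python
def five_fold_splits(n_classes=36, n_per_class=100, k=5):
    """Yield (train, test) lists of (class_id, sequence_id) pairs for each fold.

    In fold f, sequences f*fold .. (f+1)*fold - 1 of every class are held out
    (20 of 100 per class for k = 5) and the remaining 80 are used for training.
    """
    fold = n_per_class // k
    for f in range(k):
        held_out = range(f * fold, (f + 1) * fold)
        test = [(c, s) for c in range(n_classes) for s in held_out]
        train = [(c, s) for c in range(n_classes)
                 for s in range(n_per_class) if s not in held_out]
        yield train, test


def average(values):
    """Mean of the per-fold results, e.g. error rates."""
    return sum(values) / len(values)


for train, test in five_fold_splits():
    print(len(train), len(test))  # 2880 720 for each of the five folds
```

Every sequence appears in exactly one test fold, so averaging the five fold results uses every sequence once for testing.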

To gain better physical insight into the proposed technique, the error rate (%) was analysed as a function of the percentage of reduced features and compared with the error rates of standard algorithms. Using the proposed feature reduction technique, the change in error rate with respect to the percentage of features removed is studied for the standard databases, as shown in Fig. 7. An error rate below 3% is achieved with a 27% reduction in features for all tested databases except LFW, for which the error rate is below 10%. Compared with existing recognition and detection algorithms, the proposed technique exhibits the lowest error rate with the greatest feature reduction in the least time for all the standard databases. Moreover, the proposed technique is dynamic in nature.

Fig. 7 Percentage of reduced features with respect to percentage error rate

Further, the error rate performance of the existing algorithms for the standard databases has been compared to the proposed algorithm as represented in Fig. 8. It is observed that the percentage error rate is lowest for the proposed algorithm for all the standard databases as compared to existing algorithms.

Fig. 8 Performance of different algorithms on standard databases

The proposed method is also compared with different algorithms on the standard datasets in terms of precision, sensitivity and F-measure, as shown in Fig. 9a-d. The precision of the proposed algorithm is comparable to that of the existing algorithms, whereas its sensitivity and F-measure are considerably higher, which demonstrates the effectiveness of the features retained by the proposed technique. In addition, the proposed technique significantly reduces both recognition time and memory usage, which makes it suitable for real-time applications on memory-constrained devices such as the Raspberry Pi.
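For reference, metrics of the kind plotted in Fig. 9 can be computed from true and predicted labels as shown below. The paper does not state how per-class values are averaged, so this sketch assumes macro averaging (an unweighted mean over classes). The function names are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, sensitivity (recall) and F-measure from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_measure


def macro_metrics(y_true, y_pred):
    """Unweighted mean of per-class precision, sensitivity and F-measure."""
    labels = sorted(set(y_true) | set(y_pred))
    totals = [0.0, 0.0, 0.0]
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        for k, v in enumerate(precision_recall_f(tp, fp, fn)):
            totals[k] += v
    return tuple(v / len(labels) for v in totals)


print(macro_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```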

Fig. 9 a Analysis of precision, sensitivity and F-measure of AR dataset of the proposed approach with existing algorithms. b Analysis of precision, sensitivity and F-measure of O2FN mobile dataset of the proposed approach with existing algorithms. c Analysis of precision, sensitivity and F-measure of LFW dataset of the proposed approach with existing algorithms. d Analysis of precision, sensitivity and F-measure of Dynamic Texture dataset of the proposed approach with existing algorithms

The false acceptance rate (FAR) and the true acceptance rate (TAR) are significant parameters for all surveillance-related applications. Because face recognition is increasingly popular in surveillance environments, FAR and TAR were compared with those of existing algorithms on all four datasets, as shown in Fig. 10a-d. The TAR of the proposed algorithm is higher than that of the existing recognition techniques for varying values of FAR, which makes it well suited to security and forensic investigation applications.
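TAR at a given FAR can be obtained from matching scores as sketched below. The sketch assumes that scores are chi-square distances, so a pair is accepted when its distance is at most the threshold. Genuine pairs come from the same identity and impostor pairs from different identities. The pairing protocol behind Fig. 10 is not specified, so the scores and function names here are illustrative only.

```python
def roc_points(genuine, impostor):
    """(FAR, TAR) at every candidate threshold; accept when distance <= threshold."""
    points = []
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(d <= t for d in impostor) / len(impostor)
        tar = sum(d <= t for d in genuine) / len(genuine)
        points.append((far, tar))
    return points


def tar_at_far(genuine, impostor, target_far):
    """Highest TAR achievable while keeping FAR at or below target_far."""
    return max((tar for far, tar in roc_points(genuine, impostor) if far <= target_far),
               default=0.0)


genuine = [0.1, 0.2, 0.3, 0.9]         # toy distances between same-person pairs
impostor = [0.25, 0.5, 0.7, 0.8, 1.0]  # toy distances between different-person pairs
print(tar_at_far(genuine, impostor, 0.2))
```

Sweeping `target_far` over a range of values traces a TAR-versus-FAR curve of the kind compared in Fig. 10.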

Fig. 10 a TAR vs FAR comparative analysis of Dynamic Texture database of existing and proposed algorithms. b TAR vs FAR comparative analysis of LFW database of existing and proposed algorithms. c TAR vs FAR comparative analysis of O2FN database of existing and proposed algorithms. d TAR vs FAR comparative analysis of AR database of existing and proposed algorithms

To verify the performance and accuracy of the proposed algorithm, it is compared with existing algorithms on the standard databases, as shown in Fig. 11. The proposed technique is as accurate as the other algorithms even after a significant reduction in features. This indicates that the dropped features were redundant and had no measurable impact on accuracy. Reducing the features while maintaining accuracy therefore saves time, memory and power.

Fig. 11 Percentage accuracy comparisons

To further verify the performance of the proposed method, its feature dimensionality is compared with that of existing techniques in Table 3. The proposed algorithm achieves the greatest dimensionality reduction among the compared algorithms. The proposed technique can therefore perform visual recognition efficiently with a minimal feature size in a short time. This makes it suitable for real-time implementation on the Raspberry Pi board for potential IoT applications such as forensics, identification in the banking sector and the Aadhaar database, and texture recognition. The efficiency of the algorithm is also expected to keep the power consumption of the board low.

Table 3 Feature comparison of the proposed algorithm with the existing approaches in the literature

3.1 Real time implementation of vision recognition system

The experimental results above validate the proposed algorithm. Because the features are reduced effectively, the algorithm can be deployed on IoT devices in real time and implemented on suitable IoT devices to prototype the vision system. The proposed system is implemented using the open-source OpenCV library in C on a Raspberry Pi running Ubuntu with a USB camera.

4 Conclusion

This paper has presented effective dimensionality reduction using a fast subspace technique with a chi-square transformation for smart-city surveillance on IoT devices. The technique is applied to the extracted local binary pattern feature histogram to remove further redundant features. A reduction of 13,476 features is achieved in comparison with the basic LBP algorithm. Removing unreliable features reduces the memory requirement and the response time of the system, which is desirable for IoT applications. The proposed algorithm is verified and validated on a sample of the author's own face using a Raspberry Pi as the hardware development kit, and the same steps can be implemented on other IoT devices such as Arduino and RoboCV. The proposed algorithm exhibits the lowest error rate with the greatest feature reduction in the least time for all the standard databases, while maintaining accuracy comparable to that of existing techniques. These characteristics make the proposed scheme useful for real-time face and other recognition tasks in IoT-based surveillance systems.

In future, this method can be explored together with deep learning techniques for real-time IoT applications. The same architecture and algorithm can be deployed and tested on any visual recognition problem; as the results section shows, they are generic enough to work well on both face and texture recognition. Real-time speed may be a bottleneck for problems such as highway surveillance and needs further investigation. Further improvements can be investigated in three areas: first, reducing the computational complexity further to increase the frame rate of the system; second, investigating and reporting the power consumption of the proposed architecture; and third, extending the proposed architecture to datasets of people wearing face masks for person identification in the post-COVID-19 era. This work could also be used for automatic attendance during online sessions, as during the pandemic. Furthermore, the scheme can be explored for fingerprint and iris recognition to provide complete biometric verification in banking and other high-security services.