Automatic Speaker Recognition from Speech Signals Using Self Organizing Feature Map and Hybrid Neural Network

https://doi.org/10.1016/j.micpro.2020.103264

ABSTRACT

This paper explains a new hybrid method for Automatic Speaker Recognition (ASR) using speech signals based on an Artificial Neural Network (ANN). ASR performance is regarded as the foremost challenge and needs to be improved. This research work focuses on resolving ASR problems and improving the accuracy of speaker prediction. The Mel Frequency Cepstral Coefficient (MFCC) is exploited for signal feature extraction. The input samples are created from these extracted features, and their dimensions are reduced using a Self Organizing Feature Map (SOFM). Finally, using the reduced input samples, recognition is performed with a Multilayer Perceptron (MLP) trained with Bayesian Regularization. The network is trained and verified on real speech data from the IITG Multivariability speaker recognition database for 10 speakers. The proposed method is validated by performance estimation and classification accuracy against other models. It yields a better recognition rate, attaining 93.33% accuracy.

Section snippets

INTRODUCTION

Speech is a mode of communication for humans. Without seeing their face, a person's identity can be recognized from speech. Automatic speaker recognition is a method of recognizing a person automatically using their distinctive biological characteristics [1]. Every individual has a unique voice because of differences in vocal tract shape, larynx size, and other sound production mechanisms. It also allows one to differentiate a person from their voice greatly based on other characteristics of

DATABASE

The selection of the database is a highly important step in implementing any proposed technique. In this research, the IITG Multivariability Speaker Recognition Database, recorded in an Indian scenario, is used [26]. Based on various recording conditions, this database is organized into four phases.

They are the IITG-MV phase-I, IITG-MV phase-II, IITG-MV phase-III and IITG-MV phase-IV speaker recognition databases. IITG-MV phase-IV is further split into three parts.

3. Feature extraction

Extracting the best parametric features of the acoustic signals is a significant task in designing a speaker recognition system, and it extensively affects the performance of the system. In feature extraction, the input waveform is converted into a series of acoustic feature vectors, each representing the signal information in a small time window. In this paper, the Mel-frequency cepstral coefficient (MFCC) is employed for feature extraction as it is a commonly used feature extraction
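As a concrete illustration, the MFCC pipeline described above (framing, windowing, power spectrum, mel filterbank, log compression, DCT) can be sketched with NumPy. The frame length, hop size, FFT size, and filter count below are illustrative assumptions, not values reported in the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=12):
    # Slice the waveform into overlapping, Hamming-windowed frames
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum -> mel filterbank energies -> log -> DCT-II
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    logmel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps + 1),
                                  (2 * n + 1) / (2 * n_filters)))
    return (logmel @ dct.T)[:, 1:n_ceps + 1]  # keep 12 coefficients, drop c0

# One second of a synthetic 440 Hz tone stands in for a speech signal
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (98, 12): 12 cepstral coefficients per frame
```

The result is one 12-dimensional cepstral vector per analysis frame, matching the 12 coefficients per signal mentioned in Section 4.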

4. Dimensionality reduction using SOFM (Self Organizing Feature Map)

The Self Organizing Feature Map (SOFM) is the methodology used to create the input samples from the extracted features while reducing their dimensions. The high dimensionality of the feature vectors is mitigated by first performing dimensionality reduction on the extracted features (12 cepstral coefficients per signal) using the SOFM. The SOFM is an unsupervised, competitive, self-organizing neural network, also called a Self-Organizing Map (SOM), introduced by Kohonen in 1982 [28]. SOFM performs the mapping
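A minimal sketch of a Kohonen SOFM used this way might look as follows; the grid size, learning rate, and neighbourhood schedule are illustrative assumptions, and the random toy data merely stands in for the 12-coefficient MFCC vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0):
    """Kohonen SOFM: competitive learning with a shrinking Gaussian
    neighbourhood around the best-matching unit (BMU)."""
    h, w = grid
    weights = rng.normal(size=(h * w, data.shape[1]))
    # Grid coordinates of every unit, used by the neighbourhood function
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    n_steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            lr = lr0 * np.exp(-t / n_steps)        # decaying learning rate
            sigma = sigma0 * np.exp(-t / n_steps)  # shrinking neighbourhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            nb = np.exp(-d2 / (2 * sigma ** 2))
            weights += lr * nb[:, None] * (x - weights)
            t += 1
    return weights, coords

def project(data, weights, coords):
    """Reduce each sample to the 2-D grid position of its BMU."""
    bmus = np.argmin(((data[:, None, :] - weights[None]) ** 2).sum(-1), axis=1)
    return coords[bmus]

# Toy data standing in for 12-dimensional MFCC vectors
data = rng.normal(size=(200, 12))
w, c = train_som(data)
low = project(data, w, c)
print(low.shape)  # (200, 2)
```

Replacing each high-dimensional sample with the 2-D grid position of its best-matching unit is one common way a SOFM is used for dimensionality reduction; the paper may well use a different readout.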

RECOGNITION USING MLP-BP

The reduced input samples from the SOFM are used as input to the MLP.

We used the neural network called the Multilayer Perceptron (MLP). The basic structure of the neural network is shown in Fig. 3. The network comprises three layers: input, hidden, and output. These layers are connected by synaptic weights. The learning of the network is done by the back-propagation (BP) algorithm, a supervised learning method based on the error-correction principle. The following are the parameters used
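A one-hidden-layer MLP trained by back-propagation can be sketched as below. Note that MATLAB's Bayesian Regularization (trainbr) adapts the weight penalty within a Bayesian framework; the fixed L2 penalty here is a simplified stand-in, and all hyperparameters and the toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_mlp(X, y, hidden=16, epochs=1000, lr=0.1, l2=1e-3):
    """One-hidden-layer MLP trained by full-batch back-propagation.
    The fixed L2 penalty only approximates the adaptive weight decay
    that Bayesian Regularization would compute."""
    n, d = X.shape
    k = y.max() + 1
    Y = np.eye(k)[y]                              # one-hot targets
    W1 = rng.normal(scale=0.5, size=(d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, k)); b2 = np.zeros(k)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(1, keepdims=True))
        P /= P.sum(1, keepdims=True)              # softmax outputs
        dZ = (P - Y) / n                          # cross-entropy gradient
        dW2 = H.T @ dZ + l2 * W2
        dH = dZ @ W2.T * (1 - H ** 2)             # back-propagate through tanh
        dW1 = X.T @ dH + l2 * W1
        W2 -= lr * dW2; b2 -= lr * dZ.sum(0)      # error-correction updates
        W1 -= lr * dW1; b1 -= lr * dH.sum(0)
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

# Toy 2-class data standing in for SOFM-reduced speaker features
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
params = train_mlp(X, y)
acc = (predict(X, *params) == y).mean()
print(acc)
```

On this linearly separable toy problem the network should fit the training set almost perfectly; the paper's 10-speaker task would use a 10-class output layer instead.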

EXPERIMENTAL RESULTS

MATLAB R2018a is used in this research for performing the simulation. The nntool of MATLAB is a neural network graphical user interface tool used for training and simulation.

COMPARISON

Our proposed MLP-BR method is compared with two standard algorithms for speaker recognition. Table 2 compares the proposed method against the other two methods in terms of the number of iterations and performance. The details of the other two methods are shown below.

ANALYSIS

Correctly training a neural network on very high dimensional samples is difficult. The size of our feature/input vector, i.e. the output from the MFCC, is 44,868 × 156. This vector is large and must be reduced so that the network can be trained correctly and perform recognition correctly. For this, dimensionality reduction is first applied to the input samples using the SOFM. The proposed algorithm and the other two standard methods are trained and tested without using SOFM to prove

CONCLUSION

This research presents a novel automatic speaker recognition method using speech signals. Feature extraction from the signal is done using MFCC. The input samples are created from the extracted MFCC feature vectors, and the dimensions are decreased using a Self Organizing Feature Map (SOFM) so that the supervised classifier can be trained correctly. Finally, using the reduced input samples, recognition is achieved through a Multilayer Perceptron (MLP) with Bayesian Regularization. The training

Declaration of Competing Interest

None.

ACKNOWLEDGEMENT

The authors would like to acknowledge that the work was carried out at the SMDP-C2SD Laboratory of NIT Manipur, funded by the Ministry of Electronics and Information Technology (MeitY), Government of India. We would also like to express our gratitude to the System Intelligence group of Department of Electronics and Communication Engineering, NIT Manipur.

Dr. Kharibam Jilenkumari Devi is presently working as a Lecturer in the Dept. of ECE, NIT Manipur. She received her Ph.D. from NIT Manipur, India, and her M.Tech (ECE) from NERIST, Arunachal Pradesh, India. Her areas of interest include Image Processing, Signal Processing, and Neural Networks.

REFERENCES (35)

  • Werner Endres et al., "Voice spectrograms as a function of age, voice disguise, and voice imitation", J. Acoust. Soc. Am. (1971)
  • Sadaoki Furui, "An analysis of long-term variation of feature parameters of speech and its application to talker recognition", Electronics and Communications in Japan (1974)
  • M. Phythian et al., "Effects of speech coding on text-dependent speaker recognition"
  • Lawrence Rabiner et al., "An introduction to hidden Markov models", IEEE ASSP Magazine (1986)
  • Herbert Gish et al., "Segregation of speakers for speech recognition and speaker identification", Proc. ICASSP-91 (1991)
  • M.-H. Siu et al., "An unsupervised, sequential learning algorithm for the segmentation of speech waveforms with multiple speakers"
  • Namrata Dave, "Feature Extraction Methods LPC, PLP, and MFCC in Speech Recognition", Int. J. for Advance Research in Engineering and Technology (2013)


Dr. Ngangbam Herojit Singh is an Assistant Professor in the Dept. of CSE, NIT Mizoram. He received his Ph.D. from NIT Manipur, India, and his M.Tech (ECE) from Vel Tech Multi Tech Dr. R.S Engineering College, Chennai. His areas of interest include Machine Learning, Advanced Learning, Neural Networks, Mobile Robotics, and Fuzzy Systems.

Dr. Khelchandra Thongam is an Assistant Professor in the Dept. of CSE, NIT Manipur. He received his Ph.D. and M.Tech (CSE) from the University of Aizu, Japan. His areas of interest include Intelligent System Design, Soft Computing, Hybrid Intelligent Systems, and Robotics.
