Automatic Speaker Recognition from Speech Signals Using Self Organizing Feature Map and Hybrid Neural Network
INTRODUCTION
Speech is a mode of communication for humans. A person's identity can be recognized from their speech without seeing their face. Automatic speaker recognition is a method of recognizing a person automatically from their distinctive biological characteristics [1]. Every individual has a unique voice because of differences in vocal tract shape, larynx size, and other sound production mechanisms. It also allows one to differentiate a person from their voice greatly based on other characteristics of
DATABASE
Selecting the database is an important step in implementing any proposed technique. In this research, the IITG Multivariability (IITG-MV) Speaker Recognition Database, which covers the Indian scenario, is used [26]. The database is organized into four phases according to the recording conditions.
They are the IITG-MV phase-I, IITG-MV phase-II, IITG-MV phase-III, and IITG-MV phase-IV speaker recognition databases. IITG-MV phase-IV is further split into three parts.
FEATURE EXTRACTION
Extracting the best parametric features of the acoustic signal is a significant task in designing a speaker recognition system, as it strongly affects the performance of the system. Feature extraction converts the input waveform into a series of acoustic feature vectors, each representing the signal information in a small time window. In this paper, the Mel-frequency cepstral coefficient (MFCC) is employed for feature extraction as it is a commonly used feature extraction
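As a rough illustration of how an MFCC front end turns a waveform into one feature vector per short time window, the following Python sketch may help. It is not the authors' implementation: the 8 kHz sampling rate, the 256-sample frames with 50% overlap, the 20 mel filters, the omission of pre-emphasis, and every function name are assumptions made for the example. Only the count of 12 cepstral coefficients is taken from the text.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power of the first N/2 + 1 DFT bins of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def dct_ii(values, num_coeffs):
    """DCT-II coefficients 1..num_coeffs (the 0th, energy-like term is skipped)."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(values))
            for c in range(1, num_coeffs + 1)]


def mfcc(signal, sample_rate=8000, frame_len=256, hop=128,
         num_filters=20, num_coeffs=12):
    """Return one list of cepstral coefficients per frame."""
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Small constant avoids log(0) for empty filter outputs.
        log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                        for row in bank]
        features.append(dct_ii(log_energies, num_coeffs))
    return features


# Synthetic 440 Hz tone standing in for a speech recording (assumption).
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1024)]
vectors = mfcc(tone)
print(len(vectors), len(vectors[0]))  # 7 frames, 12 coefficients each
```

The pure-Python DFT keeps the sketch dependency-free; a practical system would use an FFT routine from a numerical library instead.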
DIMENSIONALITY REDUCTION USING SOFM (SELF ORGANIZING FEATURE MAP)
The Self Organizing Feature Map (SOFM) is used to create the input samples from the extracted features while also reducing their dimensions. The difficulty of training on high-dimensional samples is mitigated by first performing dimensionality reduction on the extracted features (12 cepstral coefficients per signal) using the SOFM. The SOFM is an unsupervised, competitive, self-organizing neural network, also called a Self-Organizing Map (SOM), introduced by Kohonen in 1982 [28]. SOFM performs the mapping
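To make the competitive, unsupervised learning idea concrete, here is a minimal Python sketch of a Kohonen-style map. It is an illustration under stated assumptions rather than the configuration used in this work: the 3 × 3 grid, the linearly decaying learning rate and neighbourhood width, the synthetic two-speaker data, and all function names are hypothetical. The sketch reduces each 12-dimensional vector to the 2-D grid coordinate of its best matching unit, which is one common way to use a SOM for dimensionality reduction; the snippet above does not say whether the authors use the map in exactly this way.

```python
import math
import random


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_matching_unit(weights, x):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: squared_distance(weights[node], x))


def train_som(samples, rows=3, cols=3, epochs=20, lr0=0.5, seed=0):
    rng = random.Random(seed)
    # One weight vector per grid node, initialised from random samples.
    weights = {(r, c): list(rng.choice(samples))
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 + (0.5 - sigma0) * frac  # neighbourhood shrinks to 0.5
            bmu = best_matching_unit(weights, x)    # competition step
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                # Cooperation step: the winner and its grid neighbours move
                # towards the input, the winner most strongly.
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])
            step += 1
    return weights


# Two synthetic "speakers" with 12 features each (stand-ins for MFCC vectors).
rng = random.Random(1)
speaker_a = [[rng.gauss(0.0, 0.1) for _ in range(12)] for _ in range(20)]
speaker_b = [[rng.gauss(5.0, 0.1) for _ in range(12)] for _ in range(20)]
samples = speaker_a + speaker_b

som = train_som(samples)
reduced = [best_matching_unit(som, x) for x in samples]  # 12-D -> 2-D
print(reduced[0], reduced[-1])
```

Because every update moves a weight vector part of the way towards an input sample, the trained weights stay inside the range spanned by the data, which is what lets the grid act as a compact summary of the feature space.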
RECOGNITION USING MLP-BP
The reduced input samples from the SOFM are used as input to the MLP.
The neural network used is the Multilayer Perceptron (MLP). The basic structure of the network is shown in Fig. 3. It comprises three layers: input, hidden, and output, which are connected by synaptic weights. The network is trained with the back-propagation (BP) algorithm, a supervised learning method based on the error-correction principle. The following are the parameters used
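The error-correction principle behind back-propagation can be sketched in a few lines of Python. This is a toy example under stated assumptions, not the network trained in this work: the class name `TinyMLP`, the three hidden units, the learning rate, and the four 2-D training points are all hypothetical, and the sketch uses plain per-sample gradient descent rather than any particular MATLAB training function.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the synaptic weights of one hidden unit plus a bias.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        # Output delta: the error scaled by the slope of the sigmoid.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas: the output delta propagated back through w_out.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]


def mean_loss(net, data):
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Toy 2-D inputs (for instance, reduced features) labelled with speaker 0 or 1.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([2.0, 2.0], 1), ([2.0, 1.0], 1)]

net = TinyMLP(n_in=2, n_hidden=3)
loss_before = mean_loss(net, data)
for _ in range(500):
    for x, t in data:
        net.train_step(x, t, lr=0.5)
loss_after = mean_loss(net, data)
print(loss_before, loss_after)
```

Each call to `train_step` compares the network output with the supervised target and adjusts every weight in proportion to its contribution to that error, so the mean loss on the training points falls as training proceeds.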
EXPERIMENTAL RESULTS
MATLAB R2018a is used to perform the simulations in this research. MATLAB's nntool, a graphical user interface tool for neural networks, is used for training and simulation.
COMPARISON
Our proposed MLP-BR method is compared with two standard algorithms for speaker recognition. Table 2 compares the proposed method with the other two methods in terms of number of iterations and performance. The details of the other two methods are given below.
ANALYSIS
It is difficult to train a neural network correctly on very high-dimensional samples. The size of our feature vector/input vector, i.e. the output from MFCC, is 44,868 × 156. A vector of this size is large and must be reduced so that the network can be trained correctly and perform recognition correctly. For this, the optimization technique is first applied to the input samples using the SOFM. The proposed algorithm and the other two standard methods are trained and tested without using SOFM to prove
CONCLUSION
This research presents a novel automatic speaker recognition method using speech signals. Feature extraction from the signal is done using MFCC. The input samples are created from the extracted MFCC feature vectors, and their dimensions are reduced using the Self Organizing Feature Map (SOFM) so that the supervised classifier can be trained correctly. Finally, recognition is achieved on the reduced input samples through a Multilayer Perceptron (MLP) by means of Bayesian Regularization. The training
Declaration of Competing Interest
None.
ACKNOWLEDGEMENT
The authors would like to acknowledge that the work was carried out at the SMDP-C2SD Laboratory of NIT Manipur, funded by the Ministry of Electronics and Information Technology (MeitY), Government of India. We would also like to express our gratitude to the System Intelligence group of Department of Electronics and Communication Engineering, NIT Manipur.
Dr. Kharibam Jilenkumari Devi is presently working as a Lecturer in the Dept. of ECE at NIT Manipur. She received her Ph.D. from NIT Manipur, India, and her M.Tech (ECE) from NERIST, Arunachal Pradesh, India. Her areas of interest include Image Processing, Signal Processing, and Neural Networks.
REFERENCES (35)
Speaker identification and verification using Gaussian mixture speaker models. Speech Commun (1995)
An improved i-vector extraction for speaker verification. EURASIP J. Audio, Speech and Music Processing (2015)
Performance Evaluation of Different Modeling Methods and Classifiers with MFCC and IHC Features for Speaker Recognition. Procedia Comput Sci (2017)
Using SOM and PCA for analyzing and interpreting data from a P-removal SBR. Eng Appl Artif Intell (2008)
Biometric Evidence in Forensic Automatic Speaker Recognition
A review on automatic speech recognition architecture and approaches. Int. J. Signal Processing, Image Processing, and Pattern Recognition (2016)
Speaker recognition: a tutorial. Proceedings of the IEEE (1997)
Automatic speaker identification through robust time-domain features and hierarchical classification approach. In Proceed. international conference on data processing and applications (2018)
Pattern Matching Procedure for Automatic Talker Recognition. J. Acoust. Soc. Am. (1963)
A method of speaker verification. J. Acoust. Soc. Am. (1971)
Voice spectrograms as a function of age, voice disguise, and voice imitation. J. Acoust. Soc. Am.
An analysis of long-term variation of feature parameters of speech and its application to talker recognition. Electronics and Communications in Japan
Effects of speech coding on text-dependent speaker recognition
An introduction to hidden Markov models. IEEE ASSP Magazine
Segregation of speakers for speech recognition and speaker identification. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP-91)
An unsupervised, sequential learning algorithm for the segmentation of speech waveforms with multiple speakers
Feature Extraction Methods LPC, PLP, and MFCC In Speech Recognition. Int. J. For Advance Research in Engineering and Technology
Dr. Ngangbam Herojit Singh is an Assistant Professor in the Dept. of CSE at NIT Mizoram. He received his Ph.D. from NIT Manipur, India, and his M.Tech (ECE) from Vel Tech Multi Tech Dr. R.S Engineering College, Chennai. His areas of interest include Machine learning, Advance learning, Neural networks, Mobile Robotics, and Fuzzy systems.
Dr. Khelchandra Thongam is an Assistant Professor in the Dept. of CSE at NIT Manipur. He received his Ph.D. and his M.Tech (CSE) from the University of Aizu, Japan. His areas of interest include Intelligent System Design, Soft Computing, Hybrid Intelligent System, and Robotics.