Automatic Speaker Recognition from Speech Signals Using Self Organizing Feature Map and Hybrid Neural Network

https://doi.org/10.1016/j.micpro.2020.103264

ABSTRACT

This paper explains a new hybrid method for Automatic Speaker Recognition (ASR) using speech signals based on an Artificial Neural Network (ANN). ASR performance is regarded as the foremost challenge and needs to be improved. This research work focuses on resolving ASR problems and improving the accuracy of speaker prediction. The Mel Frequency Cepstral Coefficient (MFCC) is exploited for signal feature extraction. The input samples are created from these extracted features, and their dimensions are reduced using a Self Organizing Feature Map (SOFM). Finally, using the reduced input samples, recognition is performed with a Multilayer Perceptron (MLP) trained with Bayesian Regularization. The network is trained and verified on real speech data from the IITG Multivariability speaker recognition database for 10 speakers. The proposed method is validated by performance estimation and classification accuracy against other models. It yields a better recognition rate, attaining 93.33% accuracy.

Section snippets

INTRODUCTION

Speech is a mode of communication for humans. Without seeing their face, a person's identity can be recognized from speech. Automatic speaker recognition is a method of recognizing a person automatically using their distinctive biological characteristics [1]. Every individual has a unique voice because of differences in vocal tract shape, larynx size, and other sound production mechanisms. It also allows one to differentiate a person from their voice greatly based on other characteristics of

DATABASE

The selection of the database is a highly important step in implementing any proposed technique. In this research, the IITG Multivariability Speaker Recognition Database, recorded in an Indian scenario, is used [26]. Based on various recording conditions, this database is organized into four phases.

They are the IITG-MV phase-I, IITG-MV phase-II, IITG-MV phase-III and IITG-MV phase-IV speaker recognition databases. IITG-MV phase-IV is further split into three parts.

3. Feature extraction

Extracting the best parametric features of the acoustic signals is a significant task in designing a speaker recognition system, and it extensively affects the performance of the system. In feature extraction, the input waveform is converted into a series of acoustic feature vectors, each representing the signal information in a small time window. In this paper, the Mel-frequency cepstral coefficient (MFCC) is employed for feature extraction as it is a commonly used feature extraction
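As a concrete illustration, the MFCC pipeline described above (framing, windowing, power spectrum, mel filterbank, log compression, DCT) can be sketched with NumPy. The frame length, hop size, FFT size, and filter count below are illustrative assumptions, not values reported in the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=12):
    # Slice the waveform into overlapping, Hamming-windowed frames
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum -> mel filterbank energies -> log -> DCT-II
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    logmel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps + 1),
                                  (2 * n + 1) / (2 * n_filters)))
    return (logmel @ dct.T)[:, 1:n_ceps + 1]  # keep 12 coefficients, drop c0

# One second of a synthetic 440 Hz tone stands in for a speech signal
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (98, 12): 12 cepstral coefficients per frame
```

The result is one 12-dimensional cepstral vector per analysis frame, matching the 12 coefficients per signal mentioned in Section 4.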

4. Dimensionality reduction using SOFM (Self Organizing Feature Map)

The Self Organizing Feature Map (SOFM) is the methodology used to create the input samples from the extracted features while reducing their dimensions. The high dimensionality of the feature vectors is mitigated by first performing dimensionality reduction on the extracted features (12 cepstral coefficients per signal) using the SOFM. The SOFM is an unsupervised, competitive, self-organizing neural network, also called a Self-Organizing Map (SOM), introduced by Kohonen in 1982 [28]. SOFM performs the mapping
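A minimal sketch of a Kohonen SOFM used this way might look as follows; the grid size, learning rate, and neighbourhood schedule are illustrative assumptions, and the random toy data merely stands in for the 12-coefficient MFCC vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0):
    """Kohonen SOFM: competitive learning with a shrinking Gaussian
    neighbourhood around the best-matching unit (BMU)."""
    h, w = grid
    weights = rng.normal(size=(h * w, data.shape[1]))
    # Grid coordinates of every unit, used by the neighbourhood function
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    n_steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            lr = lr0 * np.exp(-t / n_steps)        # decaying learning rate
            sigma = sigma0 * np.exp(-t / n_steps)  # shrinking neighbourhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            nb = np.exp(-d2 / (2 * sigma ** 2))
            weights += lr * nb[:, None] * (x - weights)
            t += 1
    return weights, coords

def project(data, weights, coords):
    """Reduce each sample to the 2-D grid position of its BMU."""
    bmus = np.argmin(((data[:, None, :] - weights[None]) ** 2).sum(-1), axis=1)
    return coords[bmus]

# Toy data standing in for 12-dimensional MFCC vectors
data = rng.normal(size=(200, 12))
w, c = train_som(data)
low = project(data, w, c)
print(low.shape)  # (200, 2)
```

Replacing each high-dimensional sample with the 2-D grid position of its best-matching unit is one common way a SOFM is used for dimensionality reduction; the paper may well use a different readout.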

RECOGNITION USING MLP-BP

The reduced input samples from the SOFM are used as input to the MLP.

We used the neural network called the Multilayer Perceptron (MLP). The basic structure of the neural network is shown in Fig. 3. The network comprises three layers: input, hidden, and output. These layers are connected by synaptic weights. The learning of the network is done by the back-propagation (BP) algorithm, a supervised learning method based on the error-correction principle. The following are the parameters used
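A one-hidden-layer MLP trained by back-propagation can be sketched as below. Note that MATLAB's Bayesian Regularization (trainbr) adapts the weight penalty within a Bayesian framework; the fixed L2 penalty here is a simplified stand-in, and all hyperparameters and the toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_mlp(X, y, hidden=16, epochs=1000, lr=0.1, l2=1e-3):
    """One-hidden-layer MLP trained by full-batch back-propagation.
    The fixed L2 penalty only approximates the adaptive weight decay
    that Bayesian Regularization would compute."""
    n, d = X.shape
    k = y.max() + 1
    Y = np.eye(k)[y]                              # one-hot targets
    W1 = rng.normal(scale=0.5, size=(d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, k)); b2 = np.zeros(k)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(1, keepdims=True))
        P /= P.sum(1, keepdims=True)              # softmax outputs
        dZ = (P - Y) / n                          # cross-entropy gradient
        dW2 = H.T @ dZ + l2 * W2
        dH = dZ @ W2.T * (1 - H ** 2)             # back-propagate through tanh
        dW1 = X.T @ dH + l2 * W1
        W2 -= lr * dW2; b2 -= lr * dZ.sum(0)      # error-correction updates
        W1 -= lr * dW1; b1 -= lr * dH.sum(0)
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

# Toy 2-class data standing in for SOFM-reduced speaker features
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
params = train_mlp(X, y)
acc = (predict(X, *params) == y).mean()
print(acc)
```

On this linearly separable toy problem the network should fit the training set almost perfectly; the paper's 10-speaker task would use a 10-class output layer instead.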

EXPERIMENTAL RESULTS

MATLAB R2018a is used in this research for performing the simulation. The nntool of MATLAB is a neural network graphical user interface tool used for training and simulation.

COMPARISON

Our proposed MLP-BR method is compared with two standard algorithms for speaker recognition. Table 2 compares the proposed method against the other two methods in terms of the number of iterations and performance. The details of the other two methods are shown below.

ANALYSIS

Correctly training a neural network on very high dimensional samples is difficult. The size of our feature/input vector, i.e. the output from the MFCC, is 44,868 × 156. This vector is large and must be reduced so that the network can be trained correctly and perform recognition correctly. For this, dimensionality reduction is first applied to the input samples using the SOFM. The proposed algorithm and the other two standard methods are trained and tested without using SOFM to prove

CONCLUSION

This research presents a novel automatic speaker recognition method using speech signals. Feature extraction from the signal is done using MFCC. The input samples are created from the extracted MFCC feature vectors, and the dimensions are decreased using a Self Organizing Feature Map (SOFM) so that the supervised classifier can be trained correctly. Finally, using the reduced input samples, recognition is achieved through a Multilayer Perceptron (MLP) with Bayesian Regularization. The training

Declaration of Competing Interest

None.

ACKNOWLEDGEMENT

The authors would like to acknowledge that the work was carried out at the SMDP-C2SD Laboratory of NIT Manipur, funded by the Ministry of Electronics and Information Technology (MeitY), Government of India. We would also like to express our gratitude to the System Intelligence group of Department of Electronics and Communication Engineering, NIT Manipur.

Dr. Kharibam Jilenkumari Devi is presently working as a Lecturer in the Dept. of ECE, NIT Manipur. She received her Ph.D. from NIT Manipur, India, and her M.Tech (ECE) from NERIST, Arunachal Pradesh, India. Her areas of interest include Image Processing, Signal Processing, and Neural Networks.

REFERENCES (35)

  • Werner Endres et al., "Voice spectrograms as a function of age, voice disguise, and voice imitation", J. Acoust. Soc. Am. (1971)
  • Sadaoki Furui, "An analysis of long-term variation of feature parameters of speech and its application to talker recognition", Electronics and Communications in Japan (1974)
  • M. Phythian et al., "Effects of speech coding on text-dependent speaker recognition"
  • Lawrence Rabiner et al., "An introduction to hidden Markov models", IEEE ASSP Magazine (1986)
  • Herbert Gish et al., "Segregation of speakers for speech recognition and speaker identification", Proc. ICASSP-91 (1991)
  • M.-H. Siu et al., "An unsupervised, sequential learning algorithm for the segmentation of speech waveforms with multiple speakers"
  • Namrata Dave, "Feature Extraction Methods LPC, PLP, and MFCC in Speech Recognition", Int. J. for Advance Research in Engineering and Technology (2013)


Dr. Ngangbam Herojit Singh is an Assistant Professor in the Dept. of CSE, NIT Mizoram. He received his Ph.D. from NIT Manipur, India, and his M.Tech (ECE) from Vel Tech Multi Tech Dr. R.S Engineering College, Chennai. His areas of interest include Machine Learning, Advanced Learning, Neural Networks, Mobile Robotics, and Fuzzy Systems.

Dr. Khelchandra Thongam is an Assistant Professor in the Dept. of CSE, NIT Manipur. He received his Ph.D. and M.Tech (CSE) from the University of Aizu, Japan. His areas of interest include Intelligent System Design, Soft Computing, Hybrid Intelligent Systems, and Robotics.
