Abstract
Automatic emotion recognition from speech is a demanding and challenging problem, since the emotional states of humans are difficult to differentiate. With hand-crafted features, the main difficulty lies in extracting the features from speech that actually matter. Recognition accuracy can be increased using deep learning approaches, which learn high-level features of the speech signal. In this work, a deep-learning algorithm is proposed that extracts high-level features from raw data with high accuracy, irrespective of the language and speakers (male/female) of the speech corpora. For this, the .wav files are converted into RGB spectrogram images and normalized to size 224x224x3 for fine-tuning a Deep Convolutional Neural Network (DCNN) to recognize emotions. The DCNN model is trained in two stages: in stage 1, the optimal learning rate is identified using the Learning Rate (LR) range test, and in stage 2 the model is retrained with this optimal learning rate. Special strides are used to down-sample the features while reducing model size. The emotions considered are happiness, sadness, anger, fear, disgust, boredom/surprise, and neutral. The proposed algorithm is tested on three popular public speech corpora: EMODB (German), EMOVO (Italian), and SAVEE (British English). The reported emotion-recognition accuracy compares favourably with existing studies across different languages and speakers.
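The preprocessing step described above (.wav audio to a 224x224x3 RGB spectrogram image for the DCNN) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, FFT size, hop length, and the intensity-to-RGB mapping are all assumptions chosen for clarity.

```python
import numpy as np

def wav_to_rgb_spectrogram(signal, n_fft=512, hop=128, size=224):
    """Turn a 1-D audio signal into a (size, size, 3) RGB
    log-magnitude spectrogram; parameters are illustrative."""
    # Short-time Fourier transform via windowed, framed FFT
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T   # (freq, time)
    log_mag = np.log1p(mag)                       # compress dynamic range

    # Nearest-neighbour resize to the square input the DCNN expects
    fi = np.arange(size) * log_mag.shape[0] // size
    ti = np.arange(size) * log_mag.shape[1] // size
    img = log_mag[np.ix_(fi, ti)]

    # Normalise to [0, 1] and map intensity to a crude 3-channel colormap
    img = (img - img.min()) / (np.ptp(img) + 1e-9)
    rgb = np.stack([img, img ** 2, 1.0 - img], axis=-1)
    return rgb

# Example: 1 second of a 440 Hz tone sampled at 16 kHz
t = np.linspace(0, 1, 16000, endpoint=False)
spec = wav_to_rgb_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (224, 224, 3)
```

In practice one would read the .wav file with a library such as `scipy.io.wavfile` and render the spectrogram with a standard colormap (e.g. matplotlib's `viridis`) before feeding it to the fine-tuned network; the hand-rolled colormap above merely stands in for that step.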
Cite this article
Singh, Y.B., Goel, S. An efficient algorithm for recognition of emotions from speaker and language independent speech using deep learning. Multimed Tools Appl 80, 14001–14018 (2021). https://doi.org/10.1007/s11042-020-10399-2