Real-Time Implementation of Speaker Diarization System on Raspberry PI3 Using TLBO Clustering Algorithm

Dabbabi, Karim; Hajji, Salah; Cherif, Adnen

doi:10.1007/s00034-020-01357-2

Published: 01 February 2020

Circuits, Systems, and Signal Processing, Volume 39, pages 4094–4109 (2020)

Abstract

In recent years, extensive research has been carried out on possible implementations of speaker diarization systems. These systems require efficient clustering algorithms to perform well in real-time processing. Teaching–learning-based optimization (TLBO) is one such algorithm: it can be used to solve the optimal clustering problem in a reasonable time. In this paper, a speaker diarization (SD) system that uses the TLBO technique as its classifier has been implemented in real time on a Raspberry Pi 3 (RPi 3). The system was evaluated on a broadcast radio dataset (NDTV), and the experimental tests showed that the technique achieves acceptable performance in terms of diarization error rate (DER = 21.90% and 35% in single- and cross-show diarization, respectively), accuracy (87.30%), and real-time factor (RTF = 2.40). We also tested the TLBO technique on a 2.4 GHz Intel Core i5 processor using the REPERE corpus, where improved results were obtained in terms of execution time (xRT = 0.08 and 0.095) and DER (18.50% and 26.30%) for single- and cross-show speaker diarization, respectively.
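The abstract describes TLBO only at a high level. Below is a minimal sketch, not the authors' implementation, of how TLBO can be applied to a clustering problem. Each learner encodes K candidate centroids, and fitness is the within-cluster sum of squared distances. Learners are updated by the standard TLBO teacher phase (move toward the best learner, away from the class mean scaled by a teaching factor of 1 or 2) and learner phase (move relative to a random peer), each with greedy acceptance. Several choices here are assumptions made for illustration: the toy 2-D feature vectors, the fixed K, clipping to the data's bounding box, and all names (`tlbo_cluster`, `pop_size`, `iterations`). A real diarization system would cluster segment-level acoustic features and would usually also have to estimate the number of speakers.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(flat: Sequence[float], k: int, d: int) -> List[Vector]:
    """Unpack a flat learner vector into k centroids of dimension d."""
    return [list(flat[c * d:(c + 1) * d]) for c in range(k)]


def sse(flat: Sequence[float], data: Sequence[Vector], k: int) -> float:
    """Fitness to minimise: squared distance of each point to its nearest centroid."""
    cents = split(flat, k, len(data[0]))
    return sum(min(sq_dist(x, c) for c in cents) for x in data)


def tlbo_cluster(data: Sequence[Vector], k: int, pop_size: int = 20,
                 iterations: int = 100, seed: int = 0) -> Tuple[List[Vector], List[int], float]:
    """Hypothetical helper: TLBO search over k centroids, returns (centroids, labels, cost)."""
    rng = random.Random(seed)
    d = len(data[0])
    dim = k * d
    # Assumed search bounds: the data's bounding box, repeated for each centroid
    lo = [min(x[j] for x in data) for j in range(d)] * k
    hi = [max(x[j] for x in data) for j in range(d)] * k

    def clip(v: Vector) -> Vector:
        return [min(max(vi, l), h) for vi, l, h in zip(v, lo, hi)]

    def try_replace(i: int, cand: Vector) -> None:
        # Greedy acceptance: keep the candidate only if it lowers the fitness
        cand = clip(cand)
        score = sse(cand, data, k)
        if score < scores[i]:
            pop[i], scores[i] = cand, score

    pop = [[rng.uniform(lo[j], hi[j]) for j in range(dim)] for _ in range(pop_size)]
    scores = [sse(p, data, k) for p in pop]

    for _ in range(iterations):
        # Teacher phase: move each learner toward the best learner, relative to the mean
        mean = [sum(p[j] for p in pop) / pop_size for j in range(dim)]
        teacher = pop[min(range(pop_size), key=scores.__getitem__)]
        for i in range(pop_size):
            tf = rng.randint(1, 2)  # teaching factor TF in {1, 2}
            try_replace(i, [pop[i][j] + rng.random() * (teacher[j] - tf * mean[j])
                            for j in range(dim)])
        # Learner phase: move away from a worse random peer, or toward a better/equal one
        for i in range(pop_size):
            p = rng.choice([m for m in range(pop_size) if m != i])
            sign = 1.0 if scores[i] < scores[p] else -1.0
            try_replace(i, [a + rng.random() * sign * (a - b) for a, b in zip(pop[i], pop[p])])

    best = min(range(pop_size), key=scores.__getitem__)
    cents = split(pop[best], k, d)
    labels = [min(range(k), key=lambda c: sq_dist(x, cents[c])) for x in data]
    return cents, labels, scores[best]


gen = random.Random(1)
# Two toy "speakers": 2-D points standing in for segment feature vectors
data = [[gen.gauss(0.0, 0.3), gen.gauss(0.0, 0.3)] for _ in range(30)]
data += [[gen.gauss(5.0, 0.3), gen.gauss(5.0, 0.3)] for _ in range(30)]
centroids, labels, cost = tlbo_cluster(data, k=2)
print("centroids:", centroids, "cost:", round(cost, 2))
```

With two well-separated toy clusters, the 60 labels should split into two groups of 30. Because every candidate is accepted only if it improves its learner's fitness, the best cost in the population can never increase from one iteration to the next.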

Author information

Authors and Affiliations

Research Unit of Processing and Analysis of Electrical and Energetic Systems, Faculty of Sciences of Tunis, University Tunis El-Manar, 2092, Tunis, Tunisia
Karim Dabbabi & Adnen Cherif
National School of Engineers of Tunis, University Tunis El-Manar, 2092, Tunis, Tunisia
Salah Hajji

Corresponding author

Correspondence to Karim Dabbabi.

About this article

Cite this article

Dabbabi, K., Hajji, S. & Cherif, A. Real-Time Implementation of Speaker Diarization System on Raspberry PI3 Using TLBO Clustering Algorithm. Circuits Syst Signal Process 39, 4094–4109 (2020). https://doi.org/10.1007/s00034-020-01357-2

Received: 30 January 2019
Revised: 21 January 2020
Accepted: 23 January 2020
Published: 01 February 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s00034-020-01357-2
