Abstract
Zero frequency filtering (ZFF) is a well-explored technique for detecting glottal closure instants (GCIs) from the speech signal. Features extracted from the zero frequency filtered speech signal have also been used in many speech-based applications. The zero frequency resonators used in the infinite impulse response (IIR) realization of ZFF are unstable filters, so the filter output grows or decays without bound. The finite impulse response (FIR) realization of ZFF addresses this instability, but at the cost of more adders and multipliers. In this paper, a simplified stable IIR realization of ZFF (SIIR-ZFF) is proposed for efficient hardware implementation. The SIIR-ZFF requires significantly less hardware than both the previously reported IIR-ZFF and the FIR-ZFF. The performance of the proposed approach in detecting GCIs from the speech signal is found to be almost equivalent to that of the conventional IIR-ZFF. The hardware architecture for the proposed approach is designed and implemented on an FPGA.
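To make the instability concrete, the following is a minimal sketch of the conventional IIR-ZFF pipeline as it is commonly described in the literature, not the SIIR-ZFF proposed in this paper. It assumes a cascade of two resonators of the form y[n] = x[n] + 2y[n-1] - y[n-2], repeated local-mean subtraction for trend removal, and GCIs taken at negative-to-positive zero crossings. All function names, the window length, and the number of trend-removal passes are illustrative assumptions.

```python
def difference(s):
    """x[n] = s[n] - s[n-1] with x[0] = s[0]; suppresses any DC offset."""
    x = []
    prev = 0.0
    for v in s:
        x.append(v - prev)
        prev = v
    return x


def zero_frequency_resonator(x):
    """y[n] = x[n] + 2*y[n-1] - y[n-2].

    Both poles sit at z = 1, so the output grows without bound
    for almost any input (the instability noted in the abstract).
    """
    y = []
    y1 = y2 = 0.0
    for v in x:
        y0 = v + 2.0 * y1 - y2
        y.append(y0)
        y2, y1 = y1, y0
    return y


def remove_trend(y, half_win):
    """Subtract the local mean over a window of up to 2*half_win + 1 samples."""
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
    n = len(y)
    out = []
    for i in range(n):
        lo = max(0, i - half_win)
        hi = min(n, i + half_win + 1)
        out.append(y[i] - (prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def zff(s, half_win, passes=3):
    """Conventional ZFF sketch: difference, two resonators, trend removal."""
    y = zero_frequency_resonator(zero_frequency_resonator(difference(s)))
    for _ in range(passes):  # number of passes is an assumption
        y = remove_trend(y, half_win)
    return y


def positive_zero_crossings(z):
    """Indices n where z[n-1] < 0 <= z[n]; candidate GCI locations."""
    return [n for n in range(1, len(z)) if z[n - 1] < 0 <= z[n]]


if __name__ == "__main__":
    # Synthetic excitation: an impulse every 80 samples (assumed pitch period).
    s = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]
    grown = zero_frequency_resonator(zero_frequency_resonator(difference(s)))
    print("largest resonator output magnitude:", max(abs(v) for v in grown))
    print("candidate GCIs:", positive_zero_crossings(zff(s, half_win=60)))
```

The unbounded growth of the resonator output is what forces wide datapaths in a direct hardware realization of this scheme, which is the cost a stable realization aims to avoid.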
Acknowledgements
This research work is a sub-module of the project “Development of Speech Based Person Authentication System in FPGA” under the SMDP-C2SD (9(I)/2014-MDD) program and is supported by the Ministry of Electronics and Information Technology (MeitY), Government of India.
About this article
Cite this article
Srinivas, N., Pradhan, G. & Govind, D. A Simplified Realization of Zero Frequency Filter for Hardware Implementation. Circuits Syst Signal Process 39, 4717–4729 (2020). https://doi.org/10.1007/s00034-020-01369-y
DOI: https://doi.org/10.1007/s00034-020-01369-y