Minimum of Information Divergence Criterion for Signals with Tuning to Speaker Voice in Automatic Speech Recognition

Radioelectronics and Communications Systems

Abstract

This paper considers the problem of automatic speech recognition at the basic, phonetic level of speech signal processing, with a focus on increasing noise immunity. To solve it, a criterion of minimum information divergence of signals is proposed, with tuning to the speaker's voice and automatic scaling of the speech template to the fine structure of the observed (current) speech frame. An example of its practical implementation is considered, and its efficiency characteristics are studied. Using the author's software, an experiment is carried out and qualitative estimates are obtained of the threshold signal gain achieved with the proposed criterion. It is shown that this gain can reach 10 dB or more under certain conditions. The results and conclusions are intended for use in the development and modernization of existing systems and techniques for automatic speech processing and recognition that operate under intensive noise.
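
As a rough illustration of what a minimum-divergence decision rule with template scaling can look like, the sketch below classifies one speech frame by choosing the phoneme template with the smallest divergence. It is not the author's implementation: the abstract does not state the divergence or the scaling procedure, so the Itakura-Saito divergence between power spectra, the single scalar gain used to rescale each template, the periodogram spectrum estimate, and the names `power_spectrum`, `gain_matched_is_divergence`, and `classify_frame` are all assumptions made for this example.

```python
import numpy as np


def power_spectrum(frame, n_fft=256):
    """Periodogram estimate of one frame's power spectrum (assumed estimator)."""
    windowed = frame * np.hanning(len(frame))
    spec = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    # Floor the spectrum so the ratios and logarithms below stay finite.
    return np.maximum(spec, 1e-12)


def gain_matched_is_divergence(p, q):
    """Itakura-Saito divergence between p and g*q, minimized over a scalar gain g.

    Setting the derivative with respect to g to zero gives g = mean(p / q),
    so the template q is rescaled to the level of the observed spectrum p.
    This scalar gain is a stand-in assumption for the template scaling
    mentioned in the abstract.
    """
    ratio = p / q
    g = ratio.mean()
    r = ratio / g
    return float(np.mean(r - np.log(r) - 1.0))


def classify_frame(frame, templates):
    """Return the label of the template with minimum divergence, plus all scores.

    `templates` maps a phoneme label to a template power spectrum computed
    with the same `power_spectrum` settings as the frame.
    """
    spec = power_spectrum(frame)
    scores = {
        label: gain_matched_is_divergence(spec, template)
        for label, template in templates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

Because the gain is fitted per template, a frame that is a louder or quieter copy of a template scores a divergence of essentially zero against it, so the decision depends on spectral shape rather than signal level.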

Author information

Corresponding author

Correspondence to V. V. Savchenko.

Additional information

Conflict of Interest

The authors declare that they have no conflict of interest.

Russian Text © The Author(s), 2020, published in Izvestiya Vysshikh Uchebnykh Zavedenii, Radioelektronika, 2020, Vol. 63, No. 1, pp. 55–68.

Cite this article

Savchenko, V. V. Minimum of Information Divergence Criterion for Signals with Tuning to Speaker Voice in Automatic Speech Recognition. Radioelectron. Commun. Syst. 63, 42–54 (2020). https://doi.org/10.3103/S0735272720010045

  • DOI: https://doi.org/10.3103/S0735272720010045
