Acoustic Variability of Voice Signal as Factor of Information Security for Automatic Speech Recognition Systems with Tuning to User Voice

Radioelectronics and Communications Systems

Abstract

This paper considers the acoustic variability of the voice signal in automatic speech recognition systems. Two varieties are distinguished: intra-speaker and inter-speaker speech variability. A probabilistic cluster model of minimal speech units in the Kullback–Leibler information metric is used to describe them mathematically and to compare their magnitudes. On this basis, theoretical estimates of the acoustic variability are obtained separately for each variety. The information security effect in systems tuned to the voice of an authorized user is described and quantitatively characterized. Intra-speaker variability is negligible in comparison with inter-speaker variability and therefore has no noticeable harmful effect on the effectiveness of automatic speech recognition. A computational experiment on two speech streams from two different speakers, implemented with the author’s software, is set up to confirm and extend the theoretical results. The experimental results show that in a number of cases the level of inter-speaker speech variability exceeds the inter-phonemic differences within a homogeneous speech flow. Therefore, in systems tuned to the speaker's voice, the effect of voice signal acoustic variability is not only unambiguously positive overall, since it protects information from unauthorized access, but also significant in probabilistic terms. The results are intended for the development of new automatic speech recognition systems, and the modernization of existing ones, designed to operate in standalone mode.
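As a rough illustration of how a Kullback–Leibler divergence can be used to compare intra- and inter-speaker variability, the sketch below computes the divergence between normalized band-energy profiles of one sound. It is not the author's cluster model or software: the function names `kl_divergence` and `normalize` are hypothetical, and the band energies are synthetic values invented for the example, not data from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions.

    Assumes p and q have the same length, each sums to 1,
    and q is strictly positive wherever p is positive.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


# Synthetic band energies for one vowel (illustrative assumption only).
speaker_a_take1 = normalize([8.0, 5.0, 2.0, 1.0])  # speaker A, first utterance
speaker_a_take2 = normalize([7.5, 5.5, 2.0, 1.0])  # speaker A, second utterance
speaker_b = normalize([3.0, 4.0, 6.0, 3.0])        # a different speaker

intra = kl_divergence(speaker_a_take1, speaker_a_take2)
inter = kl_divergence(speaker_a_take1, speaker_b)

print(f"intra-speaker divergence: {intra:.4f}")
print(f"inter-speaker divergence: {inter:.4f}")
```

With these made-up profiles the intra-speaker divergence comes out much smaller than the inter-speaker one, which is the qualitative relation the abstract describes. The actual magnitudes in the paper depend on its model and recordings and are not reproduced here.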

Notes

  1. Here we mean the obvious circumstance that the reference signals x_{k,r} can, in the general case, have a dimension much larger than that of the speech frame x [12]; in this case, however, this is not essential.

  2. The program website: https://sites.google.com/site/frompldcreators/produkty-1/phonemetraining .

  3. Sound file “5. Sounds and letters” on the site [in Russian] https://mixmuz.ru/mp3/%D0%B3%D0%BB%D0%B0%D1%81%D0%BD%D1%8B%D0%B5%20%D0%B7%D0%B2%D1%83%D0%BA%D0%B8 .

  4. ITU-T Recommendation P.1120 site: https://www.itu.int/ITU-T/recommendations/rec.aspx?id=13176 .

References

  1. L. Rabiner, R. Schafer, Theory and Applications of Digital Speech Processing (Pearson, Boston, 2010). URI: https://www.amazon.com/Theory-Applications-Digital-Speech-Processing/dp/0136034284.

  2. I. B. Tampel, "Automatic speech recognition – the main stages over last 50 years," Sci. Tech. J. Inf. Technol. Mech. Opt., v.100, n.6, p.957 (2015). DOI: https://doi.org/10.17586/2226-1494-2015-15-6-957-968.

  3. D. Yu, L. Deng, Automatic Speech Recognition (Springer London, London, 2015). DOI: https://doi.org/10.1007/978-1-4471-5779-3.

  4. A. Rogowski, "Industrially oriented voice control system," Robot. Comput. Integr. Manuf., v.28, n.3, p.303 (2012). DOI: https://doi.org/10.1016/j.rcim.2011.09.010.

  5. M. Schuster, "Speech Recognition for Mobile Devices at Google," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Springer, Berlin, Heidelberg, 2010). DOI: https://doi.org/10.1007/978-3-642-15246-7_3.

  6. R. Rammohan, N. Dhanabalsamy, V. Dimov, F. J. Eidelman, "Smartphone Conversational Agents (Apple Siri, Google, Windows Cortana) and Questions about Allergy and Asthma Emergencies," J. Allergy Clin. Immunol., v.139, n.2, p.AB250 (2017). DOI: https://doi.org/10.1016/j.jaci.2016.12.804.

  7. V. V. Savchenko, A. V. Savchenko, "Information-theoretic analysis of efficiency of the phonetic encoding–decoding method in automatic speech recognition," J. Commun. Technol. Electron., v.61, n.4, p.430 (2016). DOI: https://doi.org/10.1134/S1064226916040112.

  8. R. A. Ustinov, "Specific features of modern voice protection systems," Bezop. Inf. Tehnol., v.24, n.4, p.71 (2017). DOI: https://doi.org/10.26583/bit.2017.4.08.

  9. Z. Wu, Information Hiding in Speech Signal for Secure Communication (Elsevier, Amsterdam, 2015). DOI: https://doi.org/10.1016/C2013-0-19179-9.

  10. S. M. Qaisar, N. Hainmad, R. Khan, R. Asfour, "A Speech to Machine Interface Based on Perceptual Linear Prediction and Classification," in 2019 Advances in Science and Engineering Technology International Conferences (ASET) (IEEE, Washington, 2019). DOI: https://doi.org/10.1109/ICASET.2019.8714304.

  11. R. González Hautamäki, M. Sahidullah, V. Hautamäki, T. Kinnunen, "Acoustical and perceptual study of voice disguise by age modification in speaker verification," Speech Commun., v.95, p.1 (2017). DOI: https://doi.org/10.1016/j.specom.2017.10.002.

  12. V. V. Savchenko, "Minimum of Information Divergence Criterion for Signals with Tuning to Speaker Voice in Automatic Speech Recognition," Radioelectron. Commun. Syst., v.63, n.1, p.42 (2020). DOI: https://doi.org/10.3103/S0735272720010045.

  13. S. Heald, S. Klos, H. Nusbaum, "Understanding Speech in the Context of Variability," in Neurobiology of Language (Academic Press, Cambridge, MA, 2016). DOI: https://doi.org/10.1016/B978-0-12-407794-2.00017-1.

  14. I. A. Sieber, G. A. Moroz, "Estimating the Acoustic Variation of s via Principal Component Analysis," NSU Vestnik. Ser. Linguist. Intercult. Commun., v.17, n.1, p.49 (2019). DOI: https://doi.org/10.25205/1818-7935-2019-17-1-49-64.

  15. J. H. L. Hansen, H. Bořil, "On the issues of intra-speaker variability and realism in speech, speaker, and language recognition tasks," Speech Commun., v.101, p.94 (2018). DOI: https://doi.org/10.1016/j.specom.2018.05.004.

  16. N. A. Krasheninnikova, "Main Factors Interfering with Recognition of Speech Commands," Simbirsk Sci. Bull., n.1, p.201 (2011).

  17. V. V. Savchenko, L. V. Savchenko, "Method for Measuring the Intelligibility of Speech Signals in the Kullback–Leibler Information Metric," Meas. Tech., v.62, n.9, p.832 (2019). DOI: https://doi.org/10.1007/s11018-019-01702-1.

  18. O. F. Krivnova, "Prosodic phrasing in spoken text: localization of breathing pauses," in Computational linguistics and intelligent technologies: based on the materials of the international conference (Dialog, Moscow, 2016). URI: http://www.dialog-21.ru/media/3404/krivnovaof.pdf.

  19. V. V. Savchenko, "Itakura–Saito Divergence as an Element of the Information Theory of Speech Perception," J. Commun. Technol. Electron., v.64, n.6, p.590 (2019). DOI: https://doi.org/10.1134/S1064226919060093.

  20. V. V. Savchenko, "Estimation of the Phonetic Speech Quality Using the Information Theoretic Approach," J. Commun. Technol. Electron., v.63, n.1, p.53 (2018). DOI: https://doi.org/10.1134/S1064226918010126.

  21. S. Kullback, Information Theory and Statistics (Dover Publications, New York, 1997). URI: https://www.amazon.com/Information-Theory-Statistics-Dover-Mathematics/dp/0486696847.

  22. V. V. Savchenko, "Criterion for Minimum of Mean Information Deviation for Distinguishing Random Signals with Similar Characteristics," Radioelectron. Commun. Syst., v.61, n.9, p.419 (2018). DOI: https://doi.org/10.3103/S0735272718090042.

  23. V. V. Savchenko, A. V. Savchenko, "Criterion of Significance Level for Selection of Order of Spectral Estimation of Entropy Maximum," Radioelectron. Commun. Syst., v.62, n.5, p.223 (2019). DOI: https://doi.org/10.3103/S0735272719050042.

  24. H. B. Dwight, Tables of Integrals and Other Mathematical Data (Macmillan, New York, 1961).

  25. "Linear prediction," in Springer Handbook of Speech Processing (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008). DOI: https://doi.org/10.1007/978-3-540-49127-9.

  26. P. H. Müller, P. Neumann, R. Storm, "Tafeln der mathematischen Statistik," VEB Fachbuchverlag, p.279 (1973). URI: http://doi.wiley.com/10.1002/bimj.19740160816.

Author information

Corresponding author

Correspondence to V. V. Savchenko.

Ethics declarations

ADDITIONAL INFORMATION

V. V. Savchenko

The author declares that he has no conflict of interest.

The initial version of this paper, in Russian, was published in the journal “Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika,” ISSN 2307-6011 (Online), ISSN 0021-3470 (Print), and is available at http://radio.kpi.ua/article/view/S0021347020100039 with DOI: https://doi.org/10.20535/S0021347020100039

About this article

Cite this article

Savchenko, V.V. Acoustic Variability of Voice Signal as Factor of Information Security for Automatic Speech Recognition Systems with Tuning to User Voice. Radioelectron. Commun. Syst. 63, 532–542 (2020). https://doi.org/10.3103/S0735272720100039

  • DOI: https://doi.org/10.3103/S0735272720100039
