Acoustic Variability of Voice Signal as Factor of Information Security for Automatic Speech Recognition Systems with Tuning to User Voice

Savchenko, V. V.

doi:10.3103/S0735272720100039

Published: 14 December 2020

Volume 63, pages 532–542 (2020)

Radioelectronics and Communications Systems

V. V. Savchenko ORCID: orcid.org/0000-0003-3045-3337¹

Abstract

This paper considers the phenomenon of acoustic variability of the voice signal in automatic speech recognition systems. Two varieties are distinguished: intra-speaker and inter-speaker speech variability. A probabilistic cluster model of minimal speech units in the Kullback–Leibler information metric is used to describe them mathematically and to compare them in magnitude. On this basis, theoretical estimates of the acoustic variability of the voice signal are obtained separately for each variety. The information security effect in systems tuned to the voice of an authorized user is described and quantitatively characterized. Intra-speaker variability is shown to be negligible in comparison with inter-speaker variability, and therefore has no noticeable harmful effect on the effectiveness of automatic speech recognition. A computational experiment with two speech streams from two different speakers was set up to confirm and extend the theoretical results; it was implemented using the author’s software. The experimental results show that in a number of cases the level of inter-speaker variability exceeds the inter-phonemic differences within a homogeneous speech flow. Therefore, in systems tuned to the speaker's voice, the effect of acoustic variability is unambiguously positive overall, since it protects information from unauthorized access, and it is also significant in probability-theoretic terms. The results are intended for the development of new automatic speech recognition systems, and the modernization of existing ones, designed to operate in standalone mode.
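The comparison of intra- and inter-speaker variability in the Kullback–Leibler metric can be illustrated with a small numerical sketch. It is not the author's software and does not reproduce the cluster model of the paper: it assumes each speech unit is a zero-mean stationary Gaussian signal described by an all-pole (autoregressive) spectrum, and uses the standard expression for the Kullback–Leibler divergence rate between two such signals. The helper names (`resonance`, `ar_psd`, `kl_rate`) and all pole radii and angles are hypothetical values chosen only for illustration.

```python
import cmath
import math


def resonance(radius, angle):
    """AR(2) coefficients for one pole pair at radius * exp(+/- j * angle)."""
    return [2.0 * radius * math.cos(angle), -radius ** 2]


def ar_psd(coeffs, gain=1.0, n_freqs=256):
    """Power spectral density of an all-pole model on a uniform grid in (0, pi)."""
    psd = []
    for i in range(n_freqs):
        w = math.pi * (i + 0.5) / n_freqs
        # A(e^{jw}) = 1 - sum_k a_k * e^{-jwk}
        a = 1.0 - sum(c * cmath.exp(-1j * w * (k + 1)) for k, c in enumerate(coeffs))
        psd.append(gain / abs(a) ** 2)
    return psd


def kl_rate(psd_p, psd_q):
    """KL divergence rate (nats per sample) between two zero-mean stationary
    Gaussian signals, approximated by averaging over the frequency grid:
    0.5 * mean(p/q - ln(p/q) - 1)."""
    total = 0.0
    for p, q in zip(psd_p, psd_q):
        ratio = p / q
        total += ratio - math.log(ratio) - 1.0
    return 0.5 * total / len(psd_p)


# Hypothetical data: one vowel-like unit, three realizations.
reference = ar_psd(resonance(0.95, 0.30))      # authorized user, reference
same_speaker = ar_psd(resonance(0.95, 0.31))   # same user, another utterance
other_speaker = ar_psd(resonance(0.95, 0.45))  # a different speaker

intra = kl_rate(same_speaker, reference)   # intra-speaker variability
inter = kl_rate(other_speaker, reference)  # inter-speaker variability

print(f"intra-speaker divergence: {intra:.4f}")
print(f"inter-speaker divergence: {inter:.4f}")
```

Under these assumed parameters the divergence of the other speaker from the reference is much larger than that of the same speaker, which is the qualitative effect the abstract describes: a system tuned to one voice tends to reject speech from an unauthorized speaker.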

Notes

Here we mean the obvious circumstance that, in the general case, the reference signals x_{k,r} can have a dimension much larger than that of the speech frame x [12]; in this case, however, this is not essential.
The program website: https://sites.google.com/site/frompldcreators/produkty-1/phonemetraining
Sound file “5. Sounds and letters” on the site (in Russian): https://mixmuz.ru/mp3/%D0%B3%D0%BB%D0%B0%D1%81%D0%BD%D1%8B%D0%B5%20%D0%B7%D0%B2%D1%83%D0%BA%D0%B8
Recommendation P.1120 site: https://www.itu.int/ITU-T/recommendations/rec.aspx?id=13176

References

L. Rabiner, R. Schafer, Theory and Applications of Digital Speech Processing (Pearson, Boston, 2010). URI: https://www.amazon.com/Theory-Applications-Digital-Speech-Processing/dp/0136034284.
I. B. Tampel, "Automatic speech recognition – the main stages over last 50 years," Sci. Tech. J. Inf. Technol. Mech. Opt., v.100, n.6, p.957 (2015). DOI: https://doi.org/10.17586/2226-1494-2015-15-6-957-968.
D. Yu, L. Deng, Automatic Speech Recognition (Springer London, London, 2015). DOI: https://doi.org/10.1007/978-1-4471-5779-3.
A. Rogowski, "Industrially oriented voice control system," Robot. Comput. Manuf., v.28, n.3, p.303 (2012). DOI: https://doi.org/10.1016/j.rcim.2011.09.010.
M. Schuster, "Speech Recognition for Mobile Devices at Google," in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Springer, Berlin, Heidelberg, 2010). DOI: https://doi.org/10.1007/978-3-642-15246-7_3.
R. Rammohan, N. Dhanabalsamy, V. Dimov, F. J. Eidelman, "Smartphone Conversational Agents (Apple Siri, Google, Windows Cortana) and Questions about Allergy and Asthma Emergencies," J. Allergy Clin. Immunol., v.139, n.2, p.AB250 (2017). DOI: https://doi.org/10.1016/j.jaci.2016.12.804.
V. V. Savchenko, A. V. Savchenko, "Information-theoretic analysis of efficiency of the phonetic encoding–decoding method in automatic speech recognition," J. Commun. Technol. Electron., v.61, n.4, p.430 (2016). DOI: https://doi.org/10.1134/S1064226916040112.
R. A. Ustinov, "Specific features of modern voice protection systems," Bezop. Inf. Tehnol., v.24, n.4, p.71 (2017). DOI: https://doi.org/10.26583/bit.2017.4.08.
Z. Wu, Information Hiding in Speech Signal for Secure Communication (Elsevier, Amsterdam, 2015). DOI: https://doi.org/10.1016/C2013-0-19179-9.
S. M. Qaisar, N. Hainmad, R. Khan, R. Asfour, "A Speech to Machine Interface Based on Perceptual Linear Prediction and Classification," in 2019 Advances in Science and Engineering Technology International Conferences (ASET) (IEEE, Washington, 2019). DOI: https://doi.org/10.1109/ICASET.2019.8714304.
R. González Hautamäki, M. Sahidullah, V. Hautamäki, T. Kinnunen, "Acoustical and perceptual study of voice disguise by age modification in speaker verification," Speech Commun., v.95, p.1 (2017). DOI: https://doi.org/10.1016/j.specom.2017.10.002.
V. V. Savchenko, "Minimum of Information Divergence Criterion for Signals with Tuning to Speaker Voice in Automatic Speech Recognition," Radioelectron. Commun. Syst., v.63, n.1, p.42 (2020). DOI: https://doi.org/10.3103/S0735272720010045.
S. Heald, S. Klos, H. Nusbaum, "Understanding Speech in the Context of Variability," in Neurobiology of Language (Academic Press, Cambridge, MA, 2016). DOI: https://doi.org/10.1016/B978-0-12-407794-2.00017-1.
I. A. Sieber, G. A. Moroz, "Estimating the Acoustic Variation of s via Principal Component Analysis," NSU Vestnik. Ser. Linguist. Intercult. Commun., v.17, n.1, p.49 (2019). DOI: https://doi.org/10.25205/1818-7935-2019-17-1-49-64.
J. H. L. Hansen, H. Bořil, "On the issues of intra-speaker variability and realism in speech, speaker, and language recognition tasks," Speech Commun., v.101, n.0, p.94 (2018). DOI: https://doi.org/10.1016/j.specom.2018.05.004.
N. A. Krasheninnikova, "Main Factors Interfering with Recognition of Speech Commands," Simbirsk Sci. Bull., v.0, n.1, p.201 (2011).
V. V. Savchenko, L. V. Savchenko, "Method for Measuring the Intelligibility of Speech Signals in the Kullback–Leibler Information Metric," Meas. Tech., v.62, n.9, p.832 (2019). DOI: https://doi.org/10.1007/s11018-019-01702-1.
O. F. Krivnova, "Prosodic phrasing in spoken text: localization of breathing pauses," in Computational linguistics and intelligent technologies: based on the materials of the international conference (Dialog, Moscow, 2016). URI: http://www.dialog-21.ru/media/3404/krivnovaof.pdf.
V. V. Savchenko, "Itakura–Saito Divergence as an Element of the Information Theory of Speech Perception," J. Commun. Technol. Electron., v.64, n.6, p.590 (2019). DOI: https://doi.org/10.1134/S1064226919060093.
V. V. Savchenko, "Estimation of the Phonetic Speech Quality Using the Information Theoretic Approach," J. Commun. Technol. Electron., v.63, n.1, p.53 (2018). DOI: https://doi.org/10.1134/S1064226918010126.
S. Kullback, Information Theory and Statistics (Dover Publications, New York, 1997). URI: https://www.amazon.com/Information-Theory-Statistics-Dover-Mathematics/dp/0486696847.
V. V. Savchenko, "Criterion for Minimum of Mean Information Deviation for Distinguishing Random Signals with Similar Characteristics," Radioelectron. Commun. Syst., v.61, n.9, p.419 (2018). DOI: https://doi.org/10.3103/S0735272718090042.
V. V. Savchenko, A. V. Savchenko, "Criterion of Significance Level for Selection of Order of Spectral Estimation of Entropy Maximum," Radioelectron. Commun. Syst., v.62, n.5, p.223 (2019). DOI: https://doi.org/10.3103/S0735272719050042.
H. B. Dwight, Tables of Integrals and Other Mathematical Data (Macmillan, New York, 1961).
"Linear prediction," in Springer Handbook of Speech Processing (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008). DOI: https://doi.org/10.1007/978-3-540-49127-9.
P. H. Müller, P. Neumann, R. Storm, "Tafeln der mathematischen Statistik," VEB Fachbuchverlag, v.0, n.0, p.279 (1973). URI: http://doi.wiley.com/10.1002/bimj.19740160816.

Author information

Authors and Affiliations

Nizhny Novgorod State Linguistic University, Nizhny Novgorod, Russia
V. V. Savchenko

Corresponding author

Correspondence to V. V. Savchenko.

Ethics declarations

ADDITIONAL INFORMATION

V. V. Savchenko

The author declares that he has no conflict of interest.

The initial version of this paper, in Russian, is published in the journal “Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika,” ISSN 2307-6011 (Online), ISSN 0021-3470 (Print), at http://radio.kpi.ua/article/view/S0021347020100039 with DOI: https://doi.org/10.20535/S0021347020100039

About this article

Cite this article

Savchenko, V. V. Acoustic Variability of Voice Signal as Factor of Information Security for Automatic Speech Recognition Systems with Tuning to User Voice. Radioelectron. Commun. Syst. 63, 532–542 (2020). https://doi.org/10.3103/S0735272720100039

Received: 05 March 2020
Revised: 14 July 2020
Accepted: 13 October 2020
Published: 14 December 2020
Issue Date: October 2020
DOI: https://doi.org/10.3103/S0735272720100039