Acoustic Variability of Voice Signal as Factor of Information Security for Automatic Speech Recognition Systems with Tuning to User Voice
Radioelectronics and Communications Systems Pub Date : 2020-12-14 , DOI: 10.3103/s0735272720100039
V. V. Savchenko

Abstract

This paper considers the acoustic variability of the voice signal in automatic speech recognition systems. Two varieties are distinguished: intra-speaker and inter-speaker speech variability. A probabilistic cluster model of minimal speech units in the Kullback–Leibler information metric is used to describe them mathematically and to compare their magnitudes. On this basis, theoretical estimates of voice signal acoustic variability are obtained separately for each variety. The information security effect in systems tuned to the voice of an authorized user is described and quantified. Intra-speaker variability is negligible compared with inter-speaker variability and therefore has no noticeable harmful effect on the effectiveness of automatic speech recognition. A computational experiment on two speech streams from two different speakers was set up to confirm and extend the theoretical results; it was implemented with the author's software. The experimental results show that in a number of cases the level of inter-speaker variability exceeds the inter-phonemic differences within a homogeneous speech flow. Therefore, in systems tuned to the speaker's voice, the effect of voice signal acoustic variability is not only unambiguously positive, in that it protects information from unauthorized access, but also significant in probability-theoretic terms. The results are intended for the development of new, and the modernization of existing, automatic speech recognition systems designed to work in standalone mode.
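The abstract does not give the details of the cluster model, so the following is only an illustrative sketch of how a Kullback–Leibler comparison of this kind might look. It assumes, purely for illustration, that each realization of a speech unit is summarized by a univariate Gaussian; the function name `gaussian_kl` and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL divergence D(p || q), in nats, between two univariate Gaussians.

    Assumption for illustration only: a speech unit is modeled as a single
    Gaussian. The paper's actual probabilistic cluster model is not
    specified in the abstract.
    """
    if var_p <= 0 or var_q <= 0:
        raise ValueError("variances must be positive")
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mu_p - mu_q) ** 2) / var_q
        - 1.0
    )


# Hypothetical parameters: two realizations of one phoneme by the same
# speaker (close models) versus the same phoneme from another speaker.
intra_speaker = gaussian_kl(0.0, 1.0, 0.1, 1.1)
inter_speaker = gaussian_kl(0.0, 1.0, 2.0, 1.5)

print(f"intra-speaker divergence: {intra_speaker:.4f}")
print(f"inter-speaker divergence: {inter_speaker:.4f}")
```

With these made-up parameters the intra-speaker divergence is much smaller than the inter-speaker one, which mirrors the qualitative relation stated in the abstract but says nothing about the magnitudes reported in the paper.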




Updated: 2020-12-14