Abstract
This paper considers the problem of reducing the order p ≫ 1 of an autoregressive model (AR model) of a speech signal under the criterion of minimum loss of useful information. The problem is formulated as an optimization problem in terms of discrete spectral modeling. The main difficulty in solving it is the need to rescale the AR-model parameters to the simulated signal at each step of the iterative computation. To overcome this difficulty, it is proposed to use a measure of information divergence of signals in the frequency domain that has the property of scale invariance as the objective functional. On this basis, a new method of AR-model reduction is developed in which the scaling operation is taken outside the iterative optimization procedure. The effectiveness of the proposed method is substantiated theoretically and studied experimentally. It is shown that the main component of the achieved effect is the gain in accuracy of the reduced AR model in the Kullback–Leibler information metric. The results are intended for researchers and developers of systems and technologies for digital speech transmission over low-speed communication channels.
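The role of scale invariance can be illustrated with a minimal sketch. It is not the author's method: the abstract does not define the divergence measure, so the code assumes, purely for illustration, the Itakura–Saito divergence between two AR power spectra and its gain-optimized form, obtained by minimizing over a scale factor c applied to the second spectrum. The function names, coefficient values, and frequency grid are all hypothetical.

```python
import numpy as np

def ar_power_spectrum(a, gain=1.0, n_freq=256):
    """Power spectrum gain / |A(e^{jw})|^2 with A(z) = 1 - sum_k a[k-1] z^{-k}."""
    w = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(1, len(a) + 1)
    A = 1.0 - np.exp(-1j * np.outer(w, k)) @ np.asarray(a, dtype=float)
    return gain / np.abs(A) ** 2

def itakura_saito(p, q):
    """Itakura-Saito divergence averaged over the frequency grid (scale dependent)."""
    r = p / q
    return float(np.mean(r - np.log(r) - 1.0))

def scale_invariant_divergence(p, q):
    """min over c > 0 of itakura_saito(p, c * q); the minimum is at c = mean(p / q)."""
    r = p / q
    return float(np.log(np.mean(r)) - np.mean(np.log(r)))

# Hypothetical high-order reference model and a lower-order candidate.
reference = ar_power_spectrum([0.7, -0.3, 0.06, -0.18])
candidate = ar_power_spectrum([1.2, -0.6])
rescaled = 25.0 * candidate  # same spectral shape, different gain

print(itakura_saito(reference, candidate), itakura_saito(reference, rescaled))
print(scale_invariant_divergence(reference, candidate),
      scale_invariant_divergence(reference, rescaled))
```

Under these assumptions, the plain divergence changes when the candidate spectrum is rescaled, while the gain-optimized form does not. With a measure of this kind, an iterative search over reduced-order coefficients would not need to rescale the model at every step; the gain could be set once after the search.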
Notes
https://www.itu.int/rec/T-REC-G/en.
https://dic.academic.ru/dic.nsf/ruwiki/614146.
GOST R 50840-95 "Speech transmission over communication channels. Methods for assessing quality, legibility and recognition".
GOST R 51061-97 "Systems of low-speed speech transmission over digital channels. Speech quality options and measurement methods".
https://sites.google.com/site/frompldcreators/produkty-1/phonemetraining.
http://www.itu.int/rec/T-REC-G.728-201206-I/en.
Ethics declarations
ADDITIONAL INFORMATION
V.V. Savchenko
The author declares that he has no conflicts of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
The initial version of this paper in Russian is published in the journal “Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika,” ISSN 2307-6011 (Online), ISSN 0021-3470 (Print), available at http://radio.kpi.ua/article/view/S0021347021110030 with DOI: https://doi.org/10.20535/S0021347021110030
Additional information
Translated from Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika, No. 11, pp. 682–695, November, 2021. https://doi.org/10.20535/S0021347021110030
About this article
Cite this article
Savchenko, V.V. Method for Reduction of Speech Signal Autoregression Model for Speech Transmission Systems on Low-Speed Communication Channels. Radioelectron. Commun. Syst. 64, 592–603 (2021). https://doi.org/10.3103/S0735272721110030
DOI: https://doi.org/10.3103/S0735272721110030