Method for Reduction of Speech Signal Autoregression Model for Speech Transmission Systems on Low-Speed Communication Channels

Radioelectronics and Communications Systems

Abstract

This paper considers the problem of reducing the order p ≫ 1 of an autoregressive model (AR-model) of a speech signal by the criterion of minimum loss of useful information. The problem is formulated as an optimization problem in terms of discrete spectral modeling. The most acute difficulty in solving it is the need to rescale the AR-model parameters to the simulated signal at each step of the iterative calculation process. To overcome this difficulty, it is proposed to use as the objective functional a measure of information divergence of signals in the frequency domain that has the property of scale invariance. On this basis, a new method of AR-model reduction is developed in which the scaling operation is taken outside the iterative optimization procedure. The effectiveness of the proposed method is substantiated theoretically and investigated experimentally. It is shown that the main component of the achieved effect is a gain in accuracy of the reduced AR-model in the Kullback–Leibler information metric. The results are addressed to researchers and developers of systems and technologies for digital speech transmission over low-speed communication channels.
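The role of scale invariance can be illustrated with a short numerical sketch. The specific divergence used in the paper is not given in this preview, so the example below substitutes the well-known gain-optimized Itakura–Saito distortion as a stand-in; the function names (`ar_psd`, `gain_optimized_is`, `optimal_gain`) and the AR coefficients are hypothetical and chosen only for illustration. It shows that a measure of this kind does not change when the model spectrum is multiplied by a constant, so the gain can be set once after the coefficients have been chosen.

```python
import numpy as np

def ar_psd(a, sigma2=1.0, n_freq=256):
    """PSD of the AR model x[n] = sum_k a[k] x[n-k] + e[n] on a grid in [0, pi)."""
    a = np.asarray(a, dtype=float)
    w = np.pi * np.arange(n_freq) / n_freq
    k = np.arange(1, a.size + 1)
    # A(e^{jw}) = 1 - sum_k a[k] * exp(-j*w*k)
    A = 1.0 - np.exp(-1j * np.outer(w, k)) @ a
    return sigma2 / np.abs(A) ** 2

def gain_optimized_is(g_ref, g_model):
    """Itakura-Saito distortion minimized over the model gain (illustrative stand-in).

    Equals log(mean(r)) - mean(log(r)) with r = g_ref / g_model; it is
    non-negative and unchanged when g_model is multiplied by a constant.
    """
    r = np.asarray(g_ref) / np.asarray(g_model)
    return float(np.log(np.mean(r)) - np.mean(np.log(r)))

def optimal_gain(g_ref, g_model):
    """Scale factor for g_model that minimizes the Itakura-Saito distortion."""
    return float(np.mean(np.asarray(g_ref) / np.asarray(g_model)))

# Hypothetical coefficients: a 4th-order model and a 2nd-order candidate.
a_full = [0.5, -0.3, 0.1, -0.05]
a_reduced = [0.5, -0.3]

g_full = ar_psd(a_full, sigma2=2.0)
g_red = ar_psd(a_reduced)  # unit gain, deliberately mismatched in scale

d = gain_optimized_is(g_full, g_red)
d_scaled = gain_optimized_is(g_full, 7.5 * g_red)  # same model, different scale
gain = optimal_gain(g_full, g_red)                 # applied once, after the fit

print(d, d_scaled, gain)
```

Because `d` and `d_scaled` coincide, an iterative search over the reduced coefficients driven by such a measure would not need to rescale the model at every step; this is only a sketch of the idea stated in the abstract, not the procedure developed in the paper.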

Notes

  1. https://www.itu.int/rec/T-REC-G/en.

  2. https://dic.academic.ru/dic.nsf/ruwiki/614146.

  3. GOST R 50840-95 "Speech transmission over communication channels. Methods for assessing quality, legibility and recognition".

  4. GOST R 51061-97 Systems of low-speed speech transmission over digital channels. Speech quality options and measurement methods.

  5. Taking into account the features of the acoustic mechanism of speech production [1] and of the AR-model (1), the possibility that either the PSD \( G(f) \) or its estimate \( \hat{G}(f) \) equals zero anywhere in its domain of definition is excluded from consideration.

  6. An increase in the convergence rate of iterations (9) obtained by increasing the step γ0 is inevitably accompanied [24] by a decrease in the accuracy of the vector of final approximations bp+1(L).

  7. https://sites.google.com/site/frompldcreators/produkty-1/phonemetraining.

  8. http://www.itu.int/rec/T-REC-G.728-201206-I/en.

References

  1. G. Kitagawa, Introduction to Time Series Modeling (Chapman and Hall/CRC, 2020). DOI: https://doi.org/10.1201/9780429197963.

  2. L. Tan, J. Jiang, "Introduction to digital signal processing," in Digital Signal Processing (Elsevier, 2019). DOI: https://doi.org/10.1016/B978-0-12-815071-9.00001-4.

  3. L. R. Rabiner, R. W. Schafer, "Introduction to digital speech processing," Found. Trends® Signal Process., v.1, n.1–2, p.1 (2007). DOI: https://doi.org/10.1561/2000000001.

  4. M. W. Spratling, "A review of predictive coding algorithms," Brain Cogn., v.112, p.92 (2017). DOI: https://doi.org/10.1016/j.bandc.2015.11.003.

  5. G. Sharma, K. Umapathy, S. Krishnan, "Trends in audio signal feature extraction methods," Appl. Acoust., v.158, p.107020 (2020). DOI: https://doi.org/10.1016/j.apacoust.2019.107020.

  6. H. Chaouch, F. Merazka, P. Marthon, "Multiple description coding technique to improve the robustness of ACELP based coders AMR-WB," Speech Commun., v.108, p.33 (2019). DOI: https://doi.org/10.1016/j.specom.2019.02.002.

  7. V. V. Savchenko, A. V. Savchenko, "Method for measuring distortions in speech signals during transmission over a communication channel to a biometric identification system," Meas. Tech., v.63, n.11, p.917 (2021). DOI: https://doi.org/10.1007/s11018-021-01864-x.

  8. Y. Gu, H.-L. Wei, "A robust model structure selection method for small sample size and multiple datasets problems," Inf. Sci., v.451–452, p.195 (2018). DOI: https://doi.org/10.1016/j.ins.2018.04.007.

  9. S. Cui, E. Li, X. Kang, "Autoregressive model based smoothing forensics of very short speech clips," in 2020 IEEE International Conference on Multimedia and Expo (ICME) (IEEE, 2020). DOI: https://doi.org/10.1109/ICME46284.2020.9102765.

  10. S. L. Marple, Digital Spectral Analysis with Applications (Dover Publications, Mineola, New York, 2019). URI: https://www.goodreads.com/book/show/19484239.

  11. J. Benesty, J. Chen, Y. Huang, "Linear prediction," in Springer Handbook of Speech Processing (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008). DOI: https://doi.org/10.1007/978-3-540-49127-9_7.

  12. J. Gibson, "Mutual information, the linear prediction model, and CELP voice codecs," Information, v.10, n.5, p.179 (2019). DOI: https://doi.org/10.3390/info10050179.

  13. Ç. Candan, "Making linear prediction perform like maximum likelihood in Gaussian autoregressive model parameter estimation," Signal Process., v.166, p.107256 (2020). DOI: https://doi.org/10.1016/j.sigpro.2019.107256.

  14. D. Xiao, F. Mo, Y. Zhang, M. Zhao, L. Ma, "An extended Levinson-Durbin algorithm and its application in mixed excitation linear prediction," Heliyon, v.4, n.11, p.e00948 (2018). DOI: https://doi.org/10.1016/j.heliyon.2018.e00948.

  15. M. Morise, "CheapTrick, a spectral envelope estimator for high-quality speech synthesis," Speech Commun., v.67, p.1 (2015). DOI: https://doi.org/10.1016/j.specom.2014.09.003.

  16. V. Y. Semenov, "Methods for calculating and coding the parameters of autoregressive speech model when developing the vocoder based on fixed point signal process," J. Autom. Inf. Sci., v.51, n.2, p.30 (2019). DOI: https://doi.org/10.1615/JAutomatInfScien.v51.i2.40.

  17. V. V. Savchenko, A. V. Savchenko, "Guaranteed significance level criterion in automatic speech signal segmentation," J. Commun. Technol. Electron., v.65, n.11, p.1311 (2020). DOI: https://doi.org/10.1134/S1064226920110157.

  18. A. V. Savchenko, V. V. Savchenko, "A method for measuring the pitch frequency of speech signals for the systems of acoustic speech analysis," Meas. Tech., v.62, n.3, p.282 (2019). DOI: https://doi.org/10.1007/s11018-019-01617-x.

  19. C. Liu, M. Jiang, "Robust adaptive filter with lncosh cost," Signal Process., v.168, p.107348 (2020). DOI: https://doi.org/10.1016/j.sigpro.2019.107348.

  20. S. Kullback, Information Theory and Statistics (Dover Publications, New York, 1997). URI: https://www.amazon.com/Information-Theory-Statistics-Dover-Mathematics/dp/0486696847.

  21. V. V. Savchenko, A. V. Savchenko, "Criterion of significance level for selection of order of spectral estimation of entropy maximum," Radioelectron. Commun. Syst., v.62, n.5, p.223 (2019). DOI: https://doi.org/10.3103/S0735272719050042.

  22. V. V. Savchenko, L. V. Savchenko, "Speech signal autoregression modeling based on the discrete Fourier transform and scale-invariant measure of information discrimination," J. Commun. Technol. Electron., v.66, n.11, p.1266 (2021). DOI: https://doi.org/10.1134/S1064226921110085.

  23. F. Mustiere, M. Bouchard, M. Bolic, "All-pole modeling of discrete spectral powers: A unified approach," IEEE Trans. Audio, Speech, Lang. Process., v.20, n.2, p.705 (2012). DOI: https://doi.org/10.1109/TASL.2011.2163511.

  24. A. R. Sampson, "Stochastic Approximation," in Wiley StatsRef: Statistics Reference Online (Wiley, 2014). DOI: https://doi.org/10.1002/9781118445112.stat01848.

  25. V. V. Savchenko, "Minimum of information divergence criterion for signals with tuning to speaker voice in automatic speech recognition," Radioelectron. Commun. Syst., v.63, n.1, p.42 (2020). DOI: https://doi.org/10.3103/S0735272720010045.

  26. A. V. Savchenko, V. V. Savchenko, "Scale-invariant modification of COSH distance for measuring speech signal distortions in real-time mode," Radioelectron. Commun. Syst., v.64, n.6, p.300 (2021). DOI: https://doi.org/10.3103/S0735272721060030.

  27. V. V. Savchenko, "Itakura–Saito divergence as an element of the information theory of speech perception," J. Commun. Technol. Electron., v.64, n.6, p.590 (2019). DOI: https://doi.org/10.1134/S1064226919060093.

  28. R. Gray, A. Buzo, A. Gray, Y. Matsuyama, "Distortion measures for speech processing," IEEE Trans. Acoust. Speech, Signal Process., v.28, n.4, p.367 (1980). DOI: https://doi.org/10.1109/TASSP.1980.1163421.

  29. E. Estrada, H. Nazeran, F. Ebrahimi, M. Mikaeili, "Symmetric Itakura distance as an EEG signal feature for sleep depth determination," in ASME 2009 Summer Bioengineering Conference, Parts A and B (American Society of Mechanical Engineers, 2009). DOI: https://doi.org/10.1115/SBC2009-206233.

  30. D. Wang, M. Yu, C. B. Low, S. Arogeti, Model-based Health Monitoring of Hybrid Systems (Springer New York, New York, NY, 2013). DOI: https://doi.org/10.1007/978-1-4614-7369-5.

  31. O. Diana, A. Mihaela, "Feature extraction and classification methods for a motor task brain computer interface: A comparative evaluation for two databases," Int. J. Adv. Comput. Sci. Appl., v.8, n.8 (2017). DOI: https://doi.org/10.14569/IJACSA.2017.080834.

  32. H. B. Kashani, A. Sayadiyan, "Sequential use of spectral models to reduce deletion and insertion errors in vowel detection," Comput. Speech Lang., v.50, p.105 (2018). DOI: https://doi.org/10.1016/j.csl.2017.12.008.

  33. J. Gibson, "Speech compression," Information, v.7, n.2, p.32 (2016). DOI: https://doi.org/10.3390/info7020032.

  34. G. Tamulevicius, J. Kaukenas, "High-order autoregressive modeling of individual speaker’s qualities," in 2017 5th IEEE Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE) (IEEE, 2017). DOI: https://doi.org/10.1109/AIEEE.2017.8270551.

Author information

Corresponding author

Correspondence to V. V. Savchenko.

Ethics declarations

ADDITIONAL INFORMATION

V.V. Savchenko

The author declares that he has no conflicts of interest.

This article does not contain any studies with human participants or animals performed by any of the authors.

The initial version of this paper in Russian was published in the journal “Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika,” ISSN 2307-6011 (Online), ISSN 0021-3470 (Print), available at http://radio.kpi.ua/article/view/S0021347021110030 with DOI: https://doi.org/10.20535/S0021347021110030

Additional information

Translated from Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika, No. 11, pp. 682-695, November 2021. https://doi.org/10.20535/S0021347021110030

About this article

Cite this article

Savchenko, V.V. Method for Reduction of Speech Signal Autoregression Model for Speech Transmission Systems on Low-Speed Communication Channels. Radioelectron. Commun. Syst. 64, 592–603 (2021). https://doi.org/10.3103/S0735272721110030

  • DOI: https://doi.org/10.3103/S0735272721110030
