Gain-optimized spectral distortions for pronunciation training

Savchenko, Andrey V.; Savchenko, Vladimir V.; Savchenko, Lyudmila V.

doi:10.1007/s11590-021-01790-5

Original Paper
Published: 31 July 2021

Optimization Letters, Volume 16, pages 2095–2113 (2022)

Abstract

This paper considers the assessment of speech sound pronunciation quality in computer-aided language learning systems. We examine the gain optimization of spectral distortion measures between the speech signals of a native speaker and a learner. During training, a learner has to achieve stable pronunciation of all sounds, which is measured by computing the distances between the sounds produced by the learner and by the model speaker. To improve pronunciation, we propose adapting the linear predictive coding (LPC) coefficients of reference sounds by gradient descent optimization of the gain-optimized dissimilarity. As a result, we demonstrate the possibility of synthesizing sounds that are either close to the model pronunciation or achievable by the learner. An experimental study shows that the proposed procedure is highly efficient for pronunciation training even in the presence of noise in the observed utterance.
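To make the central idea of the abstract concrete, here is a minimal Python/NumPy sketch. It assumes all-pole (LPC) spectra with unit prediction-error variance and uses the classical gain-optimized Itakura-Saito distortion of Gray, Buzo, Gray and Matsuyama (1980) as a stand-in; the paper's own dissimilarity and update rule may differ. Minimizing the Itakura-Saito distortion d(P, gQ) over the gain g gives g* = mean(P/Q), so the gain-optimized value reduces to log(mean(P/Q)) - mean(log(P/Q)), which does not change when either spectrum is rescaled. The helper names `lpc_power_spectrum`, `gain_optimized_is` and `adapt_reference` are hypothetical, `p_target` stands for whichever spectrum the reference should move toward (the model speaker's or one the learner can achieve), and the gradient is approximated by central finite differences rather than derived analytically.

```python
import numpy as np


def lpc_power_spectrum(a, n_freq=256):
    """All-pole power spectrum 1/|A(e^{jw})|^2 on n_freq points of [0, pi],
    where A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p (unit gain assumed)."""
    poly = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    w = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(poly.size)
    # A(e^{jw}) = sum_k poly[k] * e^{-jwk}, evaluated for every frequency at once
    A = np.exp(-1j * np.outer(w, k)) @ poly
    return 1.0 / np.abs(A) ** 2


def gain_optimized_is(p, q):
    """Itakura-Saito distortion d(p, g*q) minimised over the gain g > 0.
    The minimiser is g* = mean(p/q), leaving log(mean(r)) - mean(log(r)), r = p/q."""
    r = np.asarray(p, dtype=float) / np.asarray(q, dtype=float)
    return float(np.log(np.mean(r)) - np.mean(np.log(r)))


def adapt_reference(a_ref, p_target, lr=0.01, steps=200, eps=1e-6):
    """Hypothetical helper: plain gradient descent on the reference LPC
    coefficients, with the gradient taken by central finite differences."""
    a = np.array(a_ref, dtype=float)
    n_freq = len(p_target)

    def loss(coeffs):
        # Gain-optimized distortion between the target and the current reference
        return gain_optimized_is(p_target, lpc_power_spectrum(coeffs, n_freq))

    for _ in range(steps):
        grad = np.zeros_like(a)
        for i in range(a.size):
            step = np.zeros_like(a)
            step[i] = eps
            grad[i] = (loss(a + step) - loss(a - step)) / (2.0 * eps)
        a -= lr * grad
    return a


if __name__ == "__main__":
    p_target = lpc_power_spectrum([-0.9])   # stand-in target spectrum (first order)
    a_ref = [-0.5]                          # stand-in reference coefficients
    print("before:", gain_optimized_is(p_target, lpc_power_spectrum(a_ref)))
    a_new = adapt_reference(a_ref, p_target)
    print("adapted:", a_new,
          "after:", gain_optimized_is(p_target, lpc_power_spectrum(a_new)))
```

In this toy run the single coefficient moves from -0.5 toward -0.9 and the distortion drops. The sketch does not enforce stability of the adapted filter: because the gain-optimized measure ignores overall scale, a practical system would still need to keep A(z) minimum-phase (for example by parameterizing it with reflection coefficients of magnitude below one) before resynthesizing a sound.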

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request. A subset for two speakers (speech-data.zip), sufficient to reproduce most of our experiments, is publicly available (https://drive.google.com/drive/folders/1bk1VGNP4fPGwPckCcC-5BvHXsGYgwtUx).

Author information

Authors and Affiliations

Laboratory of Algorithms and Technologies for Network Analysis, HSE University, Nizhny Novgorod, Russia
Andrey V. Savchenko
Nizhny Novgorod State Linguistic University, Nizhny Novgorod, Russia
Vladimir V. Savchenko
HSE University, Nizhny Novgorod, Russia
Lyudmila V. Savchenko

Corresponding author

Correspondence to Andrey V. Savchenko.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Section 4 was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE). The remaining work was supported by the Russian Science Foundation (RSF), Grant 20-71-10010.

About this article

Cite this article

Savchenko, A.V., Savchenko, V.V. & Savchenko, L.V. Gain-optimized spectral distortions for pronunciation training. Optim Lett 16, 2095–2113 (2022). https://doi.org/10.1007/s11590-021-01790-5

Received: 05 December 2020
Accepted: 22 July 2021
Issue Date: September 2022
