Air traffic control communication (ATCC) speech corpora and their use for ASR and TTS development

Šmídl, Luboš; Švec, Jan; Tihelka, Daniel; Matoušek, Jindřich; Romportl, Jan; Ircing, Pavel

doi:10.1007/s10579-019-09449-5

Project Notes
Published: 19 February 2019

Language Resources and Evaluation, volume 53, pages 449–464 (2019)

Luboš Šmídl¹,
Jan Švec¹,
Daniel Tihelka¹,
Jindřich Matoušek¹,
Jan Romportl¹ &
…
Pavel Ircing ORCID: orcid.org/0000-0001-6967-1687¹

Abstract

The paper presents the motivation for creating dedicated speech corpora of air traffic control communication, describes in detail how the corpora were prepared for both automatic speech recognition and text-to-speech synthesis, presents an illustrative speech recognition system developed using the automatic speech recognition corpora, and finally describes the technical aspects of the data and the distribution channel.

Notes

Which, in our case, does not in fact mean “high quality”—see the following sections.
https://www.clarin.eu/.
Although ATC communication can be conducted in the native language in regional air traffic, the use of English is naturally indispensable in international ATC.
https://catalog.ldc.upenn.edu/LDC94S14A.
http://catalog.elra.info/en-us/repository/browse/ELRA-S0293/.
See for example http://aviationknowledge.wikidot.com/aviation:nato-phonetic-alphabet.
Note that the assumption that we know the origin of the data is perfectly reasonable—in our “artificial pseudopilot” scenario, we also only expect the controller’s speech.
http://en.wikipedia.org/wiki/Arpabet.
http://itblp.zcu.cz/.

Funding

Funding was provided by Grantová Agentura České Republiky (Grant No. GBP103/12/G084).

Author information

Authors and Affiliations

Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Luboš Šmídl, Jan Švec, Daniel Tihelka, Jindřich Matoušek, Jan Romportl & Pavel Ircing

Corresponding author

Correspondence to Pavel Ircing.

About this article

Cite this article

Šmídl, L., Švec, J., Tihelka, D. et al. Air traffic control communication (ATCC) speech corpora and their use for ASR and TTS development. Lang Resources & Evaluation 53, 449–464 (2019). https://doi.org/10.1007/s10579-019-09449-5

Published: 19 February 2019
Issue Date: 15 September 2019
DOI: https://doi.org/10.1007/s10579-019-09449-5
