Published by De Gruyter Mouton, August 17, 2019

Automatic transcription of the Polish newsreel

  • Danijel Koržinek, Krzysztof Wołk, Łukasz Brocki and Krzysztof Marasek

Abstract

This paper describes an automatic transcription system for the Polish Newsreel, a collection of mid-to-late 20th-century news segments presented in audio and video form. The segments are characterized by archaic language and poor audio quality, which make them a demanding problem for speech recognition systems. Acoustic and language models had to be retrained using data from in-domain corpora. During the adaptation of the models, experiments were carried out to select optimal adaptation parameters. The experiments showed that adapting the speech recognition system to a narrow and clearly defined domain significantly increases its efficiency. The final word error rate obtained for this domain was 10.97%.
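Word error rate is conventionally defined as the word-level edit distance between a reference transcript and the recognizer output (substitutions, deletions and insertions), divided by the number of words in the reference. The sketch below illustrates that general definition only. It assumes whitespace tokenization and no text normalization; the function name and the example sentences are hypothetical and are not taken from the paper, whose scoring tool is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, a value computed this way can exceed 1.0 when the hypothesis is much longer than the reference.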


Danijel Koržinek, Polish-Japanese Academy of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland

Published Online: 2019-08-17
Published in Print: 2019-06-26

© 2019 Faculty of English, Adam Mickiewicz University, Poznań, Poland

Downloaded on 19.4.2024 from https://www.degruyter.com/document/doi/10.1515/psicl-2019-0008/html