Abstract
The availability of videos has grown rapidly in recent years. Finding and browsing relevant information extracted automatically from videos is not an easy task, but it has become an indispensable capability given the immense number of digital products available. In this paper, we present a system that automatically extracts information from videos. The solution uses a re-trained OpenNLP model to locate all the places and famous people included in a specific video, and it then retrieves information about these named entities from the Google Knowledge Graph. We also present the Automatic Georeferencing Video (AGV) system developed by RAI Teche for the European Project “La Città Educante” (The Educating City: teaching and learning processes in cross-media ecosystem). RAI (Radiotelevisione italiana) is the national public broadcasting company of Italy, owned by the Ministry of Economy and Finance. Our system contributes to The Educating City project by providing the technological environment for creating statistical models for automatic named entity recognition (NER), and it has been implemented in the field of education, initially for Italian. The system has been applied to the learning challenges facing the world of educational media and has demonstrated how beneficial combining topical news content with scientific content can be in education.
Notes
http://videolectures.net/. Last seen July 30, 2021.
http://www.evalita.it/. Last seen July 30, 2021.
http://www.teche.rai.it/. Last seen July 30, 2021.
https://www.google.com/maps. Last seen July 30, 2021.
https://www.bing.com/maps. Last seen July 30, 2021.
https://ffmpeg.org/. Last seen July 30, 2021.
https://opennlp.apache.org/. Last seen July 30, 2021.
https://developers.google.com/knowledge-graph. Last seen July 30, 2021.
https://www.postgresql.org/. Last seen July 30, 2021.
JSON: JavaScript Object Notation.
https://www.rainews.it/tgr/rubriche/leonardo/. Last seen July 30, 2021.
The Ministry of Education, University and Research (in Italian: Ministero dell’Istruzione, dell’Università e della Ricerca or MIUR).
Cite this article
Fallucchi, F., Di Stabile, R., Purificato, E. et al. Enriching videos with automatic place recognition in google maps. Multimed Tools Appl 81, 23105–23121 (2022). https://doi.org/10.1007/s11042-021-11253-9
DOI: https://doi.org/10.1007/s11042-021-11253-9