
Is the Reign of Interactive Search Eternal? Findings from the Video Browser Showdown 2020

Published: 22 July 2021

Abstract

Comprehensive and fair performance evaluation of information retrieval systems is an essential task in the current information age. Whereas Cranfield-style evaluations with benchmark datasets support the development of retrieval models, significant evaluation effort is also required for user-oriented systems that try to boost performance through an interactive search approach. This article presents findings from the 9th Video Browser Showdown, a competition that focuses on a fair comparison of interactive search systems designed for challenging known-item search tasks over a large video collection. In previous installments of the competition, the interactive nature of the participating systems was a key factor in satisfying known-item search needs, and the findings presented here further support this hypothesis. Even though the top-performing systems integrate recent deep learning models into their retrieval process, interactive searching remains a necessary component of successful strategies for known-item search tasks. Alongside a description of the competition settings, evaluated tasks, participating teams, and overall results, this article presents a detailed analysis of the query logs collected by the three top-performing systems, SOMHunter, VIRET, and vitrivr. The analysis provides quantitative insight into the observed performance of the systems and constitutes a new baseline methodology for future events. The results reveal that the top two systems mostly relied on temporal queries before a correct frame was identified. An interaction log analysis complements the result log findings and points to the importance of result set and video browsing approaches. Finally, various outlooks are discussed to improve the Video Browser Showdown challenge in the future.



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
  August 2021, 443 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3476118

      Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 July 2021
      • Accepted: 1 December 2020
      • Revised: 1 October 2020
      • Received: 1 April 2020
      Published in TOMM Volume 17, Issue 3

      Qualifiers

      • research-article
      • Refereed
