
Is the Reign of Interactive Search Eternal? Findings from the Video Browser Showdown 2020

Published: 22 July 2021

Abstract

Comprehensive and fair performance evaluation of information retrieval systems is an essential task in the current information age. Whereas Cranfield-style evaluations with benchmark datasets support the development of retrieval models, significant evaluation effort is also required for user-oriented systems that try to boost performance through an interactive search approach. This article presents findings from the 9th Video Browser Showdown, a competition that focuses on a fair comparison of interactive search systems designed for challenging known-item search tasks over a large video collection. In previous installments of the competition, the interactive nature of the participating systems was a key factor in satisfying known-item search needs, and the findings presented here further support this hypothesis. Even though the top-performing systems integrate recent deep learning models into their retrieval process, interactive searching remains a necessary component of successful strategies for known-item search tasks. Alongside a description of the competition settings, evaluated tasks, participating teams, and overall results, this article presents a detailed analysis of the query logs collected by the three top-performing systems, SOMHunter, VIRET, and vitrivr. The analysis provides quantitative insight into the observed performance of the systems and constitutes a new baseline methodology for future events. The results reveal that the top two systems mostly relied on temporal queries before a correct frame was identified. An interaction log analysis complements the result log findings and points to the importance of result set and video browsing approaches. Finally, various outlooks are discussed to improve the Video Browser Showdown challenge in the future.



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
  August 2021, 443 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3476118

      Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 July 2021
      • Accepted: 1 December 2020
      • Revised: 1 October 2020
      • Received: 1 April 2020
      Published in TOMM Volume 17, Issue 3

      Qualifiers

      • research-article
      • Refereed
