Abstract
Manual processing of the large volume of video data captured through closed-circuit television is challenging for several reasons. First, manual analysis is highly time-consuming. Moreover, because surveillance videos are recorded under dynamic conditions, such as camera motion, varying illumination, or occlusion, conventional supervised learning may not always work. Thus, computer vision-based automatic surveillance scene analysis is carried out in unsupervised ways. Topic modelling is one of the emerging fields used in unsupervised information processing. It is used in text analysis, computer vision applications, and other areas involving spatio-temporal data. In this article, we discuss the scope, variations, and applications of topic modelling, focusing particularly on surveillance video analysis. We provide a methodological survey of existing topic models, their features, underlying representations, characterization, and applications from the perspective of visual surveillance. Important research papers related to topic modelling in visual surveillance are summarized and critically analyzed.
- Wei Wang, Payam Mamaani Barnaghi, and Andrzej Bargiela. 2010. Probabilistic topic models for learning terminological ontologies. IEEE Trans. Knowl. Data Eng. 22, 7 (2010), 1028–1040.Google ScholarDigital Library
- Xiaogang Wang, Keng Teck Ma, Gee-Wah Ng, and W. Eric L. Grimson. 2011. Trajectory analysis and semantic region modeling using nonparametric hierarchical bayesian models. Int. J. Comput. Vision 95, 3 (2011), 287–312.Google ScholarDigital Library
- Xiaogang Wang, Xiaoxu Ma, and W. Eric L. Grimson. 2009. Unsupervised activity perception in crowded and complicated scenes using hierarchical bayesian models. IEEE Trans. Pattern Anal. Mach. Intell. 31, 3 (2009), 539–555.Google ScholarDigital Library
- Xuerui Wang and Andrew McCallum. 2006. Topics over time: A non-Markov continuous-time model of topical trends. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining. ACM, 424–433.Google ScholarDigital Library
- Yinying Wang, Alex J. Bowers, and David J. Fikis. 2017. Automated text data mining analysis of five decades of educational leadership research literature: Probabilistic topic modeling of EAQ articles from 1965 to 2014. Edu. Admin. Quart. 53, 2 (2017), 289–323.Google ScholarCross Ref
- Zheng Wang, Jie Zhou, Jing Ma, Jingjing Li, Jiangbo Ai, and Yang Yang. 2020. Discovering attractive segments in the user-generated video streams. Info. Process. Manage. 57, 1 (2020), 102130.Google ScholarCross Ref
- Daniel Weinland, Remi Ronfard, and Edmond Boyer. 2006. Free viewpoint action recognition using motion history volumes. Comput. Vision Image Understand. 104, 2–3 (2006), 249–257.Google ScholarDigital Library
- F. P. Wheeler. 1998. Bayesian forecasting and dynamic models (2nd edn). J. Operat. Res. Soc. 49, 2 (1998), 179–180.Google ScholarCross Ref
- Sinead Williamson, Chong Wang, Katherine Heller, and David Blei. 2010. The IBP compound Dirichlet process and its application to focused topic modeling. In Proceedings of the ICML. 1151–1158. https://icml.cc/Conferences/2010/papers/397.pdf.Google Scholar
- Chenxia Wu, Jiemi Zhang, Ozan Sener, Bart Selman, Silvio Savarese, and Ashutosh Saxena. 2018. Watch-n-patch: Unsupervised learning of actions and relations. IEEE Trans. Pattern Anal. Mach. Intell. 40, 2 (2018), 467–481.Google ScholarDigital Library
- Jun Xu, Tao Mei, Ting Yao, and Yong Rui. 2016. Msr-vtt: A large video description dataset for bridging video and language. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5288–5296.Google ScholarCross Ref
- Xun Xu, Timothy M. Hospedales, and Shaogang Gong. 2017. Discovery of shared semantic spaces for multiscene video query and summarization. IEEE Trans. Circ. Syst. Video Technol. 27, 6 (2017), 1353–1367.Google ScholarDigital Library
- Junyu Xuan, Jie Lu, Guangquan Zhang, and Xiangfeng Luo. 2015. Topic model for graph mining. IEEE Trans. Cybernet. 45, 12 (2015), 2792–2803.Google ScholarCross Ref
- Jianfei Xue and Koji Eguchi. 2018. Sequential Bayesian nonparametric multimodal topic models for video data analysis. IEICE Trans. Info. Syst. 101, 4 (2018), 1079–1087.Google ScholarCross Ref
- Jianfei Xue and Koji Eguchi. 2019. Supervised nonparametric multimodal topic models for multi-class video classification. ITE Trans. Media Technol. Appl. 7, 2 (2019), 80–91.Google ScholarCross Ref
- Michael Ying Yang, Wentong Liao, Yanpeng Cao, and Bodo Rosenhahn. 2018. Video event recognition and anomaly detection by combining gaussian process and hierarchical dirichlet process models. Photogram. Eng. Remote Sens. 84, 4 (2018), 203–214.Google ScholarCross Ref
- Shuang Yang, Chunfeng Yuan, Weiming Hu, and Xinmiao Ding. 2014. A hierarchical model based on latent dirichlet allocation for action recognition. In Proceedings of the International Conference on Pattern Recognition. IEEE, 2613–2618.Google ScholarDigital Library
- Shuang Yang, Chunfeng Yuan, Baoxin Wu, Weiming Hu, and Fangshi Wang. 2015. Multi-feature max-margin hierarchical Bayesian model for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1610–1618.Google ScholarCross Ref
- Yang Yang, Jingen Liu, and Mubarak Shah. 2009. Video scene understanding using multi-scale analysis. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 1669–1676.Google ScholarCross Ref
- Litao Yu, Zi Huang, Jiewei Cao, and Heng Tao Shen. 2016. Scalable video event retrieval by visual state binary embedding. IEEE Trans. Multimedia 18, 8 (2016), 1590–1603.Google ScholarDigital Library
- Niange Yu, Xiaolin Hu, Binheng Song, Jian Yang, and Jianwei Zhang. 2018. Topic-oriented image captioning based on order-embedding. IEEE Trans. Image Process. 28, 6 (2018), 2743–2754.Google ScholarCross Ref
- Yin Yuan, Haomian Zheng, Zhu Li, and David Zhang. 2010. Video action recognition with spatio-temporal graph embedding and spline modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2422–2425.Google ScholarCross Ref
- Yun Zhai and Mubarak Shah. 2006. Video scene segmentation using Markov chain Monte Carlo. IEEE Trans. Multimedia 8, 4 (2006), 686–697.Google ScholarDigital Library
- Jianguo Zhang and Shaogang Gong. 2010. Action categorization by structural probabilistic latent semantic analysis. Comput. Vision Image Understand. 114, 8 (2010), 857–864.Google ScholarDigital Library
- Bin Zhao, Wei Xu, Genlin Ji, and Chao Tan. 2015. Discovering topic evolution topology in a microblog corpus. In Proceedings of the International Conference on Advanced Cloud and Big Data. IEEE, 7–14.Google ScholarCross Ref
- Fang Zhao, Yongzhen Huang, Liang Wang, and Tieniu Tan. 2013. Relevance topic model for unstructured social group activity recognition. In Advances in Neural Information Processing Systems. MIT Press, 2580–2588.Google Scholar
- Liang Zhao, Lin Shang, Yang Gao, Yubin Yang, and Xiuyi Jia. 2013. Video behavior analysis using topic models and rough sets [applications notes]. IEEE Comput. Intell. Mag. 8, 1 (2013), 56–67.Google ScholarDigital Library
- Zhicheng Zhao, Yifan Song, and Fei Su. 2016. Specific video identification via joint learning of latent semantic concept, scene and temporal structure. Neurocomputing 208 (2016), 378–386.Google ScholarDigital Library
- Yin Zheng, Yu-Jin Zhang, and Hugo Larochelle. 2014. Topic modeling of multimodal data: An autoregressive approach. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1370–1377.Google ScholarDigital Library
- Bolei Zhou, Xiaogang Wang, and Xiaoou Tang. 2011. Random field topic model for semantic region analysis in crowded scenes from tracklets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 3441–3448.Google ScholarDigital Library
- Bolei Zhou, Xiaogang Wang, and Xiaoou Tang. 2012. Understanding collective crowd behaviors: Learning a mixture model of dynamic pedestrian-agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2871–2878.Google Scholar
- Houkui Zhou, Huimin Yu, Roland Hu, Guangqun Zhang, Junguo Hu, and Tao He. 2019. Analyzing multiple types of behaviors from traffic videos via nonparametric topic model. J. Visual Commun. Image Represent. 64 (2019), 102649.Google ScholarCross Ref
- Qiqi Zhu, Yanfei Zhong, Liangpei Zhang, and Deren Li. 2017. Scene classification based on the fully sparse semantic topic model. IEEE Trans. Geosci. Remote Sens. 55, 10 (2017), 5525–5538.Google ScholarCross Ref
- Xudong Zhu and Hui Li. 2012. Unsupervised human action categorization using latent Dirichlet Markov clustering. In Proceedings of the International Conference on Intelligent Networking and Collaborative Systems. IEEE, 347–352.Google ScholarDigital Library
- Xudong Zhu and Zhijing Liu. 2011. Human behavior clustering for anomaly detection. Front. Comput. Sci. China 5, 3 (2011), 279.Google ScholarDigital Library
- Jialing Zou, Qixiang Ye, Yanting Cui, David Doermann, and Jianbin Jiao. 2014. A belief-based correlated topic model for trajectory clustering in crowded video scenes. In Proceedings of the International Conference on Pattern Recognition. IEEE, 2543–2548.Google ScholarDigital Library
- Jialing Zou, Qixiang Ye, Yanting Cui, Fang Wan, Kun Fu, and Jianbin Jiao. 2016. Collective motion pattern inference via locally consistent latent dirichlet allocation. Neurocomputing 184 (2016), 221–231.Google ScholarDigital Library
Index Terms
- Topic-based Video Analysis: A Survey