Topic Modeling Using Latent Dirichlet allocation: A Survey

Authors:
Uttam Chauhan

Vishwakarma Government Engineering College, Chandkheda, Ahmedabad, Gujarat - India

Apurva Shah

Maharaja Sayaji Rao University of Baroda, Vadodara, Gujarat - India


ACM Computing Surveys, Volume 54, Issue 7, Article No. 145, pp. 1–35. https://doi.org/10.1145/3462478

Published: 17 September 2021


Abstract

A mammoth text corpus cannot be dealt with unless it is summarized into a relatively small subset, and a computational tool is sorely needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling and review its extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from a multilingual perspective. We also explore research on topic modeling in a distributed environment and on topic visualization approaches, and we briefly cover implementation and evaluation techniques for topic models. Comparison matrices are presented over the experimental results of the various categories of topic modeling. Finally, we discuss diverse technical challenges and future directions.

Supplemental Material

chauhan.zip (93.2 KB): supplemental movie, appendix, image, and software files for Topic Modeling Using Latent Dirichlet allocation: A Survey

References

Nikolaos Aletras and Mark Stevenson. 2013. Evaluating topic coherence using distributional semantics. In Proceedings of the 10th International Conference on Computational Semantics. 13–22.Google Scholar
Rubayyi Alghamdi and Khalid Alfalqi. 2015. A survey of topic modeling in text mining. Int. J. Adv. Comput. Sci. Appl. 6, 1 (2015).Google Scholar
Loulwah AlSumait, Daniel Barbará, and Carlotta Domeniconi. 2008. On-line LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking. In Proceedings of the 8th IEEE International Conference on Data Mining. IEEE, 3–12.Google ScholarDigital Library
Arthur Asuncion, Max Welling, Padhraic Smyth, and Yee Whye Teh. 2009. On smoothing and inference for topic models. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. 27–34.Google Scholar
Hazeline U. Asuncion, Arthur U. Asuncion, and Richard N. Taylor. 2010. Software traceability with topic modeling. In Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering, Vol. 1. IEEE, 95–104.Google Scholar
D. K. JinYeong Bak and A. Oh. 2012. Distributed online learning for latent Dirichlet allocation. In Proceedings of the NIPS Workshop on Big Learning. 1–8.Google Scholar
Parantapa Bhattacharya, Muhammad Bilal Zafar, Niloy Ganguly, Saptarshi Ghosh, and Krishna P. Gummadi. 2014. Inferring user interests in the Twitter social network. In Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 357–360.Google Scholar
David M. Blei. 2012. Probabilistic topic models. Commun. ACM 55, 4 (2012), 77–84. DOI:https://doi.org/doi:10.1145/2133806.2133826Google ScholarDigital Library
David M. Blei, Thomas L. Griffiths, and Michael I. Jordan. 2010. The nested chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57, 2 (2010), 7. DOI:https://doi.org/10.1145/1667053.1667056Google ScholarDigital Library
David M. Blei and John D. Lafferty. 2006. Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 113–120. DOI:https://doi.org/10.1145/1143844.1143859Google ScholarDigital Library
David M. Blei and John D. Lafferty. 2007. A correlated topic model of science. Ann. Appl. Statist. (2007), 17–35.Google Scholar
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, Jan. (2003), 993–1022. DOI:https://doi.org/10.1162/jmlr.2003.3.4-5.993Google Scholar
Jordan Boyd-Graber, David Mimno, and David Newman. 2014. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. Vol. 225255. CRC Press, Boca Raton, FL.Google Scholar
Samuel Brody and Mirella Lapata. 2009. Bayesian word sense induction. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 103–111.Google ScholarDigital Library
Stefan Bunk and Ralf Krestel. 2018. WELDA: Enhancing topic models by incorporating local word context. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries. 293–302.Google ScholarDigital Library
George Casella and Edward I. George. 1992. Explaining the Gibbs sampler. Amer. Statist. 46, 3 (1992), 167–174.Google Scholar
Jonathan Chang. 2012. Collapsed Gibbs sampling methods for topic models. R package: lda (version 1.3.2). http://cran.r-project.org/web/packages/lda/index.html.Google Scholar
Jonathan Chang and David Blei. 2009. Relational topic models for document networks. In Artificial Intelligence and Statistics. PMLR, 81–88.Google Scholar
Ying-Lang Chang and Jen-Tzung Chien. 2009. Latent Dirichlet learning for document summarization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 1689–1692. DOI:https://doi.org/10.1109/ICASSP.2009.4959927Google ScholarDigital Library
Tse-Hsun Chen, Weiyi Shang, Meiyappan Nagappan, Ahmed E. Hassan, and Stephen W. Thomas. 2017. Topic-based software defect explanation. J. Syst. Softw. 129 (2017), 79–106. DOI:https://doi.org/10.1016/j.jss.2016.05.015Google ScholarDigital Library
Xueqi Cheng, Xiaohui Yan, Yanyan Lan, and Jiafeng Guo. 2014. BTM: Topic modeling over short texts. IEEE Trans. Knowl. Data Eng. 26, 12 (2014), 2928–2941. DOI:https://doi.org/10.1109/TKDE.2014.2313872Google ScholarCross Ref
Jason Chuang, Christopher D. Manning, and Jeffrey Heer. 2012. Termite: Visualization techniques for assessing textual topic models. In Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 74–77. DOI:https://doi.org/10.1145/2254556.2254572Google ScholarDigital Library
Raphael Cohen, Iddo Aviram, Michael Elhadad, and Noémie Elhadad. 2014. Redundancy-aware topic modeling for patient record notes. PloS One 9, 2 (2014), e87555. DOI:https://doi.org/10.1371/journal.pone.0087555Google ScholarCross Ref
Mário Cordeiro. 2012. Twitter event detection: Combining wavelet analysis and topic inference summarization. In Doctoral Symposium on Informatics Engineering. 11–16.Google Scholar
Christopher S. Corley, Kostadin Damevski, and Nicholas A. Kraft. 2020. Changeset-based topic modeling of software repositories. IEEE Trans. Softw. Eng. 46, 10 (2020), 1068–1080. DOI:10.1109/TSE.2018.2874960Google ScholarCross Ref
Rajarshi Das, Manzil Zaheer, and Chris Dyer. 2015. Gaussian LDA for topic models with word embeddings. In Proceedings of the Meeting of the Association for Computational Linguistics. 795–804.Google ScholarCross Ref
Ali Daud, Juanzi Li, Lizhu Zhou, and Faqir Muhammad. 2010. Knowledge discovery through directed probabilistic topic models: A survey. Front. Comput. Sci. China 4, 2 (2010), 280–301.Google ScholarCross Ref
Wim De Smet and Marie-Francine Moens. 2009. Cross-language linking of news stories on the web using interlingual topic modelling. In Proceedings of the 2nd ACM Workshop on Social Web Search and Mining. ACM, 57–64.Google ScholarDigital Library
Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified data processing on large clusters. Commun. ACM 51, 1 (2008), 107–113.Google ScholarDigital Library
Stefan Debortoli, Oliver Müller, Iris Junglas, and Jan vom Brocke. 2016. Text mining for information systems researchers: An annotated topic modeling tutorial. Commun. Assoc. Inf. Syst. 39, 1 (2016), 7.Google ScholarCross Ref
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. J. Amer. Soc. Inf. Sci. 41, 6 (1990), 391. DOI:https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9Google ScholarCross Ref
Mohamed Dermouche, Julien Velcin, Leila Khouas, and Sabine Loudcher. 2014. A joint model for topic-sentiment evolution over time. In Proceedings of the IEEE International Conference on Data Mining (ICDM’14). IEEE, 773–778. DOI:https://doi.org/10.1109/ICDM.2014.82Google ScholarDigital Library
Adji B. Dieng, Francisco J. R. Ruiz, and David M. Blei. 2019. The dynamic embedded topic model. arXiv preprint arXiv:1907.05545 (2019).Google Scholar
Adji B. Dieng, Francisco J. R. Ruiz, and David M. Blei. 2020. Topic modeling in embedding spaces. Trans. Assoc. Comput. Ling. 8 (2020), 439–453.Google ScholarCross Ref
Tarek Elguebaly and Nizar Bouguila. 2013. Simultaneous Bayesian clustering and feature selection using RJMCMC-based learning of finite generalized Dirichlet mixture models. Sig. Process. 93, 6 (2013), 1531–1546.Google ScholarDigital Library
Katayoun Farrahi and Daniel Gatica-Perez. 2011. Discovering routines from large-scale human locations using probabilistic topic models. ACM Trans. Intell. Syst. Technol. 2, 1 (2011), 3.Google ScholarDigital Library
Xianghua Fu, Kun Yang, Joshua Zhexue Huang, and Laizhong Cui. 2015. Dynamic non-parametric joint sentiment topic mixture model. Knowl.-based Syst. 82 (2015), 102–114.Google Scholar
Debasis Ganguly, Manisha Ganguly, Johannes Leveling, and Gareth J. F. Jones. 2013. TopicVis: A GUI for topic-based feedback and navigation. DOI:https://doi.org/10.1145/2484028.2484202Google ScholarDigital Library
Debasis Ganguly, Johannes Leveling, and Gareth J. F. Jones. 2012. Cross-lingual topical relevance models. DOI:https://doi.org/10.1145/564405.564408Google Scholar
Brynjar Gretarsson, John O’Donovan, Svetlin Bostandjiev, Tobias Höllerer, Arthur Asuncion, David Newman, and Padhraic Smyth. 2012. Topicnets: Visual analysis of large text corpora with topic modeling. ACM Trans. Intell. Syst. Technol. 3, 2 (2012), 23. DOI:https://doi.org/10.1126/science.1178206Google ScholarCross Ref
Tom Griffiths. 2002. Gibbs sampling in the generative model of latent Dirichlet allocation. DOI:https://doi.org/10.1145/1401890.1401960Google ScholarDigital Library
Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proc. Nat. Acad. Sci. 101, suppl 1 (2004), 5228–5235. DOI:https://doi.org/10.1073/pnas.0307752101Google ScholarCross Ref
Loni Hagen. 2018. Content analysis of e-petitions with topic modeling: How to train and evaluate LDA models?Inf. Proc. Manag. 54, 6 (2018), 1292–1307.Google ScholarCross Ref
Aria Haghighi and Lucy Vanderwende. 2009. Exploring content models for multi-document summarization. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 362–370.Google ScholarCross Ref
Xingwei He, Hua Xu, Jia Li, Liu He, and Linlin Yu. 2017. FastBTM: Reducing the sampling time for biterm topic model. Knowl.-Based Syst 132 (2017), 11–20.Google ScholarCross Ref
Gregor Heinrich. 2008. Parameter Estimation for Text Analysis. Technical Report. University of Leipzig. 1–32.Google Scholar
Go Eun Heo, Keun Young Kang, Min Song, and Jeong-Hoon Lee. 2017. Analyzing the field of bioinformatics with the multi-faceted topic modeling technique. BMC Bioinf 18, 7 (2017), 251.Google ScholarCross Ref
Matthew Hoffman, Francis R. Bach, and David M. Blei. 2010. Online learning for latent Dirichlet allocation. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 856–864. DOI:https://doi.org/10.1.1.187.1883Google Scholar
Thomas Hofmann. 1999. Probabilistic latent semantic analysis. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 289–296. DOI:https://doi.org/10.1162/jmlr.2003.3.4-5.993Google Scholar
Thomas Hofmann. 2001. Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42, 1 (2001), 177–196.Google ScholarCross Ref
Liangjie Hong, Ovidiu Dan, and Brian D. Davison. 2011. Predicting popular messages in Twitter. In Proceedings of the 20th International Conference Companion on World Wide Web. ACM, 57–58.Google Scholar
Pengfei Hu, Wenju Liu, Wei Jiang, and Zhanlei Yang. 2014. Latent topic model for audio retrieval. Pattern Recog. 47, 3 (2014), 1138–1143. DOI:https://doi.org/10.1016/j.patcog.2013.06.010Google ScholarDigital Library
Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. 2014. Interactive topic modeling. Mach. Learn. 95, 3 (2014), 423–469.Google ScholarDigital Library
Dongping Huang, Shuyu Hu, Yi Cai, and Huaqing Min. 2014. Discovering event evolution graphs based on news articles relationships. In Proceedings of the IEEE 11th International Conference on e-Business Engineering (ICEBE’14). IEEE, 246–251.Google ScholarDigital Library
Hamed Jelodar, Yongli Wang, Chi Yuan, Xia Feng, Xiahui Jiang, Yanchao Li, and Liang Zhao. 2019. Latent Dirichlet allocation (LDA) and topic modeling: Models, applications, a survey. Multimedia Tools. Applic. 78, 11 (2019), 15169–15211.Google ScholarDigital Library
Do-Heon Jeong and Min Song. 2014. Time gap analysis by the topic model-based temporal technique. J. Informet. 8, 3 (2014), 776–790. DOI:https://doi.org/10.1016/j.joi.2014.07.005Google ScholarCross Ref
Di Jiang, Yongxin Tong, and Yuanfeng Song. 2016. Cross-lingual topic discovery from multilingual search engine query log. ACM Trans. Inf. Syst. 35, 2 (2016), 9.Google ScholarDigital Library
Efsun Sarioglu Kayi, Kabir Yadav, James M. Chamberlain, and Hyeong-Ah Choi. 2017. Topic modeling for classification of clinical reports. arXiv preprint arXiv:1706.06177 (2017).Google Scholar
Muhammad Taimoor Khan, Mehr Durrani, Shehzad Khalid, and Furqan Aziz. 2016. Online knowledge-based model for big data topic extraction. Comput. Intell. Neurosci. DOI:https://doi.org/10.1155/2016/6081804Google Scholar
Milad Kharratzadeh, Benjamin Renard, and Mark J. Coates. 2015. Bayesian topic model approaches to online and time-dependent clustering. Dig. Sig. Process. 47 (2015), 25–35. DOI:https://doi.org/10.1016/j.dsp.2015.03.010Google ScholarDigital Library
Dongwoo Kim and Alice Oh. 2011. Accounting for data dependencies within a hierarchical Dirichlet process mixture model. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM, 873–878.Google ScholarDigital Library
Dongwoo Kim and Alice Oh. 2011. Topic chains for understanding a news corpus. Comput. Ling. Intell. Text Process.. DOI:https://doi.org/10.1007/978-3-642-19437-5_13Google Scholar
Dongwoo Kim and Alice Oh. 2014. Hierarchical Dirichlet scaling process. In Proceedings of the International Conference on Machine Learning. 973–981.Google Scholar
Joon Hee Kim, Dongwoo Kim, Suin Kim, and Alice Oh. 2012. Modeling topic hierarchies with the recursive Chinese restaurant process. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 783–792. DOI:https://doi.org/10.1145/2396761.2396861Google ScholarDigital Library
Younghoon Kim and Kyuseok Shim. 2014. TWILITE: A recommendation system for Twitter using a probabilistic model based on latent Dirichlet allocation. Inf. Syst. 42 (2014), 59–77. DOI:https://doi.org/10.1016/j.is.2013.11.003Google ScholarDigital Library
Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press.Google ScholarDigital Library
Julian F. P. Kooij, Gwenn Englebienne, and Dariu M. Gavrila. 2015. Identifying multiple objects from their appearance in inaccurate detections. Comput. Vis. Image Underst. 136 (2015), 103–116.Google ScholarDigital Library
Guy Lansley and Paul A. Longley. 2016. The geography of Twitter topics in London. Comput. Environ. Urb. Syst. 58 (2016), 85–96. DOI:https://doi.org/10.1016/j.compenvurbsys.2016.04.002Google ScholarCross Ref
Jey Han Lau and Timothy Baldwin. 2016. The sensitivity of topic coherence evaluation to topic cardinality. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.483–487.Google ScholarCross Ref
Jey Han Lau, David Newman, and Timothy Baldwin. 2014. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality.Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics.530–539.Google Scholar
Jure Leskovec, Lars Backstrom, and Jon Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 497–506.Google ScholarDigital Library
Chenliang Li, Yu Duan, Haoran Wang, Zhiqian Zhang, Aixin Sun, and Zongyang Ma. 2017. Enhancing topic modeling for short texts with auxiliary word embeddings. ACM Trans. Inf. Syst. 36, 2 (2017), 11.Google ScholarDigital Library
Chenliang Li, Haoran Wang, Zhiqian Zhang, Aixin Sun, and Zongyang Ma. 2016. Topic modeling for short texts with auxiliary word embeddings. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 165–174.Google ScholarDigital Library
Weifeng Li, Junming Yin, and Hsinchsun Chen. 2017. Supervised topic modeling using hierarchical Dirichlet process-based inverse regression: Experiments on e-commerce applications. IEEE Trans. Knowl. Data Eng. 30, 6 (2017), 1192–1205.Google ScholarCross Ref
Tianyi Lin, Wentao Tian, Qiaozhu Mei, and Hong Cheng. 2014. The dual-sparse topic model: Mining focused topics and focused terms in short text. In Proceedings of the 23rd International Conference on World Wide Web. 539–550.Google ScholarDigital Library
Erik Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, and Pierre Baldi. 2007. Mining concepts from code with probabilistic topic models. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering. ACM, 461–464.Google ScholarDigital Library
Jun S. Liu. 1994. The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. J. Amer. Statist. Assoc. 89, 427 (1994), 958–966.Google ScholarCross Ref
Shuhua Liu and Patrick Jansson. 2017. Topic Modelling Analysis of Instagram Data for the Greater Helsinki Region.Google Scholar
Xiaodong Liu, Kevin Duh, and Yuji Matsumoto. 2015. Multilingual topic models for bilingual dictionary extraction. ACM Trans. Asian Low-resour. Lang. Inf. Process. 14, 3 (2015), 11.Google ScholarDigital Library
Xiao Liu, Mingli Song, Qi Zhao, Dacheng Tao, Chun Chen, and Jiajun Bu. 2012. Attribute-restricted latent topic model for person re-identification. Pattern Recog. 45, 12 (2012), 4204–4213.Google ScholarDigital Library
Zhiyuan Liu, Yuzhou Zhang, Edward Y. Chang, and Maosong Sun. 2011. PLDA+: Parallel latent Dirichlet allocation with data placement and pipeline processing. ACM Trans. Intell. Syst. Technol. 2, 3 (2011), 26.Google ScholarDigital Library
Kun Lu and Dietmar Wolfram. 2012. Measuring author research relatedness: A comparison of word-based, topic-based, and author cocitation approaches. J. Amer. Soc. Inf. Sci. Technol. 63, 10 (2012), 1973–1986.Google ScholarDigital Library
Zhiwu Lu and Yuxin Peng. 2013. Latent semantic learning with structured sparse representation for human action recognition. Pattern Recog. 46, 7 (2013), 1799–1809. DOI:https://doi.org/10.1016/j.patcog.2012.09.027Google ScholarDigital Library
Stacy K. Lukins, Nicholas A. Kraft, and Letha H. Etzkorn. 2010. Bug localization using latent Dirichlet allocation. Inf. Softw. Technol. 52, 9 (2010), 972–990.Google ScholarDigital Library
Minnan Luo, Feiping Nie, Xiaojun Chang, Yi Yang, Alexander Hauptmann, and Qinghua Zheng. 2017. Probabilistic non-negative matrix factorization and its robust extensions for topic modeling. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.Google ScholarCross Ref
Baizhang Ma, Dongsong Zhang, Zhijun Yan, and Taeha Kim. 2013. An LDA and synonym lexicon based approach to product feature extraction from online consumer product reviews. J. Electron. Commer. Res. 14, 4 (2013), 304. DOI:https://doi.org/10.1016/j.im.2015.02.002Google ScholarDigital Library
Hui-Fang Ma. 2011. Hot topic extraction using time window. In Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC’11). IEEE, 56–60.Google ScholarCross Ref
Masoud Makrehchi. 2011. Social link recommendation by learning hidden topics. In Proceedings of the 5th ACM Conference on Recommender Systems. ACM, 189–196.Google ScholarDigital Library
James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela H. Byers. 2011. Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.Google Scholar
Jon D. Mcauliffe and David M. Blei. 2008. Supervised topic models. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 121–128.Google Scholar
Andrew Kachites McCallum. 2002. MALLET: A Machine Learning for Language Toolkit. (2002). Retrieved from http://mallet.cs.umass.edu.Google Scholar
Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, and ChengXiang Zhai. 2007. Topic sentiment mixture: Modeling facets and opinions in weblogs. In Proceedings of the 16th International Conference on World Wide Web. ACM, 171–180.Google ScholarDigital Library
David Mimno and Andrew McCallum. 2007. Expertise modeling for matching papers with reviewers. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 500–509.Google ScholarDigital Library
David Mimno and Andrew McCallum. 2007. Organizing the OCA: Learning faceted subjects from a library of digital books. In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, 376–385.Google ScholarDigital Library
David Mimno, Hanna M. Wallach, Jason Naradowsky, David A. Smith, and Andrew McCallum. 2009. Polylingual topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 880–889. DOI:https://doi.org/10.3115/1699571.1699627Google ScholarCross Ref
Christopher E. Moody. 2016. Mixing Dirichlet topic models and word embeddings to make lda2vec. arXiv preprint arXiv:1605.02019 (2016).Google Scholar
Gordon E. Moon, Israt Nisa, Aravind Sukumaran-Rajam, Bortik Bandyopadhyay, Srinivasan Parthasarathy, and P. Sadayappan. 2018. Parallel latent Dirichlet allocation on GPUs. In Proceedings of the International Conference on Computational Science. Springer, 259–272.Google Scholar
N. K. Nagwani. 2015. Summarizing large text collection using topic modeling and clustering based on MapReduce framework. J. Big Data 2, 1 (2015), 6.Google ScholarCross Ref
Ramesh Nallapati, William Cohen, and John Lafferty. 2007. Parallelized variational EM for latent Dirichlet allocation: An experimental evaluation of speed and scalability. In Proceedings of the International Conference on Data Mining Workshops (ICDMW’07). IEEE, 349–354.Google ScholarDigital Library
David Newman, Arthur Asuncion, Padhraic Smyth, and Max Welling. 2009. Distributed algorithms for topic models. J. Mach. Learn. Res. 10, Aug. (2009), 1801–1828.Google Scholar
David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. 2010. Automatic evaluation of topic coherence. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 100–108.Google Scholar
David Newman, Padhraic Smyth, and Mark Steyvers. 2006. Scalable parallel topic models. J. Intell. Commun. Res. Devel. 5 (2006). DOI:https://doi.org/10.7551/mitpress/9486.003.0011Google Scholar
David Newman, Padhraic Smyth, Max Welling, and Arthur U. Asuncion. 2008. Distributed inference for latent Dirichlet allocation. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 1081–1088.Google Scholar
Zhenxing Niu, Gang Hua, Le Wang, and Xinbo Gao. 2017. Knowledge-based topic model for unsupervised object discovery and localization. IEEE Trans. Image Process. 27, 1 (2017), 50–63.Google ScholarCross Ref
Michael J. Paul and Mark Dredze. 2014. Discovering health topics in social media using topic models. PloS One 9, 8 (2014), e103408. DOI:https://doi.org/10.1371/journal.pone.0103408Google ScholarCross Ref
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12 (2011), 2825–2830.Google ScholarDigital Library
Nanyun Peng, Yiming Wang, and Mark Dredze. 2014. Learning polylingual topic models from code-switched social media documents. In Proceedings of the 52nd Meeting of the Association for Computational Linguistics. 674–679.Google ScholarCross Ref
James Petterson, Wray Buntine, Shravan M. Narayanamurthy, Tibério S. Caetano, and Alex J. Smola. 2010. Word features for latent Dirichlet allocation. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 1921–1929.Google Scholar
Ian Porteous, David Newman, Alexander Ihler, Arthur Asuncion, Padhraic Smyth, and Max Welling. 2008. Fast collapsed Gibbs sampling for latent Dirichlet allocation. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 569–577.Google ScholarDigital Library
Jipeng Qiang, Zhenyu Qian, Yun Li, Yunhao Yuan, and Xindong Wu. 2020. Short text topic modeling techniques, applications, and performance: A survey. IEEE Transactions on Knowledge and Data Engineering.Google ScholarCross Ref
Xiaojun Quan, Chunyu Kit, Yong Ge, and Sinno Jialin Pan. 2015. Short and sparse text topic modeling via self-aggregation. In Proceedings of the 24th International Joint Conference on Artificial Intelligence.Google ScholarDigital Library
Daniel Ramage, Susan Dumais, and Dan Liebling. 2010. Characterizing microblogs with topic models. In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media.Google ScholarCross Ref
Daniel Ramage, David Hall, Ramesh Nallapati, and Christopher D. Manning. 2009. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 248–256.Google ScholarDigital Library
Daniel Ramage, Christopher D. Manning, and Susan Dumais. 2011. Partially labeled topic models for interpretable text mining. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 457–465. DOI:https://doi.org/10.1145/2020408.2020481Google ScholarDigital Library
Radim Řehůřek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC Workshop on New Challenges for NLP Frameworks. ELRA, 45–50.Google Scholar
Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J. Mooney. 2010. Spherical topic models. In Proceedings of the 27th International Conference on Machine Learning (ICML’10). 903–910. DOI:https://doi.org/10.1007/s10955-009-9892-0Google Scholar
Yafeng Ren, Ruimin Wang, and Donghong Ji. 2016. A topic-enhanced word embedding for Twitter sentiment classification. Inf. Sci. 369 (2016), 188–198.Google ScholarDigital Library
Philip Resnik and Eric Hardisty. 2010. Gibbs sampling for the uninitiated. Maryland Univ College Park Inst for Advanced Computer Studies.Google Scholar
Kirk Roberts, Michael A. Roach, Joseph Johnson, Josh Guthrie, and Sanda M. Harabagiu. 2012. EmpaTweet: Annotating and detecting emotions on Twitter. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC’12). Citeseer, 3806–3813.Google Scholar
Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth. 2004. The author-topic model for authors and documents. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 487–494. DOI:https://doi.org/10.1016/j.nima.2010.11.062Google ScholarDigital Library
Karim Sayadi, Quang Vu Bui, and Marc Bui. 2016. Distributed implementation of the latent Dirichlet allocation on Spark. In Proceedings of the 7th Symposium on Information and Communication Technology. ACM, 92–98.Google ScholarDigital Library
Alexandra Schofield, Måns Magnusson, and David Mimno. 2017. Pulling out the stops: Rethinking stopword removal for topic models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. 432–436.Google ScholarCross Ref
Karthick Seshadri, S. Mercy Shalinie, and Chidambaram Kollengode. 2015. Design and evaluation of a parallel algorithm for inferring topic hierarchies. Inf. Proc. Manag. 51, 5 (2015), 662–676. DOI:https://doi.org/10.1016/j.ipm.2015.06.006Google ScholarDigital Library
Carson Sievert and Kenneth Shirley. 2014. LDAvis: A method for visualizing and interpreting topics. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces. 63–70.Google ScholarCross Ref
Bradley Skaggs and Lise Getoor. 2014. Topic modeling for Wikipedia link disambiguation. ACM Trans. Inf. Syst. 32, 3 (2014), 10.Google ScholarDigital Library
Alison Smith, Jason Chuang, Yuening Hu, Jordan Boyd-Graber, and Leah Findlater. 2014. Concurrent visualization of relationships between words and topics in topic models. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces. 79–82.
Alexander Smola and Shravan Narayanamurthy. 2010. An architecture for parallel topic models. Proc. VLDB Endow. 3, 1-2 (2010), 703–710.
Padhraic Smyth, Max Welling, and Arthur U. Asuncion. 2009. Asynchronous distributed learning of topic models. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 81–88.
Mark Steyvers and Tom Griffiths. 2007. Probabilistic topic models. Handb. Latent Semant. Anal. 427, 7 (2007), 424–440.
Xiaobing Sun, Bixin Li, Hareton Leung, Bin Li, and Yun Li. 2015. MSR4SM: Using topic models to effectively mining software repositories for software maintenance tasks. Inf. Softw. Technol. 66 (2015), 1–12. DOI:https://doi.org/10.1016/j.infsof.2015.05.003
Yee W. Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. 2005. Sharing clusters among related groups: Hierarchical Dirichlet processes. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 1385–1392.
Rajeev Thakur, Rolf Rabenseifner, and William Gropp. 2005. Optimization of collective communication operations in MPICH. Int. J. High Perf. Comput. Applic. 19, 1 (2005), 49–66.
Stephen W. Thomas, Bram Adams, Ahmed E. Hassan, and Dorothea Blostein. 2014. Studying software evolution using topic models. Sci. Comput. Prog. 80 (2014), 457–479. DOI:https://doi.org/10.1016/j.scico.2012.08.003
Kai Tian, Meghan Revelle, and Denys Poshyvanyk. 2009. Using latent Dirichlet allocation for automatic categorization of software. In Proceedings of the 6th IEEE International Working Conference on Mining Software Repositories. IEEE, 163–166.
Zhongyuan Tian, Harumichi Yokoyama, and Takuya Araki. 2019. Parallel latent Dirichlet allocation using vector processors. In Proceedings of the IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, 1548–1555.
Calin Rares Turliuc, Luke Dickens, Alessandra Russo, and Krysia Broda. 2016. Probabilistic abductive logic programming using Dirichlet priors. Int. J. Approx. Reas. 78 (2016), 223–240. DOI:https://doi.org/10.1016/j.ijar.2016.07.001
Duc-Thuan Vo and Cheol-Young Ock. 2015. Learning to classify short text from scientific documents using topic models with various types of knowledge. Exp. Syst. Applic. 42, 3 (2015), 1684–1698. DOI:https://doi.org/10.1016/j.eswa.2014.09.031
Konstantin Vorontsov, Oleksandr Frei, Murat Apishev, Peter Romov, and Marina Dudarenko. 2015. BigARTM: Open source library for regularized multimodal topic modeling of large collections. In Proceedings of the International Conference on Analysis of Images, Social Networks and Texts. Springer, 370–381.
Konstantin Vorontsov and Anna Potapenko. 2015. Additive regularization of topic models. Mach. Learn. 101, 1–3 (2015), 303–323.
Nicholas Vretos, Nikos Nikolaidis, and Ioannis Pitas. 2012. Video fingerprinting using latent Dirichlet allocation and facial images. Pattern Recog. 45, 7 (2012), 2489–2498. DOI:https://doi.org/10.1016/j.patcog.2011.12.022
Ivan Vulić, Wim De Smet, and Marie-Francine Moens. 2013. Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora. Inf. Retr. 16, 3 (2013), 331–368. DOI:https://doi.org/10.1007/s10791-012-9200-5
Ivan Vulić, Wim De Smet, Jie Tang, and Marie-Francine Moens. 2015. Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications. Inf. Proc. Manag. 51, 1 (2015), 111–147. DOI:https://doi.org/10.1016/j.ipm.2014.08.003
Martin J. Wainwright, Michael I. Jordan et al. 2008. Graphical models, exponential families, and variational inference. Found. Trends® Mach. Learn. 1, 1–2 (2008), 1–305.
Hanna M. Wallach. 2006. Topic modeling: Beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 977–984.
Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. 2009. Evaluation methods for topic models. In Proceedings of the 26th International Conference on Machine Learning. 1105–1112.
Chong Wang, David Blei, and David Heckerman. 2012. Continuous time dynamic topic models. arXiv preprint arXiv:1206.3298 (2012).
Di Wang and Ahmad Al-Rubaie. 2015. Incremental learning with partial-supervision based on hierarchical Dirichlet process and the application for document classification. Appl. Soft Comput. 33 (2015), 250–262. DOI:https://doi.org/10.1016/j.asoc.2015.04.044
Jin Wang, Xiangping Sun, Mary F. H. She, Abbas Kouzani, and Saeid Nahavandi. 2013. Unsupervised mining of long time series based on latent topic model. Neurocomputing 103 (2013), 93–103. DOI:https://doi.org/10.1016/j.neucom.2012.09.008
Xuerui Wang and Andrew McCallum. 2006. Topics over time: A non-Markov continuous-time model of topical trends. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 424–433.
Xuerui Wang, Andrew McCallum, and Xing Wei. 2007. Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM’07). IEEE, 697–702.
Xiang Wang, Kai Zhang, Xiaoming Jin, and Dou Shen. 2009. Mining common topics from multiple asynchronous text streams. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining. ACM, 192–201.
Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, and Edward Y. Chang. 2009. PLDA: Parallel latent Dirichlet allocation for large-scale applications. In Proceedings of the International Conference on Algorithmic Applications in Management. 301–314. DOI:https://doi.org/10.1007/978-3-642-02158-9_26
Yu Wang, Jiebo Luo, Richard Niemi, Yuncheng Li, and Tianran Hu. 2016. Catching fire via “likes”: Inferring topic preferences of Trump followers on Twitter. In Proceedings of the 10th International AAAI Conference on Web and Social Media.
Yi Wang, Xuemin Zhao, Zhenlong Sun, Hao Yan, Lifeng Wang, Zhihui Jin, Liubin Wang, Yang Gao, Jia Zeng, Qiang Yang et al. 2014. Towards topic modeling for big data. arXiv preprint arXiv:1405.4402 (2014).
Lino Wehrheim. 2019. Economic history goes digital: Topic modeling the Journal of Economic History. Cliometrica 13, 1 (2019), 83–125.
Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. 2010. Twitterrank: Finding topic-sensitive influential Twitterers. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining. ACM, 261–270.
Erik Wiener, Jan O. Pedersen, Andreas S. Weigend, et al. 1995. A neural network approach to topic spotting. In Proceedings of the 4th Symposium on Document Analysis and Information Retrieval.
Andrew T. Wilson and Peter A. Chew. 2010. Term weighting schemes for latent Dirichlet allocation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics. 465–473. DOI:https://doi.org/1857999.1858069
Yueshen Xu, Jianwei Yin, Jianbin Huang, and Yuyu Yin. 2018. Hierarchical topic modeling with automatic knowledge mining. Exp. Syst. Applic. 103 (2018), 106–117.
Yueshen Xu, Yuyu Yin, and Jianwei Yin. 2017. Tackling topic general words in topic modeling. Eng. Applic. Artif. Intell. 62 (2017), 124–133.
Guangxu Xun, Yaliang Li, Wayne Xin Zhao, Jing Gao, and Aidong Zhang. 2017. A correlated topic model using word embeddings. In Proceedings of the International Joint Conference on Artificial Intelligence. 4207–4213.
Feng Yan, Ningyi Xu, and Yuan Qi. 2009. Parallel inference for latent Dirichlet allocation on graphics processing units. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 2134–2142.
Shuang Yang, Chunfeng Yuan, Weiming Hu, and Xinmiao Ding. 2014. A hierarchical model based on latent Dirichlet allocation for action recognition. In Proceedings of the 22nd International Conference on Pattern Recognition. IEEE, 2613–2618. DOI:https://doi.org/10.1109/ICPR.2014.451
Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. 2019. A multilingual topic model for learning weighted topic links across corpora with low comparability. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 1243–1248.
Yi Yang, Doug Downey, and Jordan Boyd-Graber. 2015. Efficient methods for incorporating knowledge into topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 308–317.
Limin Yao, David Mimno, and Andrew McCallum. 2009. Efficient methods for topic model inference on streaming document collections. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 937–946. DOI:https://doi.org/10.1145/1557019.1557121
Liang Yao, Yin Zhang, Baogang Wei, Lei Li, Fei Wu, Peng Zhang, and Yali Bian. 2016. Concept over time: The combination of probabilistic topic model with Wikipedia knowledge. Exp. Syst. Applic. 60 (2016), 27–38.
Chyi-Kwei Yau, Alan Porter, Nils Newman, and Arho Suominen. 2014. Clustering scientific documents with topic modeling. Scientometrics 100, 3 (2014), 767–786.
Hsiang-Fu Yu, Cho-Jui Hsieh, Hyokun Yun, S. V. N. Vishwanathan, and Inderjit S. Dhillon. 2015. A scalable asynchronous distributed algorithm for topic modeling. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1340–1350.
Bo Yuan, Xinbo Gao, Zhenxing Niu, and Qi Tian. 2019. Discovering latent topics by Gaussian latent Dirichlet allocation and spectral clustering. ACM Trans. Multimedia Comput. Commun. Applic. 15, 1 (2019), 25.
Lele Yu, Ce Zhang, Yingxia Shao, and Bin Cui. 2017. LDA*: A robust and large-scale topic modeling system. Proc. VLDB Endow. 10, 11 (2017), 1406–1417.
Manzil Zaheer, Amr Ahmed, and Alexander J. Smola. 2017. Latent LSTM allocation: Joint clustering and non-linear dynamic modeling of sequential data. In Proceedings of the 34th International Conference on Machine Learning. JMLR.org, 3967–3976.
Jianping Zeng, Jiangjiao Duan, Wenjun Cao, and Chengrong Wu. 2012. Topics modeling based on selective Zipf distribution. Exp. Syst. Applic. 39, 7 (2012), 6541–6546. DOI:https://doi.org/10.1016/j.eswa.2011.12.051
Ke Zhai and Jordan Boyd-Graber. 2013. Online latent Dirichlet allocation with infinite vocabulary. In Proceedings of the International Conference on Machine Learning. 561–569.
Ke Zhai, Jordan Boyd-Graber, Nima Asadi, and Mohamad L. Alkhouja. 2012. Mr. LDA: A flexible large scale topic modeling package using variational inference in MapReduce. In Proceedings of the 21st International Conference on World Wide Web. ACM, 879–888. DOI:https://doi.org/10.1145/2187836.2187955
Jianwen Zhang, Yangqiu Song, Changshui Zhang, and Shixia Liu. 2010. Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1079–1088.
Tao Zhang, Kang Liu, Jun Zhao, et al. 2013. Cross lingual entity linking with bilingual topic model. In Proceedings of the International Joint Conference on Artificial Intelligence. 2218–2224.
Bing Zhao and Eric P. Xing. 2006. BiTAM: Bilingual topic admixture models for word alignment. In Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics, 969–976.
Bing Zhao and Eric P. Xing. 2007. HM-BiTAM: Bilingual topic exploration, word alignment, and translation. Advances in Neural Information Processing Systems 20 (2007), 1689–1696.
Feng Zhao, Yajun Zhu, Hai Jin, and Laurence T. Yang. 2016. A personalized hashtag recommendation approach using LDA-based topic model in microblog environment. Fut. Gen. Comput. Syst. 65 (2016), 196–206.
Huasha Zhao, Biye Jiang, John F. Canny, and Bobby Jaros. 2015. Same but different: Fast and high quality Gibbs parameter estimation. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1495–1502.
Wenjun Zhu, Liqing Zhang, and Qianwei Bian. 2012. A hierarchical latent topic model based on sparse coding. Neurocomputing 76, 1 (2012), 28–35. DOI:https://doi.org/10.1016/j.neucom.2010.11.038
Elaine Zosa and Mark Granroth-Wilding. 2019. Multilingual dynamic topic model. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP’19). 1388–1396.
Jialing Zou, Qixiang Ye, Yanting Cui, Fang Wan, Kun Fu, and Jianbin Jiao. 2016. Collective motion pattern inference via locally consistent latent Dirichlet allocation. Neurocomputing 184 (2016), 221–231. DOI:https://doi.org/10.1016/j.neucom.2015.08.108
Yuan Zuo, Junjie Wu, Hui Zhang, Hao Lin, Fei Wang, Ke Xu, and Hui Xiong. 2016. Topic modeling of short texts: A pseudo-document view. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2105–2114.

Published in
ACM Computing Surveys Volume 54, Issue 7
September 2022
778 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3476825
Editor:
Albert Zomaya
University of Sydney, Australia
Issue’s Table of Contents
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 September 2021
- Accepted: 1 April 2021
- Revised: 1 March 2021
- Received: 1 April 2020
Author Tags
Topic modeling
gibbs sampling
latent dirichlet allocation
probabilistic model
statistical inference
Qualifiers
- research-article
- Research
- Refereed