Topic Modeling Using Latent Dirichlet Allocation: A Survey

Published: 17 September 2021

Abstract

We cannot deal with a mammoth text corpus without summarizing it into a relatively small subset, and computational tools are badly needed to make sense of such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling and then review its extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from multilingual perspectives. We also explore research on topic modeling in distributed environments and on topic visualization approaches, and briefly cover implementation and evaluation techniques for topic models. Comparison matrices are presented over the experimental results of the various categories of topic modeling. Finally, we discuss diverse technical challenges and future directions.

  131. Yee W. Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. 2005. Sharing clusters among related groups: Hierarchical Dirichlet processes. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 1385–1392.Google ScholarGoogle Scholar
  132. Rajeev Thakur, Rolf Rabenseifner, and William Gropp. 2005. Optimization of collective communication operations in MPICH. Int. J. High Perf. Comput. Applic. 19, 1 (2005), 49–66.Google ScholarGoogle ScholarDigital LibraryDigital Library
  133. Stephen W. Thomas, Bram Adams, Ahmed E. Hassan, and Dorothea Blostein. 2014. Studying software evolution using topic models. Sci. Comput. Prog. 80 (2014), 457–479. DOI:https://doi.org/10.1016/j.scico.2012.08.003Google ScholarGoogle ScholarCross RefCross Ref
  134. Kai Tian, Meghan Revelle, and Denys Poshyvanyk. 2009. Using latent Dirichlet allocation for automatic categorization of software. In Proceedings of the 6th IEEE International Working Conference on Mining Software Repositories. IEEE, 163–166.Google ScholarGoogle ScholarDigital LibraryDigital Library
  135. Zhongyuan Tian, Harumichi Yokoyama, and Takuya Araki. 2019. Parallel latent Dirichlet allocation using vector processors. In Proceedings of the IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, 1548–1555.Google ScholarGoogle Scholar
  136. Calin Rares Turliuc, Luke Dickens, Alessandra Russo, and Krysia Broda. 2016. Probabilistic abductive logic programming using Dirichlet priors. Int. J. Approx. Reas. 78 (2016), 223–240. DOI:https://doi.org/10.1016/j.ijar.2016.07.001Google ScholarGoogle ScholarDigital LibraryDigital Library
  137. Duc-Thuan Vo and Cheol-Young Ock. 2015. Learning to classify short text from scientific documents using topic models with various types of knowledge. Exp. Syst. Applic. 42, 3 (2015), 1684–1698. DOI:https://doi.org/10.1016/j.eswa.2014.09.031Google ScholarGoogle ScholarDigital LibraryDigital Library
  138. Konstantin Vorontsov, Oleksandr Frei, Murat Apishev, Peter Romov, and Marina Dudarenko. 2015. BigARTM: Open source library for regularized multimodal topic modeling of large collections. In Proceedings of the International Conference on Analysis of Images, Social Networks and Texts. Springer, 370–381.Google ScholarGoogle ScholarCross RefCross Ref
  139. Konstantin Vorontsov and Anna Potapenko. 2015. Additive regularization of topic models. Mach. Learn. 101, 1–3 (2015), 303–323.Google ScholarGoogle ScholarDigital LibraryDigital Library
  140. Nicholas Vretos, Nikos Nikolaidis, and Ioannis Pitas. 2012. Video fingerprinting using latent Dirichlet allocation and facial images. Pattern Recog. 45, 7 (2012), 2489–2498. DOI:https://doi.org/10.1016/j.patcog.2011.12.022Google ScholarGoogle ScholarDigital LibraryDigital Library
  141. Ivan Vulić, Wim De Smet, and Marie-Francine Moens. 2013. Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora. Inf. Retr. 16, 3 (2013), 331–368. DOI:https://doi.org/10.1007/s10791-012-9200-5Google ScholarGoogle ScholarDigital LibraryDigital Library
  142. Ivan Vulić, Wim De Smet, Jie Tang, and Marie-Francine Moens. 2015. Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications. Inf. Proc. Manag. 51, 1 (2015), 111–147. DOI:https://doi.org/10.1016/j.ipm.2014.08.003Google ScholarGoogle ScholarCross RefCross Ref
  143. Martin J. Wainwright, Michael I. Jordan et al. 2008. Graphical models, exponential families, and variational inference. Found. Trends® Mach. Learn. 1, 1–2 (2008), 1–305.Google ScholarGoogle ScholarDigital LibraryDigital Library
  144. Hanna M Wallach. 2006. Topic modeling: Beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 977–984.Google ScholarGoogle ScholarDigital LibraryDigital Library
  145. Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. 2009. Evaluation methods for topic models. In Proceedings of the 26th International Conference on Machine Learning.1105–1112.Google ScholarGoogle ScholarDigital LibraryDigital Library
  146. Chong Wang, David Blei, and David Heckerman. 2012. Continuous time dynamic topic models. arXiv preprint arXiv:1206.3298 (2012).Google ScholarGoogle Scholar
  147. Di Wang and Ahmad Al-Rubaie. 2015. Incremental learning with partial-supervision based on hierarchical Dirichlet process and the application for document classification. Appl. Soft Comput. 33 (2015), 250–262. DOI:https://doi.org/10.1016/j.asoc.2015.04.044Google ScholarGoogle ScholarDigital LibraryDigital Library
  148. Jin Wang, Xiangping Sun, Mary F. H. She, Abbas Kouzani, and Saeid Nahavandi. 2013. Unsupervised mining of long time series based on latent topic model. Neurocomputing 103 (2013), 93–103. DOI:https://doi.org/10.1016/j.neucom.2012.09.008Google ScholarGoogle ScholarDigital LibraryDigital Library
  149. Xuerui Wang and Andrew McCallum. 2006. Topics over time: A non-Markov continuous-time model of topical trends. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 424–433.Google ScholarGoogle ScholarDigital LibraryDigital Library
  150. Xuerui Wang, Andrew McCallum, and Xing Wei. 2007. Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM’07). IEEE, 697–702.Google ScholarGoogle ScholarDigital LibraryDigital Library
  151. Xiang Wang, Kai Zhang, Xiaoming Jin, and Dou Shen. 2009. Mining common topics from multiple asynchronous text streams. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining. ACM, 192–201.Google ScholarGoogle ScholarDigital LibraryDigital Library
  152. Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, and Edward Y. Chang. 2009. PLDA: Parallel latent Dirichlet allocation for large-scale applications. In Proceedings of the International Conference on Algorithmic Applications in Management. 301–314. DOI:https://doi.org/10.1007/978-3-642-02158-9_26Google ScholarGoogle ScholarDigital LibraryDigital Library
  153. Yu Wang, Jiebo Luo, Richard Niemi, Yuncheng Li, and Tianran Hu. 2016. Catching fire via “likes”: Inferring topic preferences of Trump followers on Twitter. In Proceedings of the 10th International AAAI Conference on Web and Social Media.Google ScholarGoogle Scholar
  154. Yi Wang, Xuemin Zhao, Zhenlong Sun, Hao Yan, Lifeng Wang, Zhihui Jin, Liubin Wang, Yang Gao, Jia Zeng, Qiang Yang et al. 2014. Towards topic modeling for big data. arXiv preprint arXiv:1405.4402 (2014).Google ScholarGoogle Scholar
  155. Lino Wehrheim. 2019. Economic history goes digital: Topic modeling the journal of economic history. Cliometrica 13, 1 (2019), 83–125.Google ScholarGoogle ScholarCross RefCross Ref
  156. Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. 2010. Twitterrank: Finding topic-sensitive influential Twitterers. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining. ACM, 261–270.Google ScholarGoogle ScholarDigital LibraryDigital Library
  157. Erik Wiener, Jan O. Pedersen, Andreas S. Weigend, et al. 1995. A neural network approach to topic spotting. In Proceedings of the 4th Symposium on Document Analysis and Information Retrieval.Google ScholarGoogle Scholar
  158. Andrew T. Wilson and Peter A. Chew. 2010. Term weighting schemes for latent Dirichlet allocation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics. 465–473. DOI:https://doi.org/1857999.1858069Google ScholarGoogle Scholar
  159. Yueshen Xu, Jianwei Yin, Jianbin Huang, and Yuyu Yin. 2018. Hierarchical topic modeling with automatic knowledge mining. Exp. Syst. Applic. 103 (2018), 106–117.Google ScholarGoogle ScholarCross RefCross Ref
  160. Yueshen Xu, Yuyu Yin, and Jianwei Yin. 2017. Tackling topic general words in topic modeling. Eng. Applic. Artif. Intell. 62 (2017), 124–133.Google ScholarGoogle ScholarDigital LibraryDigital Library
  161. Guangxu Xun, Yaliang Li, Wayne Xin Zhao, Jing Gao, and Aidong Zhang. 2017. A correlated topic model using word embeddings. In Proceedings of the International Joint Conference on Artificial Intelligence. 4207–4213.Google ScholarGoogle ScholarCross RefCross Ref
  162. Feng Yan, Ningyi Xu, and Yuan Qi. 2009. Parallel inference for latent Dirichlet allocation on graphics processing units. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 2134–2142.Google ScholarGoogle Scholar
  163. Shuang Yang, Chunfeng Yuan, Weiming Hu, and Xinmiao Ding. 2014. A hierarchical model based on latent Dirichlet allocation for action recognition. In Proceedings of the 22nd International Conference on Pattern Recognition. IEEE, 2613–2618. DOI:https://doi.org/10.1109/ICPR.2014.451Google ScholarGoogle ScholarDigital LibraryDigital Library
  164. Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. 2019. A multilingual topic model for learning weighted topic links across corpora with low comparability. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 1243–1248.Google ScholarGoogle ScholarCross RefCross Ref
  165. Yi Yang, Doug Downey, and Jordan Boyd-Graber. 2015. Efficient methods for incorporating knowledge into topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 308–317.Google ScholarGoogle ScholarCross RefCross Ref
  166. Limin Yao, David Mimno, and Andrew McCallum. 2009. Efficient methods for topic model inference on streaming document collections. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 937–946. DOI:https://doi.org/10.1145/1557019.1557121Google ScholarGoogle ScholarDigital LibraryDigital Library
  167. Liang Yao, Yin Zhang, Baogang Wei, Lei Li, Fei Wu, Peng Zhang, and Yali Bian. 2016. Concept over time: the combination of probabilistic topic model with wikipedia knowledge. Exp. Syst. Applic. 60 (2016), 27–38.Google ScholarGoogle ScholarDigital LibraryDigital Library
  168. Chyi-Kwei Yau, Alan Porter, Nils Newman, and Arho Suominen. 2014. Clustering scientific documents with topic modeling. Scientometrics 100, 3 (2014), 767–786.Google ScholarGoogle ScholarDigital LibraryDigital Library
  169. Hsiang-Fu Yu, Cho-Jui Hsieh, Hyokun Yun, S. V. N. Vishwanathan, and Inderjit S. Dhillon. 2015. A scalable asynchronous distributed algorithm for topic modeling. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1340–1350.Google ScholarGoogle Scholar
  170. Bo Yuan, Xinbo Gao, Zhenxing Niu, and Qi Tian. 2019. Discovering latent topics by Gaussian latent Dirichlet allocation and spectral clustering. ACM Trans. Multimedia Comput. Commun. Applic. 15, 1 (2019), 25.Google ScholarGoogle ScholarDigital LibraryDigital Library
  171. Lele Yut, Ce Zhang, Yingxia Shao, and Bin Cui. 2017. LDA* a robust and large-scale topic modeling system. Proc. VLDB Endow. 10, 11 (2017), 1406–1417.Google ScholarGoogle ScholarDigital LibraryDigital Library
  172. Manzil Zaheer, Amr Ahmed, and Alexander J. Smola. 2017. Latent LSTM allocation joint clustering and non-linear dynamic modeling of sequential data. In Proceedings of the 34th International Conference on Machine Learning. JMLR.org, 3967–3976.Google ScholarGoogle Scholar
  173. Jianping Zeng, Jiangjiao Duan, Wenjun Cao, and Chengrong Wu. 2012. Topics modeling based on selective Zipf distribution. Exp. Syst. Applic. 39, 7 (2012), 6541–6546. DOI:https://doi.org/10.1016/j.eswa.2011.12.051Google ScholarGoogle ScholarDigital LibraryDigital Library
  174. Ke Zhai and Jordan Boyd-Graber. 2013. Online latent Dirichlet allocation with infinite vocabulary. In Proceedings of the International Conference on Machine Learning. 561–569.Google ScholarGoogle Scholar
  175. Ke Zhai, Jordan Boyd-Graber, Nima Asadi, and Mohamad L. Alkhouja. 2012. Mr. LDA: A flexible large scale topic modeling package using variational inference in MapReduce. In Proceedings of the 21st International Conference on World Wide Web. ACM, 879–888. DOI:https://doi.org/10.1145/2187836.2187955Google ScholarGoogle ScholarDigital LibraryDigital Library
  176. Jianwen Zhang, Yangqiu Song, Changshui Zhang, and Shixia Liu. 2010. Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1079–1088.Google ScholarGoogle ScholarDigital LibraryDigital Library
  177. Tao Zhang, Kang Liu, Jun Zhao, et al. 2013. Cross lingual entity linking with bilingual topic model.Proceedings of the International Joint Conference on Artificial Intelligence. 2218–2224.Google ScholarGoogle Scholar
  178. Bing Zhao and Eric P. Xing. 2006. BiTAM: Bilingual topic admixture models for word alignment. In Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics, 969–976.Google ScholarGoogle Scholar
  179. Bing Zhao and Eric P. Xing. 2007. HM-BiTAM: Bilingual topic exploration, word alignment, and translation. Advances in Neural Information Processing Systems 20 (2007), 1689–1696.Google ScholarGoogle Scholar
  180. Feng Zhao, Yajun Zhu, Hai Jin, and Laurence T. Yang. 2016. A personalized hashtag recommendation approach using LDA-based topic model in microblog environment. Fut. Gen. Comput. Syst. 65 (2016), 196–206.Google ScholarGoogle ScholarDigital LibraryDigital Library
  181. Huasha Zhao, Biye Jiang, John F. Canny, and Bobby Jaros. 2015. Same but different: Fast and high quality Gibbs parameter estimation. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1495–1502.Google ScholarGoogle ScholarDigital LibraryDigital Library
  182. Wenjun Zhu, Liqing Zhang, and Qianwei Bian. 2012. A hierarchical latent topic model based on sparse coding. Neurocomputing 76, 1 (2012), 28–35. DOI:https://doi.org/10.1016/j.neucom.2010.11.038Google ScholarGoogle ScholarDigital LibraryDigital Library
  183. Elaine Zosa and Mark Granroth-Wilding. 2019. Multilingual dynamic topic model. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP’19). 1388–1396.Google ScholarGoogle ScholarCross RefCross Ref
  184. Jialing Zou, Qixiang Ye, Yanting Cui, Fang Wan, Kun Fu, and Jianbin Jiao. 2016. Collective motion pattern inference via locally consistent latent Dirichlet allocation. Neurocomputing 184 (2016), 221–231. DOI:https://doi.org/10.1016/j.neucom.2015.08.108Google ScholarGoogle ScholarDigital LibraryDigital Library
  185. Yuan Zuo, Junjie Wu, Hui Zhang, Hao Lin, Fei Wang, Ke Xu, and Hui Xiong. 2016. Topic modeling of short texts: A pseudo-document view. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2105–2114.Google ScholarGoogle ScholarDigital LibraryDigital Library

                    • Published in

                      ACM Computing Surveys  Volume 54, Issue 7
                      September 2022
                      778 pages
                      ISSN:0360-0300
                      EISSN:1557-7341
                      DOI:10.1145/3476825

                      Copyright © 2021 ACM

                      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                      Publisher

                      Association for Computing Machinery

                      New York, NY, United States

                      Publication History

                      • Published: 17 September 2021
                      • Accepted: 1 April 2021
                      • Revised: 1 March 2021
                      • Received: 1 April 2020

                      Qualifiers

                      • research-article
                      • Research
                      • Refereed
