Abstract
A massive text corpus cannot be handled without summarizing it into a relatively small subset, and computational tools are needed to make sense of such a large pool of text. Probabilistic topic modeling discovers and explains the structure of an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling and then review its extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from a multilingual perspective. We also explore research on topic modeling in distributed environments and on topic visualization approaches, and we briefly cover implementation and evaluation techniques for topic models. Comparison matrices summarize the experimental results for the various categories of topic modeling. Finally, we discuss diverse technical challenges and future directions.
Supplemental material: movie, appendix, image, and software files for "Topic Modeling Using Latent Dirichlet Allocation: A Survey".
- Hanna M Wallach. 2006. Topic modeling: Beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 977–984.Google ScholarDigital Library
- Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. 2009. Evaluation methods for topic models. In Proceedings of the 26th International Conference on Machine Learning.1105–1112.Google ScholarDigital Library
- Chong Wang, David Blei, and David Heckerman. 2012. Continuous time dynamic topic models. arXiv preprint arXiv:1206.3298 (2012).Google Scholar
- Di Wang and Ahmad Al-Rubaie. 2015. Incremental learning with partial-supervision based on hierarchical Dirichlet process and the application for document classification. Appl. Soft Comput. 33 (2015), 250–262. DOI:https://doi.org/10.1016/j.asoc.2015.04.044Google ScholarDigital Library
- Jin Wang, Xiangping Sun, Mary F. H. She, Abbas Kouzani, and Saeid Nahavandi. 2013. Unsupervised mining of long time series based on latent topic model. Neurocomputing 103 (2013), 93–103. DOI:https://doi.org/10.1016/j.neucom.2012.09.008Google ScholarDigital Library
- Xuerui Wang and Andrew McCallum. 2006. Topics over time: A non-Markov continuous-time model of topical trends. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 424–433.Google ScholarDigital Library
- Xuerui Wang, Andrew McCallum, and Xing Wei. 2007. Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM’07). IEEE, 697–702.Google ScholarDigital Library
- Xiang Wang, Kai Zhang, Xiaoming Jin, and Dou Shen. 2009. Mining common topics from multiple asynchronous text streams. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining. ACM, 192–201.Google ScholarDigital Library
- Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, and Edward Y. Chang. 2009. PLDA: Parallel latent Dirichlet allocation for large-scale applications. In Proceedings of the International Conference on Algorithmic Applications in Management. 301–314. DOI:https://doi.org/10.1007/978-3-642-02158-9_26Google ScholarDigital Library
- Yu Wang, Jiebo Luo, Richard Niemi, Yuncheng Li, and Tianran Hu. 2016. Catching fire via “likes”: Inferring topic preferences of Trump followers on Twitter. In Proceedings of the 10th International AAAI Conference on Web and Social Media.Google Scholar
- Yi Wang, Xuemin Zhao, Zhenlong Sun, Hao Yan, Lifeng Wang, Zhihui Jin, Liubin Wang, Yang Gao, Jia Zeng, Qiang Yang et al. 2014. Towards topic modeling for big data. arXiv preprint arXiv:1405.4402 (2014).Google Scholar
- Lino Wehrheim. 2019. Economic history goes digital: Topic modeling the journal of economic history. Cliometrica 13, 1 (2019), 83–125.Google ScholarCross Ref
- Jianshu Weng, Ee-Peng Lim, Jing Jiang, and Qi He. 2010. Twitterrank: Finding topic-sensitive influential Twitterers. In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining. ACM, 261–270.Google ScholarDigital Library
- Erik Wiener, Jan O. Pedersen, Andreas S. Weigend, et al. 1995. A neural network approach to topic spotting. In Proceedings of the 4th Symposium on Document Analysis and Information Retrieval.Google Scholar
- Andrew T. Wilson and Peter A. Chew. 2010. Term weighting schemes for latent Dirichlet allocation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics. 465–473. DOI:https://doi.org/1857999.1858069Google Scholar
- Yueshen Xu, Jianwei Yin, Jianbin Huang, and Yuyu Yin. 2018. Hierarchical topic modeling with automatic knowledge mining. Exp. Syst. Applic. 103 (2018), 106–117.Google ScholarCross Ref
- Yueshen Xu, Yuyu Yin, and Jianwei Yin. 2017. Tackling topic general words in topic modeling. Eng. Applic. Artif. Intell. 62 (2017), 124–133.Google ScholarDigital Library
- Guangxu Xun, Yaliang Li, Wayne Xin Zhao, Jing Gao, and Aidong Zhang. 2017. A correlated topic model using word embeddings. In Proceedings of the International Joint Conference on Artificial Intelligence. 4207–4213.Google ScholarCross Ref
- Feng Yan, Ningyi Xu, and Yuan Qi. 2009. Parallel inference for latent Dirichlet allocation on graphics processing units. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 2134–2142.Google Scholar
- Shuang Yang, Chunfeng Yuan, Weiming Hu, and Xinmiao Ding. 2014. A hierarchical model based on latent Dirichlet allocation for action recognition. In Proceedings of the 22nd International Conference on Pattern Recognition. IEEE, 2613–2618. DOI:https://doi.org/10.1109/ICPR.2014.451Google ScholarDigital Library
- Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. 2019. A multilingual topic model for learning weighted topic links across corpora with low comparability. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 1243–1248.Google ScholarCross Ref
- Yi Yang, Doug Downey, and Jordan Boyd-Graber. 2015. Efficient methods for incorporating knowledge into topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 308–317.Google ScholarCross Ref
- Limin Yao, David Mimno, and Andrew McCallum. 2009. Efficient methods for topic model inference on streaming document collections. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 937–946. DOI:https://doi.org/10.1145/1557019.1557121Google ScholarDigital Library
- Liang Yao, Yin Zhang, Baogang Wei, Lei Li, Fei Wu, Peng Zhang, and Yali Bian. 2016. Concept over time: the combination of probabilistic topic model with wikipedia knowledge. Exp. Syst. Applic. 60 (2016), 27–38.Google ScholarDigital Library
- Chyi-Kwei Yau, Alan Porter, Nils Newman, and Arho Suominen. 2014. Clustering scientific documents with topic modeling. Scientometrics 100, 3 (2014), 767–786.Google ScholarDigital Library
- Hsiang-Fu Yu, Cho-Jui Hsieh, Hyokun Yun, S. V. N. Vishwanathan, and Inderjit S. Dhillon. 2015. A scalable asynchronous distributed algorithm for topic modeling. In Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1340–1350.Google Scholar
- Bo Yuan, Xinbo Gao, Zhenxing Niu, and Qi Tian. 2019. Discovering latent topics by Gaussian latent Dirichlet allocation and spectral clustering. ACM Trans. Multimedia Comput. Commun. Applic. 15, 1 (2019), 25.Google ScholarDigital Library
- Lele Yut, Ce Zhang, Yingxia Shao, and Bin Cui. 2017. LDA* a robust and large-scale topic modeling system. Proc. VLDB Endow. 10, 11 (2017), 1406–1417.Google ScholarDigital Library
- Manzil Zaheer, Amr Ahmed, and Alexander J. Smola. 2017. Latent LSTM allocation joint clustering and non-linear dynamic modeling of sequential data. In Proceedings of the 34th International Conference on Machine Learning. JMLR.org, 3967–3976.Google Scholar
- Jianping Zeng, Jiangjiao Duan, Wenjun Cao, and Chengrong Wu. 2012. Topics modeling based on selective Zipf distribution. Exp. Syst. Applic. 39, 7 (2012), 6541–6546. DOI:https://doi.org/10.1016/j.eswa.2011.12.051Google ScholarDigital Library
- Ke Zhai and Jordan Boyd-Graber. 2013. Online latent Dirichlet allocation with infinite vocabulary. In Proceedings of the International Conference on Machine Learning. 561–569.Google Scholar
- Ke Zhai, Jordan Boyd-Graber, Nima Asadi, and Mohamad L. Alkhouja. 2012. Mr. LDA: A flexible large scale topic modeling package using variational inference in MapReduce. In Proceedings of the 21st International Conference on World Wide Web. ACM, 879–888. DOI:https://doi.org/10.1145/2187836.2187955Google ScholarDigital Library
- Jianwen Zhang, Yangqiu Song, Changshui Zhang, and Shixia Liu. 2010. Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1079–1088.Google ScholarDigital Library
- Tao Zhang, Kang Liu, Jun Zhao, et al. 2013. Cross lingual entity linking with bilingual topic model.Proceedings of the International Joint Conference on Artificial Intelligence. 2218–2224.Google Scholar
- Bing Zhao and Eric P. Xing. 2006. BiTAM: Bilingual topic admixture models for word alignment. In Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics, 969–976.Google Scholar
- Bing Zhao and Eric P. Xing. 2007. HM-BiTAM: Bilingual topic exploration, word alignment, and translation. Advances in Neural Information Processing Systems 20 (2007), 1689–1696.Google Scholar
- Feng Zhao, Yajun Zhu, Hai Jin, and Laurence T. Yang. 2016. A personalized hashtag recommendation approach using LDA-based topic model in microblog environment. Fut. Gen. Comput. Syst. 65 (2016), 196–206.Google ScholarDigital Library
- Huasha Zhao, Biye Jiang, John F. Canny, and Bobby Jaros. 2015. Same but different: Fast and high quality Gibbs parameter estimation. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1495–1502.Google ScholarDigital Library
- Wenjun Zhu, Liqing Zhang, and Qianwei Bian. 2012. A hierarchical latent topic model based on sparse coding. Neurocomputing 76, 1 (2012), 28–35. DOI:https://doi.org/10.1016/j.neucom.2010.11.038Google ScholarDigital Library
- Elaine Zosa and Mark Granroth-Wilding. 2019. Multilingual dynamic topic model. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP’19). 1388–1396.Google ScholarCross Ref
- Jialing Zou, Qixiang Ye, Yanting Cui, Fang Wan, Kun Fu, and Jianbin Jiao. 2016. Collective motion pattern inference via locally consistent latent Dirichlet allocation. Neurocomputing 184 (2016), 221–231. DOI:https://doi.org/10.1016/j.neucom.2015.08.108Google ScholarDigital Library
- Yuan Zuo, Junjie Wu, Hui Zhang, Hao Lin, Fei Wang, Ke Xu, and Hui Xiong. 2016. Topic modeling of short texts: A pseudo-document view. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2105–2114.Google ScholarDigital Library