Editorial Notes
The editors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected VoR was published on February 9, 2021. For reference purposes the VoR may still be accessed via the Supplemental Material section on this page.
Abstract
The explosion of news text and the development of artificial intelligence present both an opportunity and a challenge for providing high-quality media monitoring services. In this article, we propose a semantic analysis approach based on Latent Dirichlet Allocation (LDA) and the Apriori algorithm, and we apply it to improve media monitoring reports by mining large-scale news text. First, we use the LDA model to mine topic words from news text and to reduce the dimensionality of the news data. Then, we use the Apriori algorithm to discover the relationships among topic words. Finally, we determine the relevance of the topic words and show the intensity and dependency among them graphically. The application extracts news topics and discovers the correlation and dependency among them in massive news text. The results show that the method based on LDA and Apriori can help media monitoring staff better understand the hidden knowledge in news text and improve media analysis reports.
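The second stage described in the abstract, mining relationships among topic words with Apriori, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the LDA stage has already reduced each news document to a small set of topic words, and the toy data in `docs_topic_words`, the function names, and the thresholds are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of documents whose topic words contain the whole itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single topic words.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence)."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                # Subsets of a frequent itemset are always frequent,
                # so this lookup cannot fail.
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, supp, confidence)
                    )
    return rules


# Hypothetical output of the LDA stage: one set of topic words per document.
docs_topic_words = [
    {"economy", "trade", "tariff"},
    {"economy", "trade"},
    {"economy", "policy"},
    {"trade", "tariff"},
    {"economy", "trade", "policy"},
]

frequent_sets = apriori(docs_topic_words, min_support=0.3)
for ante, cons, supp, conf in association_rules(frequent_sets, min_confidence=0.7):
    print(sorted(ante), "->", sorted(cons), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

In this sketch, support plays the role of the intensity of a relationship between topic words, and confidence gives its direction of dependency. Both could serve as edge weights when the rules are drawn as a graph.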
Supplemental Material
Available for Download
Version of Record for "Knowledge Discovery of News Text Based on Artificial Intelligence" by Guangce et al., ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 1 (TALLIP 20:1).