Knowledge Discovery of News Text Based on Artificial Intelligence

Authors:
Ruan Guangce
Information Management Department, East China Normal University, Minhang, Shanghai, China

Xia Lei
Lecture & Exhibition Center, Shanghai Library, Huai Hai Zhong Lu, Shanghai, China

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 1, Article No. 6, pp 1–18, https://doi.org/10.1145/3418062

Published: 23 November 2020

Editorial Notes

The editors have requested minor, non-substantive changes to the VoR and, in accordance with ACM policies, a Corrected VoR was published on February 9, 2021. For reference purposes the VoR may still be accessed via the Supplemental Material section on this page.

Abstract

The explosion of news text and the development of artificial intelligence present both an opportunity and a challenge for providing high-quality media monitoring services. In this article, we propose a semantic analysis approach based on Latent Dirichlet Allocation (LDA) and the Apriori algorithm, and we apply it to improve media monitoring reports by mining large-scale news text. First, we use the LDA model to mine topic words from news text and reduce its dimensionality. Then, we use the Apriori algorithm to discover the relationships among topic words. Finally, we identify the relevance of the topic words and show the intensity and dependency among them graphically. The application extracts news topics from massive news text and discovers the correlations and dependencies among those topics. The results show that the method based on LDA and Apriori can help media monitoring staff better understand the knowledge hidden in news text and improve media analysis reports.
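
The second step of the pipeline described above, mining associations among topic words, can be illustrated with a minimal Apriori sketch. This is not the authors' implementation: it assumes the LDA step has already reduced each news item to a small set of topic words, and the sample words, the function names apriori and association_rules, and the thresholds min_count and min_confidence are all hypothetical choices made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    sets = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item on its own.
    current = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(c <= t for t in sets) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join frequent itemsets into candidates of size k, then prune any
        # candidate that has an infrequent subset of size k - 1.
        current = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its count is always available here.
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical topic words per news item, standing in for LDA output.
docs = [
    {"economy", "trade", "tariff"},
    {"economy", "trade"},
    {"economy", "policy"},
    {"trade", "tariff"},
    {"economy", "trade", "policy"},
]

frequent = apriori(docs, min_count=2)
found = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, confidence in found:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

Support is kept as a raw document count rather than a fraction so that confidence is a ratio of two integers. A rule whose confidence is high in one direction only (for example, a rare topic word that always appears alongside a common one) is the kind of dependency among topic words that the abstract refers to.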

Supplemental Material

3418062-vor.pdf (3.7 MB)

Version of Record for "Knowledge Discovery of News Text Based on Artificial Intelligence" by Guangce et al., ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 1 (TALLIP 20:1).

Index Terms

Computing methodologies → Machine learning

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 20, Issue 1
Special issue on Deep Learning for Low-Resource Natural Language Processing, Part 1 and Regular Papers
January 2021
332 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3439335
Editor: Imed Zitouni, Google, USA
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 November 2020
- Accepted: 1 August 2020
- Revised: 1 July 2020
- Received: 1 February 2020
Author Tags
LDA
association rules
knowledge discovery
news text
Qualifiers
- research-article
- Research
- Refereed