A Hotspot Information Extraction Hybrid Solution of Online Posts’ Textual Data
Scientific Programming Pub Date : 2021-04-15 , DOI: 10.1155/2021/6619712
HuiRu Cao 1 , Xiaomin Li 2 , Songyao Lian 3 , Choujun Zhan 4

Online posts have gradually become a major carrier of network public opinion in social media, and social network hotspots are an important basis for studying that opinion. Extracting hotspots from the textual big data of online posts is therefore significant for monitoring Internet public opinion. However, current hotspot extraction methods focus on user features and work on textual big data that contains spam and low-quality content. These methods also seldom consider the time span of posts or the popularity of users. Accordingly, this article presents a hybrid solution for extracting hotspot information from the textual data of online posts. Firstly, a filtering strategy is designed to obtain more high-quality textual data. Secondly, a topic hot degree is defined that considers the average number of replies and the popularity of the participants. Thirdly, an improved co-word analysis technique is used to find posts on the same topic, and a Bisecting k-means clustering algorithm that uses repliers' popularity and key posts is designed for studying and monitoring the hotspots of online posts in a valid big data environment. Finally, the proposed algorithms are verified in experiments by extracting the hotspots of online posts from a dataset. The results show that the data filtering strategy helps to obtain more valuable information and decreases the computing time. The results also indicate that, compared with traditional methods, the proposed solution helps to obtain hotspots, and that the hot degree reflects the trend of an online post.
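The abstract names Bisecting k-means but does not say how repliers' popularity or key posts enter the clustering. As a rough illustration only, the sketch below shows the standard bisecting k-means idea on plain numeric vectors: start with one cluster, then repeatedly split the cluster with the largest within-cluster squared error using 2-means. The function names, the `trials` and `seed` parameters, and the toy points are assumptions made for this example and are not taken from the paper.

```python
import random


def _mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = _mean(points)
    return sum(_sq_dist(p, center) for p in points)


def _two_means(points, rng, iters=20):
    """Plain 2-means (Lloyd's algorithm) on one cluster; may return an empty side."""
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer_first = _sq_dist(p, centers[0]) <= _sq_dist(p, centers[1])
            groups[0 if nearer_first else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. both seeds were identical points
        new_centers = [_mean(groups[0]), _mean(groups[1])]
        if new_centers == centers:
            break  # assignments have stabilised
        centers = new_centers
    return groups


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Split the highest-error cluster in two until k clusters exist (hypothetical helper)."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break  # nothing left that can be split
        target = max(candidates, key=_sse)
        best = None
        for _ in range(trials):
            left, right = _two_means(target, rng)
            if not left or not right:
                continue  # ignore degenerate trials
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, left, right)
        if best is None:
            break  # the target could not be split (all points identical)
        clusters = [c for c in clusters if c is not target]
        clusters.extend([best[1], best[2]])
    return clusters


if __name__ == "__main__":
    # Toy 2-D vectors standing in for post feature vectors (made-up data).
    pts = [(0, 0), (0, 1), (1, 0),
           (100, 100), (100, 101), (101, 100),
           (200, 0), (200, 1), (201, 0)]
    for cluster in bisecting_kmeans(pts, 3):
        print(sorted(cluster))
```

In a setting like the one the abstract describes, each vector would presumably be derived from a post's text and reply statistics, but that mapping is not given here and is left open in the sketch.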

Updated: 2021-04-15