Efficient Classification Model of Web News Documents using Machine Learning Algorithms, Computers & Security


Computers & Security (IF 4.8), Pub Date: 2020-11-01, DOI: 10.1016/j.cose.2020.102006
Aos Mulahuwaish, Kevin Gyorick, Kayhan Zrar Ghafoor, Halgurd S. Maghdid, Danda B. Rawat

Abstract: Web applications are a popular platform for exchanging information with users. These applications must process big data quickly and serve users accurate, timely information from news portals, which is a considerable challenge. Crawling the web and processing big data require substantial computing power, so methods are needed to reduce the space and time complexity of this process. Data mining can mitigate these challenges by extracting specific information based on explicit features. This paper proposes an efficient model that extracts news information from the web and sorts news documents into four categories: business, technology & science, health, and entertainment. Four machine learning classifiers are compared: Support Vector Machine (SVM), K-Nearest Neighbors (kNN), Decision Tree (DT), and Long Short-Term Memory (LSTM). The classifiers are implemented separately and then compared using accuracy and receiver operating characteristic (ROC) curves. The results show that kNN had the lowest accuracy at 88.72% and SVM the highest at 95.04%.
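To make the evaluation idea concrete, the following is a minimal sketch of one of the four classifiers (kNN) applied to news text and scored by accuracy. It is not the authors' pipeline: the term-count features, the cosine similarity measure, the whitespace tokenizer, the toy headlines in `TRAIN` and `TEST`, and all function names are assumptions made purely for illustration. Only the four category labels come from the abstract.

```python
import math
from collections import Counter

# Hypothetical toy data; the four labels follow the abstract, the headlines are invented.
TRAIN = [
    ("stocks fall as markets react to bank earnings", "business"),
    ("company shares rise after merger deal", "business"),
    ("new smartphone chip boosts battery life", "technology & science"),
    ("scientists launch satellite to study climate", "technology & science"),
    ("doctors warn of flu outbreak this winter", "health"),
    ("study links diet to heart disease risk", "health"),
    ("film star wins award at festival", "entertainment"),
    ("pop singer announces world tour dates", "entertainment"),
]
TEST = [
    ("bank shares fall after earnings report", "business"),
    ("new satellite chip launch", "technology & science"),
    ("flu risk rises this winter", "health"),
    ("singer wins festival award", "entertainment"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Return the majority label among the k training documents most similar to text."""
    query = Counter(tokenize(text))
    ranked = sorted(
        train,
        key=lambda item: cosine(query, Counter(tokenize(item[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test documents whose predicted label matches the true label."""
    correct = sum(knn_predict(train, text, k) == label for text, label in test)
    return correct / len(test)


if __name__ == "__main__":
    print("kNN accuracy on toy data:", accuracy(TRAIN, TEST, k=1))
```

A full comparison along the lines the abstract describes would repeat the same accuracy computation for each classifier on a shared held-out set and, for ROC curves, would additionally need per-class scores rather than hard labels.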


Updated: 2020-11-01
