An Embedded-Based Weighted Feature Selection Algorithm for Classifying Web Document
Wireless Communications and Mobile Computing, Pub Date: 2020-09-15, DOI: 10.1155/2020/8879054
G. Siva Shankar 1, P. Ashokkumar 1, R. Vinayakumar 2, Uttam Ghosh 3, Wathiq Mansoor 4, Waleed S. Alnumay 5

With the number of web pages growing exponentially every day, it is increasingly difficult for a search engine to list relevant pages. In this paper, we propose a machine learning-based classification model that learns the best features of each web page and thereby supports search engine listing. Existing listing methods have several drawbacks, such as interfering with the normal operation of the website and crawling large amounts of useless information. Our proposed algorithm provides an optimal classification for websites that have a large number of web pages, such as Wikipedia, by considering only core information such as link text, side information, and header text. We implemented our algorithm on standard benchmark datasets, and the results show that it outperforms the existing algorithms.
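
To make the idea of weighting only core page fields concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not specify how the fields are extracted or weighted, so everything here is an assumption for illustration: the weight values in `FIELD_WEIGHTS`, the treatment of `<aside>` elements as "side information", and the names `CoreTextExtractor` and `weighted_features` are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights; the abstract does not state the values the authors use.
FIELD_WEIGHTS = {"header": 3.0, "link": 2.0, "side": 1.0}
HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class CoreTextExtractor(HTMLParser):
    """Collects text inside header, link, and <aside> elements only."""

    def __init__(self):
        super().__init__()
        self.open_fields = []  # fields of the tracked tags currently open
        self.texts = {field: [] for field in FIELD_WEIGHTS}

    @staticmethod
    def _field_for(tag):
        if tag in HEADER_TAGS:
            return "header"
        if tag == "a":
            return "link"
        if tag == "aside":  # assumption: <aside> stands in for side information
            return "side"
        return None

    def handle_starttag(self, tag, attrs):
        field = self._field_for(tag)
        if field:
            self.open_fields.append(field)

    def handle_endtag(self, tag):
        field = self._field_for(tag)
        # Close the innermost open tag of this field, if there is one.
        for i in range(len(self.open_fields) - 1, -1, -1):
            if self.open_fields[i] == field:
                del self.open_fields[i]
                break

    def handle_data(self, data):
        # Text is credited to the innermost tracked field; body text is skipped.
        if self.open_fields:
            self.texts[self.open_fields[-1]].append(data)


def weighted_features(html):
    """Return a weighted bag of words built from the core fields of a page."""
    parser = CoreTextExtractor()
    parser.feed(html)
    parser.close()
    features = Counter()
    for field, chunks in parser.texts.items():
        for token in re.findall(r"[a-z0-9]+", " ".join(chunks).lower()):
            features[token] += FIELD_WEIGHTS[field]
    return features
```

The resulting dictionary of term weights could be passed to any standard text classifier. Ordinary body text is ignored, which reflects the abstract's point about avoiding crawling information that does not help classification.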

Updated: 2020-09-15