An Evolutionary-based Random Weight Networks with Taguchi Method for Arabic Web Pages Classification
Arabian Journal for Science and Engineering ( IF 2.9 ) Pub Date : 2021-02-05 , DOI: 10.1007/s13369-020-05301-z
Arwa Shawabkeh , Hossam Faris , Ibrahim Aljarah , Bilal Abu-Salih , Dabiah Alboaneen , Nouh Alhindawi

The Internet now hosts a huge number of web documents, which makes retrieving pages on a specific topic difficult because irrelevant pages are often retrieved as well. Automatic classification of web documents and pages has essential applications in domains such as medicine, health, science, and information technology. Many web page classification methods have been proposed to improve search capabilities, but most target English: they classify English web pages while also reducing the high dimensionality of the features extracted from them. Because classification methods for other languages are lacking, this paper focuses on Arabic web page classification, motivated by the scarcity of such work and the importance of the Arabic language. In particular, we propose an evolutionary model that combines binary particle swarm optimization (BPSO) with random weight networks (RWNs) as the induction algorithm, in order to reduce the high dimensionality of features in Arabic web pages and to classify documents automatically. The datasets used in this paper were collected from popular Arabic websites and cover three fields: Computer Science, Science, and Health. Further, the Taguchi method is incorporated to find the best parameters of the proposed algorithm. The experimental results showed that the proposed model gives better performance for Arabic web page classification. In addition, an analysis was conducted to identify the most important features learned by the proposed model as well as the most important tags. The results showed that the list tag obtained the highest percentage, which reflects its effectiveness in classifying Arabic web pages.
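
To make the wrapper idea in the abstract concrete, the following is a minimal sketch of BPSO feature selection with an RWN as the induction algorithm. It is not the authors' implementation. The function names (`rwn_fit_predict`, `fitness`, `bpso_select`), the sigmoid activation, the accuracy-only fitness, and all parameter values (inertia `w`, coefficients `c1` and `c2`, swarm size, hidden layer size) are assumptions chosen for illustration; the paper tunes its parameters with the Taguchi method, which is not reproduced here.

```python
import numpy as np

def _sigmoid(z):
    # Clip to avoid overflow in exp for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def rwn_fit_predict(X_train, y_train, X_test, n_hidden=20, seed=0):
    """Random weight network: hidden weights are random and fixed;
    only the output weights are solved, via least squares (pseudo-inverse)."""
    rng = np.random.default_rng(seed)
    n_classes = int(y_train.max()) + 1
    W = rng.uniform(-1, 1, size=(X_train.shape[1], n_hidden))
    b = rng.uniform(-1, 1, size=n_hidden)
    H = _sigmoid(X_train @ W + b)          # hidden layer outputs
    T = np.eye(n_classes)[y_train]         # one-hot targets
    beta = np.linalg.pinv(H) @ T           # output weights
    return (_sigmoid(X_test @ W + b) @ beta).argmax(axis=1)

def fitness(mask, X_tr, y_tr, X_va, y_va, seed):
    """Validation accuracy of an RWN trained on the selected features only
    (assumed fitness; the paper's exact objective may differ)."""
    if not mask.any():
        return 0.0
    pred = rwn_fit_predict(X_tr[:, mask], y_tr, X_va[:, mask], seed=seed)
    return float((pred == y_va).mean())

def bpso_select(X_tr, y_tr, X_va, y_va, n_particles=10, n_iter=20,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 vector over features."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    pos = (rng.random((n_particles, d)) < 0.5).astype(float)
    vel = rng.uniform(-1, 1, size=(n_particles, d))

    def evaluate(p):
        return fitness(p.astype(bool), X_tr, y_tr, X_va, y_va, seed)

    pbest = pos.copy()
    pbest_fit = np.array([evaluate(p) for p in pos])
    g = int(pbest_fit.argmax())
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])

    for _ in range(n_iter):
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -6, 6)
        # Sigmoid transfer: velocity becomes the probability that a bit is 1.
        pos = (rng.random((n_particles, d)) < _sigmoid(vel)).astype(float)
        fit = np.array([evaluate(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(pbest_fit.argmax())
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])

    return gbest.astype(bool), gbest_fit
```

Calling `bpso_select(X_train, y_train, X_val, y_val)` on a numeric feature matrix with integer class labels returns a boolean mask of selected features and the validation accuracy that mask achieved. A fitness that also penalizes the number of selected features would be a natural variation, but whether the paper uses one is not stated in the abstract.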




Updated: 2021-02-05