Phishing web site detection using diverse machine learning algorithms
The Electronic Library ( IF 1.5 ) Pub Date : 2020-01-02 , DOI: 10.1108/el-05-2019-0118
Ammara Zamir , Hikmat Ullah Khan , Tassawar Iqbal , Nazish Yousaf , Farah Aslam , Almas Anjum , Maryam Hamdani

This paper aims to present a framework for detecting phishing websites using a stacking model. Phishing is a type of fraud aimed at obtaining users’ credentials. The attackers access users’ personal and sensitive information for monetary gain. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and building websites that closely imitate the original websites. As people surf the targeted website, the phishers hijack their personal information.

Features of a phishing data set are analysed using feature selection techniques including information gain, gain ratio, Relief-F and recursive feature elimination (RFE). Two new features are proposed by combining the strongest and weakest attributes. Principal component analysis, together with diverse machine learning algorithms (random forest [RF], neural network [NN], bagging, support vector machine, Naive Bayes and k-nearest neighbour [kNN]), is applied to the proposed and remaining features. Afterwards, two stacking models, Stacking1 (RF + NN + Bagging) and Stacking2 (kNN + RF + Bagging), are built by combining the highest-scoring classifiers to improve classification accuracy.

The proposed features played an important role in improving the accuracy of all the classifiers. The results show that RFE plays an important role in removing the least important feature from the data set. Furthermore, Stacking1 (RF + NN + Bagging) outperformed all other classifiers in classification accuracy, detecting phishing websites with 97.4% accuracy.

This research is novel in that no previous research focuses on using a feed-forward NN and ensemble learners for detecting phishing websites.
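As a rough illustration of two of the feature selection scores named in the abstract, the sketch below computes information gain and gain ratio for categorical features and ranks them. It is not the authors' implementation: the function names, the two feature names (`has_ip_in_url`, `uses_https`) and the eight-row toy data set are all assumptions made up for this example, and the abstract does not say which tooling or features were actually used.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the feature itself (split information)."""
    split_info = entropy(feature_values)
    if split_info <= 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature_values, labels) / split_info


def rank_features(rows, labels, names, score=information_gain):
    """Return (name, score) pairs sorted from most to least informative."""
    columns = list(zip(*rows))
    scored = [(name, score(col, labels)) for name, col in zip(names, columns)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: each row is (has_ip_in_url, uses_https); label 1 = phishing.
NAMES = ["has_ip_in_url", "uses_https"]
ROWS = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0), (0, 1), (1, 0)]
LABELS = [1, 1, 1, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    for name, value in rank_features(ROWS, LABELS, NAMES):
        print(f"{name}: information gain = {value:.3f}")
```

In this toy data the first feature separates the classes perfectly, so it ranks above the second. Passing `score=gain_ratio` to `rank_features` applies the normalised variant, which penalises features that split the data into many small groups.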

Updated: 2020-01-02