A novel approach for phishing URLs detection using lexical based machine learning in a real-time environment



Computer Communications (IF 6), Pub Date: 2021-04-30, DOI: 10.1016/j.comcom.2021.04.023
Brij B. Gupta, Krishna Yadav, Imran Razzak, Konstantinos Psannis, Arcangelo Castiglione, Xiaojun Chang

The number of devices connected to the internet has increased massively in recent times. These devices include, but are not limited to, smartphones, IoT devices, and cloud networks. Compared with other possible cyber-attacks, hackers now target these devices with phishing attacks, since phishing exploits human vulnerabilities rather than system vulnerabilities. In a phishing attack, an online user is deceived by a seemingly trusted entity into giving up personal data, such as login credentials or credit card details. Once this private information is leaked to the hackers, it becomes the source of other, more sophisticated attacks. Many researchers have recently proposed machine learning-based approaches to detect phishing attacks; however, they have relied on a large number of features to develop reliable phishing detection techniques. A large number of features requires substantial processing power, which makes such techniques unsuitable for resource-constrained devices. To address this issue, we have developed a phishing detection approach that needs only nine lexical features to detect phishing attacks effectively. We used the ISCXURL-2016 dataset for our experiments, with 11964 instances of legitimate and phishing URLs. We tested our approach with different machine learning classifiers and obtained the highest accuracy, 99.57%, with the Random Forest algorithm.
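To make the idea of lexical features concrete, the sketch below computes a handful of string-only properties of a URL using Python's standard library. The abstract does not list the nine features the authors use, so the function name `lexical_features` and every feature in it are illustrative assumptions, not the paper's feature set. The point is only that such features come from the URL text itself, with no page download or DNS lookup, which is what keeps them cheap on resource-constrained devices.

```python
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute a few string-only features of a URL.

    Hypothetical feature set for illustration; NOT the nine
    features used in the paper, which the abstract does not name.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is None for URLs without a netloc
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        # Count non-empty "key=value" chunks in the query string.
        "num_query_params": len([p for p in parts.query.split("&") if p]),
    }


if __name__ == "__main__":
    for name, value in lexical_features(
        "http://login-secure.example.com/a/b?x=1&y=2"
    ).items():
        print(f"{name}: {value}")
```

A feature dictionary of this kind could then be turned into a numeric vector and passed to any classifier, for example a random forest as in the abstract; that training step is left out here because the abstract gives no details about it.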


Updated: 2021-05-05
