CNN-MHSA: A Convolutional Neural Network and multi-head self-attention combined approach for detecting phishing websites.
Neural Networks ( IF 7.8 ) Pub Date : 2020-02-29 , DOI: 10.1016/j.neunet.2020.02.013
Xi Xiao 1 , Dianyan Zhang 2 , Guangwu Hu 3 , Yong Jiang 1 , Shutao Xia 1

Phishing sites are increasingly common and pose a serious threat because they are hard to recognize. They rely on users mistaking them for legitimate sites in order to steal user information and property unnoticed. The conventional way to mitigate such threats is to maintain blacklists. However, a blacklist cannot detect one-time Uniform Resource Locators (URLs) that have not yet appeared in the list. As an improvement, deep learning methods have been applied to increase detection accuracy and reduce the misjudgment rate. However, some of them focus only on the characters in URLs and ignore the relationships between characters, which leaves room to improve detection accuracy. Because multi-head self-attention (MHSA) can learn the inner structure of URLs, in this paper we propose CNN-MHSA, an approach that combines a Convolutional Neural Network (CNN) with MHSA for highly precise phishing website detection. CNN-MHSA first takes a URL string as input and feeds it into a mature CNN model to extract its features. Meanwhile, MHSA is applied to exploit the relationships between characters in the URL and to calculate the corresponding weights for the CNN-learned features. Finally, CNN-MHSA produces a highly precise detection result for a URL by integrating its features and their weights. Thorough experiments on a dataset collected in a real environment demonstrate that our method achieves 99.84% accuracy, which outperforms the classical CNN-LSTM method and is at least 6.25% higher on average than other similar methods.
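
The pipeline described in the abstract (character-level URL input, CNN feature extraction, MHSA-derived weights, and their integration) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract gives no layer sizes, embedding scheme, or fusion rule, so everything here is an assumption made for illustration: the helper names (`encode_url`, `conv1d_relu`, `multi_head_self_attention`, `phishing_score`), the dimensions, the random untrained parameters, and the choice to apply sigmoid-squashed attention outputs as element-wise weights on the CNN features.

```python
import math
import random


def encode_url(url, max_len=32):
    """Map characters to integer ids (0 = padding); truncate or pad to max_len."""
    ids = [(ord(c) % 127) + 1 for c in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_relu(x, kernels, width):
    """1D convolution over positions with zero padding, followed by ReLU.

    x is seq_len x dim; each kernel is a flat vector of length width * dim.
    """
    dim = len(x[0])
    pad = width // 2
    padded = [[0.0] * dim] * pad + x + [[0.0] * dim] * pad
    out = []
    for t in range(len(x)):
        window = [v for row in padded[t:t + width] for v in row]
        out.append([max(0.0, sum(w * v for w, v in zip(k, window))) for k in kernels])
    return out


def multi_head_self_attention(x, wq, wk, wv, wo, n_heads):
    """Scaled dot-product self-attention; heads are concatenated, then projected."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_head = len(q[0]) // n_heads
    concat = [[] for _ in x]
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        for i, qi in enumerate(q):
            scores = [sum(a * b for a, b in zip(qi[lo:hi], kj[lo:hi])) / math.sqrt(d_head)
                      for kj in k]
            attn = softmax(scores)
            concat[i].extend(sum(a * vj[c] for a, vj in zip(attn, v)) for c in range(lo, hi))
    return matmul(concat, wo)


def phishing_score(url, seed=0, dim=8, n_heads=2, max_len=32):
    """Return a score in (0, 1) from an UNTRAINED model (random parameters)."""
    rng = random.Random(seed)
    embed = rand_matrix(rng, 128, dim)          # assumed character embedding table
    kernels = rand_matrix(rng, dim, 3 * dim)    # assumed: dim filters of width 3
    wq, wk, wv, wo = (rand_matrix(rng, dim, dim) for _ in range(4))
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(dim)]

    x = [embed[i] for i in encode_url(url, max_len)]  # padding is not masked here
    features = conv1d_relu(x, kernels, width=3)       # CNN branch: local features
    attended = multi_head_self_attention(x, wq, wk, wv, wo, n_heads)
    weights = [[sigmoid(a) for a in row] for row in attended]  # assumed fusion rule
    weighted = [[f * w for f, w in zip(fr, wr)] for fr, wr in zip(features, weights)]
    pooled = [sum(col) / len(weighted) for col in zip(*weighted)]  # mean over positions
    return sigmoid(sum(w * p for w, p in zip(w_out, pooled)))


print(phishing_score("http://example.com/login"))
```

Because the parameters are random rather than learned, the printed score carries no meaning; the sketch only shows how the two branches could be wired together. A trained model would learn the embedding, filters, and attention projections from labeled URLs.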




Updated: 2020-03-02