Elsevier

Computers & Security

Volume 110, November 2021, 102421
An efficient multistage phishing website detection model based on the CASE feature framework: Aiming at the real web environment

https://doi.org/10.1016/j.cose.2021.102421
Under a Creative Commons license
open access

Abstract

Phishing has become a favorite method of hackers for committing data theft, and it continues to evolve. As long as phishing websites continue to operate, many more people and companies will suffer privacy leaks or financial losses, so the demand for fast and accurate phishing website detection keeps growing. However, existing phishing detection methods do not fully analyze the features of phishing, and their reported performance and efficiency hold only on certain limited datasets; both need to improve before the models can be applied in the real web environment. This paper fully considers the social engineering principles of phishing, proposes a comprehensive and interpretable CASE feature framework, and designs a multistage phishing detection model to detect phishing sites effectively, especially in the real web environment, where high efficiency, high performance, and extremely low false alarm rates are required. To verify the proposed method, two kinds of experiments were carried out. The first was a set of comparative experiments among different features and different detection models on CASE, covering both classic machine learning and deep learning algorithms, on a constructed complex dataset. The second was a one-year phishing discovery experiment in the real web environment. The proposed method achieves better detection results while significantly shortening execution time, and it works well in real phishing discovery, which demonstrates its practicability.

Keywords

Phishing detection
CASE feature framework
Multistage model
Machine learning
Real web environment

Dong-Jie Liu is currently working toward the Ph.D. degree with the Computer Network Information Center, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China. Her research interests include machine learning, network security, and blockchain.

Guang-Gang Geng received his Ph.D. degree from the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is currently a professor with the College of Cyber Security, Jinan University, Guangzhou. His current research interests include machine learning, web abuse detection, and web search.

Xiao-Bo Jin received the Ph.D. degree in pattern recognition and intelligent systems from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2009. He is currently an associate professor with the Department of Intelligent Science, School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, China. His current research interests include web mining, machine learning, pattern recognition, and neurocomputing.

Wei Wang received the Ph.D. degree from Nankai University, Tianjin, China. He previously worked at CNNIC and Google and is currently a professor at the Computer Network Information Center, Chinese Academy of Sciences, Beijing, China. His research interests include domain names and blockchain.