Efficient deep learning techniques for the detection of phishing websites

Sādhanā

Abstract

Phishing is a fraudulent practice and a form of cyber-attack designed and executed with the sole purpose of gathering sensitive information by masquerading as genuine websites. Phishers replicate the content of genuine websites to trick users into revealing personal information such as security numbers, credit card numbers and passwords. Many anti-phishing techniques have been proposed to date, including blacklist- or whitelist-based, heuristic-feature-based and visual-similarity-based methods. Modern browsers have adapted to reduce the chances of users being trapped, yet users still fall prey to phishers and end up revealing their secret information. In a previous work, the authors proposed a machine learning approach based on heuristic features for phishing website detection and achieved an accuracy of 99.5% using 18 features. In this paper, we propose novel phishing URL detection models using (a) a Deep Neural Network (DNN), (b) Long Short-Term Memory (LSTM) and (c) a Convolutional Neural Network (CNN), using only 10 features from our earlier work. The proposed models achieve an accuracy of 99.52% for DNN, 99.57% for LSTM and 99.43% for CNN. They rely on only one third-party service feature, which makes them more robust to failure and increases the speed of phishing detection.
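As a rough illustration of what a DNN classifier over a 10-feature vector looks like, the following minimal sketch runs a forward pass through a small fully connected network in plain Python. It is not the authors' architecture: the layer sizes (16 and 8), the random untrained weights, the 0.5 decision threshold and the example feature values are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import math
import random

NUM_FEATURES = 10  # the abstract states that 10 features are used


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, layers):
    """Forward pass: ReLU hidden layers, sigmoid output in (0, 1)."""
    if len(features) != NUM_FEATURES:
        raise ValueError("expected %d features" % NUM_FEATURES)
    hidden = list(features)
    for layer in layers[:-1]:
        hidden = relu(dense(hidden, layer))
    (logit,) = dense(hidden, layers[-1])
    return sigmoid(logit)


# Hypothetical layer sizes; weights are random, NOT trained.
rng = random.Random(0)
layers = [
    init_layer(NUM_FEATURES, 16, rng),
    init_layer(16, 8, rng),
    init_layer(8, 1, rng),
]

# Made-up binary feature vector, for illustration only.
example = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
p = predict_proba(example, layers)
label = "phishing" if p >= 0.5 else "legitimate"
print(label, round(p, 3))
```

Because the weights are untrained, the printed label carries no meaning; the sketch only shows how ten heuristic feature values flow through hidden layers to a single phishing probability. A trained model would learn the weights from labelled phishing and legitimate URLs.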

Notes

  1. https://www.alexa.com/topsites.

  2. https://www.whois.com.

  3. http://archive.ics.uci.edu/ml/index.php.

  4. http://www.phishtank.com/index.php.

Acknowledgements

This research was funded by the Ministry of Electronics and Information Technology (MeitY), Government of India. The authors sincerely thank MeitY for financial support. The authors thank the anonymous referees for their comments and criticism, which have helped to improve the quality of the paper.

Author information

Corresponding author

Correspondence to M Somesha.

About this article

Cite this article

Somesha, M., Pais, A.R., Rao, R.S. et al. Efficient deep learning techniques for the detection of phishing websites. Sādhanā 45, 165 (2020). https://doi.org/10.1007/s12046-020-01392-4

  • DOI: https://doi.org/10.1007/s12046-020-01392-4
