Phishing website detection using support vector machines and nature-inspired optimization algorithms

Published in: Telecommunication Systems

Abstract

Phishing websites are among the biggest threats Internet users face today, and existing methods such as blacklisting and the use of SSL certificates often fail to keep up with the increasing number of threats. This paper aims to utilise different properties of a website URL as inputs to a machine learning model that classifies websites as phishing or non-phishing. These properties include the IP address length, the authenticity of the HTTPS request being sent by the website, the usage of pop-up windows to enter data, and the Server Form Handler status, among others. A Support Vector Machine binary classifier trained on an existing dataset is used to predict whether a website is legitimate, by finding an optimum hyperplane that separates the two categories. This optimum hyperplane is found with the help of four optimization algorithms inspired by various natural phenomena: the Bat Algorithm, the Firefly Algorithm, the Grey Wolf Optimiser algorithm and the Whale Optimization Algorithm. Among these four nature-inspired optimization algorithms, the Grey Wolf Optimiser algorithm performs significantly better than the Firefly Algorithm, but no significant difference was found between any other pair of algorithms. However, all four nature-inspired optimization algorithms perform significantly better than the grid-search optimized Random Forest classifier model described in earlier research.
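To make the optimization step more concrete, the following is a minimal sketch of a basic Grey Wolf Optimiser in plain Python. It is not the authors' implementation: the abstract does not state which quantities are searched, so the sketch assumes a generic box-bounded minimisation problem. The function name `grey_wolf_optimize`, its parameters, and the toy objective `toy_error` are all hypothetical. In an SVM setting, one plausible choice (an assumption, not confirmed by the abstract) would be to search over hyperparameters such as the penalty and kernel width, with cross-validated classification error as the objective.

```python
import random


def grey_wolf_optimize(objective, bounds, n_wolves=12, n_iters=60, seed=0):
    """Minimise `objective` over the box `bounds` with a basic Grey Wolf Optimiser.

    Hypothetical helper for illustration; `bounds` is a list of (low, high) pairs.
    """
    rng = random.Random(seed)
    # Random initial pack inside the bounds.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")

    for it in range(n_iters):
        # Rank the pack: the three best wolves act as alpha, beta and delta.
        scores = [objective(w) for w in wolves]
        order = sorted(range(n_wolves), key=scores.__getitem__)
        leaders = [list(wolves[i]) for i in order[:3]]
        if scores[order[0]] < best_score:
            best_score, best_pos = scores[order[0]], leaders[0]

        # Control parameter a shrinks linearly from 2 towards 0,
        # shifting the pack from exploration to exploitation.
        a = 2.0 * (1 - it / n_iters)
        for w in wolves:
            for d, (lo, hi) in enumerate(bounds):
                pulls = []
                for leader in leaders:
                    A = a * (2 * rng.random() - 1)  # step coefficient in [-a, a]
                    C = 2 * rng.random()            # random weight on the leader
                    pulls.append(leader[d] - A * abs(C * leader[d] - w[d]))
                # New coordinate: average of the three leader-guided moves, clipped.
                w[d] = min(hi, max(lo, sum(pulls) / 3))

    return best_pos, best_score


def toy_error(params):
    """Stand-in objective (assumption): a smooth bowl with its minimum at (1, -2)."""
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


if __name__ == "__main__":
    position, score = grey_wolf_optimize(toy_error, [(-5.0, 5.0), (-5.0, 5.0)])
    print(position, score)
```

Replacing `toy_error` with a function that trains a classifier for a given parameter vector and returns its validation error would turn the same loop into a hyperparameter search. The other three algorithms named in the abstract differ mainly in how candidate positions are updated each iteration.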

Acknowledgements

We would like to thank Dr Reema Aswani, Research Associate at Indian Institute of Technology, New Delhi, for her support and guidance. We would also like to thank the organisers and the Scientific Review Committee of the Intel IRIS Science Fair, for their critical reviews, and the organisers of the Intel International Science and Engineering Fair, where this project was initially presented.

Author information

Corresponding author

Correspondence to Sagnik Anupam.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Anupam, S., Kar, A.K. Phishing website detection using support vector machines and nature-inspired optimization algorithms. Telecommun Syst 76, 17–32 (2021). https://doi.org/10.1007/s11235-020-00739-w

  • DOI: https://doi.org/10.1007/s11235-020-00739-w
