Consensus and majority vote feature selection methods and a detection technique for web phishing

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Phishing is one of the most common forms of cybercrime that Internet users face and represents a violation of cybersecurity principles. It is a fraudulent attack carried out over the Internet to obtain, and use without authorization, the sensitive information of Internet users, such as usernames, passwords, credit card details, and bank account information. Widely used phishing attempts rely on email spoofing or instant messaging to convince a victim to visit a spoofed website, where the victim’s information is then captured. In this work, we identify and analyze the most important features needed to detect spoofed websites by means of two new feature selection techniques. In the first technique (consensus), several underlying feature selection methods vote on each feature, and a feature is selected only if all of these methods agree on it. In the second technique (majority vote), underlying feature selection methods again vote on each feature, and a feature is selected if the majority of them vote for it. We also propose a phishing detection technique based on the AdaBoost and LightGBM ensemble methods to detect spoofed websites. The proposed method achieves very high accuracy compared with existing methods.
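The two voting rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the underlying feature selection methods or the exact majority threshold, so the selector names, the feature names, and the "strictly more than half" rule below are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def consensus_select(votes: Dict[str, Set[str]]) -> List[str]:
    """Keep a feature only if every underlying selector chose it."""
    if not votes:
        return []
    # Intersection of all selectors' choices = unanimous agreement.
    return sorted(set.intersection(*votes.values()))


def majority_select(votes: Dict[str, Set[str]]) -> List[str]:
    """Keep a feature if more than half of the selectors chose it.

    The strict "more than half" threshold is an assumption; the
    abstract only says "the majority".
    """
    counts = Counter(f for chosen in votes.values() for f in chosen)
    threshold = len(votes) / 2
    return sorted(f for f, n in counts.items() if n > threshold)


# Hypothetical selectors and features, for illustration only.
votes = {
    "selector_a": {"url_length", "has_ip_address", "ssl_state"},
    "selector_b": {"url_length", "ssl_state", "domain_age"},
    "selector_c": {"url_length", "has_ip_address", "domain_age"},
}

print(consensus_select(votes))  # features chosen by all three selectors
print(majority_select(votes))   # features chosen by at least two of three
```

In this toy example only `url_length` survives the consensus rule, while the majority rule keeps all four features, which shows how the consensus rule is the stricter of the two.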

Author information

Corresponding author

Correspondence to Munif Alotaibi.

About this article

Cite this article

Alotaibi, B., Alotaibi, M. Consensus and majority vote feature selection methods and a detection technique for web phishing. J Ambient Intell Human Comput 12, 717–727 (2021). https://doi.org/10.1007/s12652-020-02054-3

  • DOI: https://doi.org/10.1007/s12652-020-02054-3
