Vulnerability exploitation time prediction: an integrated framework for dynamic imbalanced learning

World Wide Web

Abstract

Exploitation time is an essential factor in vulnerability assessment for cybersecurity management. In this work, we propose an integrated consecutive batch learning framework to predict the probable exploitation time of vulnerabilities. To improve performance, the framework combines features extracted from vulnerability descriptions with features from the Common Vulnerability Scoring System (CVSS). In particular, we design an Adaptive Sliding Window Weighted Learning (ASWWL) algorithm to tackle the dynamic multiclass imbalance problem, which arises in many industrial applications, including exploitation time prediction. A series of experiments is carried out on a real-world dataset containing 24,413 exploited vulnerabilities disclosed between 1990 and 2020. The experimental results demonstrate that the proposed ASWWL algorithm can significantly enhance performance on the minority classes without compromising performance on the majority class. In addition, the proposed framework achieves the most robust, state-of-the-art performance when compared with five other consecutive batch learning algorithms.
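
To make the idea of window-based class weighting concrete, the following is a minimal sketch and not the ASWWL algorithm itself, whose details are not given in this abstract. It assumes a fixed window size (no adaptation) and uses the common inverse-frequency heuristic, n_samples / (n_classes * class_count), to weight classes from the labels seen in the most recent batches. The class name, the `window_size` parameter, and the labels "early", "mid", and "late" are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowClassWeights:
    """Track label counts over recent batches and derive class weights.

    Illustrative only: a fixed-size window with inverse-frequency weights,
    not the adaptive scheme proposed in the paper.
    """

    def __init__(self, window_size):
        # Hypothetical parameter: how many recent batches to remember.
        self.batches = deque(maxlen=window_size)

    def update(self, labels):
        """Add one batch of labels; the oldest batch drops out when full."""
        self.batches.append(Counter(labels))

    def weights(self):
        """Return n_samples / (n_classes * class_count) for each class seen."""
        totals = Counter()
        for counts in self.batches:
            totals.update(counts)
        n_samples = sum(totals.values())
        n_classes = len(totals)
        return {
            label: n_samples / (n_classes * count)
            for label, count in totals.items()
        }


# Two consecutive batches with a skewed (hypothetical) label distribution.
tracker = SlidingWindowClassWeights(window_size=2)
tracker.update(["late", "late", "late", "early"])
tracker.update(["late", "late", "mid", "early"])
w = tracker.weights()

# A third batch pushes the first one out of the window, so the weights
# follow the changing class distribution.
tracker.update(["mid", "mid", "late", "early"])
w_shifted = tracker.weights()
```

In this sketch the rarest class in the window receives the largest weight, and the weights shift as old batches leave the window. A learner that accepts per-class or per-sample weights could consume them when it is fitted on each new batch.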

Notes

  1. https://www.first.org/cvss/

  2. https://github.com/scikit-multiflow/scikit-multiflow

  3. https://nvd.nist.gov/vuln/data-feeds

  4. https://cve.mitre.org

  5. https://www.exploit-db.com/

  6. https://storage.googleapis.com/bertmodels/2018_10_18/uncased_L-12_H-768_A-12.zip

Acknowledgements

The first author was partially supported by the Research Program of Chongqing University of Arts and Sciences, China (Grant No. P2020RG08) and the Natural Science Foundation of Chongqing, China (Grant No. cstc2019jcyj-msxmX034).

Author information

Corresponding author

Correspondence to Jiao Yin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Web Information Systems Engineering 2020

Guest Editors: Hua Wang, Zhisheng Huang, and Wouter Beek

About this article

Cite this article

Yin, J., Tang, M., Cao, J. et al. Vulnerability exploitation time prediction: an integrated framework for dynamic imbalanced learning. World Wide Web 25, 401–423 (2022). https://doi.org/10.1007/s11280-021-00909-z

  • DOI: https://doi.org/10.1007/s11280-021-00909-z
