Elsevier

Computers & Security

Volume 102, March 2021, 102152
Automatically predicting cyber attack preference with attributed heterogeneous attention networks and transductive learning

https://doi.org/10.1016/j.cose.2020.102152

Abstract

Predicting the cyber attack preferences of intruders is essential for security organizations to demystify attack intents and proactively handle oncoming cyber threats. To automatically analyze the attack preferences of intruders, this paper proposes a novel framework, HinAp, which predicts cyber attack preference using an attributed heterogeneous attention network and transductive learning. In particular, we first build an attributed heterogeneous information network (AHIN) of attack events to model attackers, vulnerabilities, exploited scripts, compromised devices, invaded platforms, and 20 types of meta-paths describing the interdependent relationships among them, in which the attribute information of vulnerabilities and exploited scripts is embedded. Then, we propose two attack preference prediction models, one based on an attention mechanism and one based on transductive learning. Finally, an automated model for predicting cyber attack preferences is constructed by stacking these two base prediction models, which makes it capable of integrating more comprehensive and complex semantic information from meta-paths and meta-graphs to characterize the attack preferences of intruders. Experimental results on real-world data show that HinAp outperforms state-of-the-art methods in predicting the cyber attack preferences of intruders.

Introduction

Cyber attacks have been steadily increasing over the years, and more sophisticated attack technologies (e.g., APT, 0day) are spreading across the Internet. They can effortlessly bypass traditional security protections such as firewalls, anti-virus software, and intrusion detection systems (IDS) to steal sensitive data, mine cryptocurrency, extort ransom, and destroy infrastructure (Singh et al., 2016).

Developing an automated model to predict the cyber attack preferences of intruders will be of tremendous value for proactively formulating targeted defense strategies and mitigating attacks (Kai et al., 2018). Cyber attack preference analysis can help organizations understand the landscape of attacks by answering questions such as “Which attackers tend to invade the target system?”, “What is the intent of hacking into the system?”, “What kind of exploit tools do intruders utilize?”, and “Which attackers may belong to the same malicious group?”. Generally, the task of predicting attack preference is to learn the characteristics of attackers from the clues they expose (Hernandez-Suarez et al., 2018) and to assign labels to them accordingly.

There are two main challenges in predicting cyber attack preference. First, cyber attack flows include many heterogeneous objects (e.g., intruders, vulnerabilities, attack scripts, devices, and platforms), so modeling heterogeneous cyber objects from a variety of data sources is a challenging task. Second, it is hard to choose proper elements to describe the features of intruders’ attack preferences, and an inappropriate attack feature representation may result in poor prediction performance.

In order to address the first challenge, this paper employs attributed heterogeneous information networks (AHIN) to model attackers, vulnerabilities, exploited scripts, compromised devices and platforms, and explores their underlying interactive relationships to characterize attack preferences of intruders. AHIN can effectively assemble heterogeneous cyber objects and their interactive relationships to convey rich semantic information for representing attack preference of intruders.

To overcome the second challenge, this paper leverages the attention mechanism (Veličković et al.; Wang et al.) to learn the weights of different types of features and thereby boost the performance of attack preference prediction. In this paper, we propose HinAp, an automated framework for predicting cyber attack preferences. More specifically, we first construct an attributed heterogeneous information network (AHIN) of attack events. We then leverage attributed heterogeneous attention networks to learn the weights of different types of nodes, meta-paths and meta-graphs for characterizing the attack preferences of intruders. This allows HinAp to jointly learn both the basic semantics (i.e., meta-paths) and the high-order semantic information (i.e., meta-graphs) when building a cyber attack preference prediction model. The main contributions of this paper are summarized as follows:

  • Cyber attack events modeling. The Attributed Heterogeneous Information Network (AHIN) of cyber attack events is proposed to model the interactive relationships among attackers, vulnerabilities, exploited scripts, compromised devices and platforms, in which 20 kinds of meta-paths and 5 types of meta-graphs are investigated to assist in exploring the interactive relationships among cyber attack objects. Different from traditional heterogeneous information networks (HIN), AHIN is embedded with attributes of nodes, which is capable of conveying richer and fine-grained features for characterizing cyber attack preference of intruders.

  • Attack preference prediction based on stacking learning. This paper first applies an attention mechanism to the AHIN to learn the importance of different types of attack objects and meta-paths for attack preference prediction. Then, transductive learning is used to construct a meta-graph based attack preference model. Finally, this paper leverages stacking learning to combine the attention-based and transductive learning-based models, which makes it possible to learn more comprehensive features from nodes, meta-paths and meta-graphs and to better characterize the attack preferences of intruders.

  • Social data based attack preference analysis. This paper proposes a novel solution for analyzing cyber attack preference using social data instead of attack logs and traffic data, which naturally overcomes the shortage and incompleteness of data on real intrusion behaviors. In addition, security-related social media data cover a wider variety of attack events and types than the isolated logs captured by a single security organization. This broader range of data supports better generalization of the proposed attack preference model.
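To make the AHIN and meta-path ideas above more concrete, the following is a minimal Python sketch, not the authors' implementation. The toy events, node names, attribute values, and the helper functions `build_edges` and `metapath_counts` are all hypothetical and chosen only for illustration. It stores typed edges between attackers, vulnerabilities, scripts, devices and platforms, attaches attributes to some nodes, and counts path instances along a meta-path such as attacker-vulnerability-attacker.

```python
from collections import defaultdict

# Hypothetical toy attack events: (attacker, vulnerability, script, device, platform).
EVENTS = [
    ("attacker_a", "vuln_1", "script_x", "router", "linux"),
    ("attacker_b", "vuln_1", "script_y", "camera", "linux"),
    ("attacker_c", "vuln_2", "script_z", "server", "windows"),
]

# Node attributes (the "attributed" part of an AHIN); the values are made up.
ATTRIBUTES = {
    "vuln_1": {"type": "buffer overflow"},
    "vuln_2": {"type": "sql injection"},
}


def build_edges(events):
    """Return typed adjacency: edges[(src_type, dst_type)][src] -> set of dst."""
    edges = defaultdict(lambda: defaultdict(set))
    for attacker, vuln, script, device, platform in events:
        pairs = [
            ("attacker", attacker, "vulnerability", vuln),
            ("attacker", attacker, "script", script),
            ("attacker", attacker, "device", device),
            ("device", device, "platform", platform),
        ]
        for src_type, src, dst_type, dst in pairs:
            # Store both directions so meta-paths can be walked either way.
            edges[(src_type, dst_type)][src].add(dst)
            edges[(dst_type, src_type)][dst].add(src)
    return edges


def metapath_counts(edges, start, metapath):
    """Count path instances from `start` along a meta-path given as node types."""
    frontier = {start: 1}
    for src_type, dst_type in zip(metapath, metapath[1:]):
        nxt = defaultdict(int)
        for node, count in frontier.items():
            for neighbor in edges[(src_type, dst_type)].get(node, ()):
                nxt[neighbor] += count
        frontier = dict(nxt)
    return frontier


edges = build_edges(EVENTS)
# Attackers reachable from attacker_a via a shared vulnerability.
print(metapath_counts(edges, "attacker_a", ["attacker", "vulnerability", "attacker"]))
```

In this toy data, attackers that exploit the same vulnerability end up connected by the attacker-vulnerability-attacker meta-path, which is the kind of relationship a meta-path based similarity can build on.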

The rest of this paper is organized as follows. In Section 2, we review related work on attack preference analysis and heterogeneous information networks. In Section 3, we describe the architecture of HinAp. In Section 4, we introduce the preliminaries and the process of building HinAp. In Section 5, we introduce the proposed method for modeling attack preference. Section 6 evaluates the effectiveness and scalability of HinAp on ground truth data. Finally, the conclusion is presented in Section 7.

Related work

Predicting the attack preferences of intruders is crucial for security companies and organizations to resist oncoming cyber threats. Recently, machine learning and deep learning have been playing an important role in predicting attack preference and analyzing attack patterns. Du et al. (2017) proposed DeepLog, a deep neural network model that automatically learns log patterns and execution preference to detect anomalies. To detect threats in cloud computing systems, Farshchi et al. (2018) analyzed

Architecture

In this paper, we present HinAp, an automated framework based on the attributed heterogeneous information network to model attack preference, which utilizes both node attribute-level features and structure-level information (meta-paths and meta-graph) to characterize the attack preference of intruders. The architecture of HinAp is shown in Fig. 1, which consists of four major components and the details are as follows.

  • Data collection and preprocessing. This paper first develops an incremental

Preliminaries

Predicting cyber attack preferences can assist security organizations to gain insights into the purpose and motivation of intrusions, and develop protection mechanisms to proactively evade cyber threats. In this paper, cyber attack preference prediction can be formalized as Definition 1.

Definition 1

(Attack Preference Prediction). Given the AHIN $G=\{V,E,A\}$, the meta-path set $S=(N,R)$, and the attack preference candidate set $P=\{p_1,p_2,\dots,p_i\}$, $i \in N$. Cyber attack preference prediction of an attacker $V_i$ in $G$ includes

Methodology

To comprehensively represent cyber attack features and effectively construct an attack preference prediction model, this paper proposes HinAp, a stacking ensemble framework with an attention mechanism and transductive learning. In particular, we first build an AHIN of attack events under the guidance of 11 kinds of relations (see R1-R11). Then, after performing attention operation as shown in Fig. 3 on AHIN to learn the importance of nodes and meta-paths (see Table 1) for characterizing
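As a rough illustration of how attention can weight meta-paths, the sketch below fuses per-meta-path embeddings of one attacker with softmax-normalized importance scores. This is one common formulation and an assumption on our part, not necessarily the exact operation HinAp uses; the function names `softmax` and `fuse_embeddings` and the numeric values are hypothetical, and in a trained model the scores would be learned rather than supplied by hand.

```python
import math


def softmax(scores):
    """Normalize raw importance scores into weights that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_embeddings(embeddings, scores):
    """Weighted sum of per-meta-path embeddings using softmax(scores)."""
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]
    return weights, fused


# Hypothetical embeddings of one attacker under two different meta-paths.
per_metapath = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical importance scores; a higher score means a more informative meta-path.
weights, fused = fuse_embeddings(per_metapath, [2.0, 0.0])
print(weights, fused)
```

The meta-path with the larger score dominates the fused representation, which is the effect the attention weights are meant to have on the final attacker embedding.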

Data collection

We develop a data collection system to automatically collect attack event descriptions from 73 security-related data sources (please see Table 3 in Appendix) including hacker forum posts, security blogs and security news. Our data collection system leverages breadth-first search to collect attack event data, which starts the collection from the homepages of 73 data sources until no new URL can be invoked. For each URL, we first collect the HTML source codes, and then utilize XML parser to
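The breadth-first collection strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' crawler: `bfs_collect`, `links_from_graph`, and the in-memory `LINK_GRAPH` are hypothetical, and the link graph stands in for fetching and parsing real pages so that the example runs without network access.

```python
from collections import deque


def bfs_collect(seeds, get_links):
    """Visit URLs breadth-first from `seeds` until no new URL is found."""
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # a real system would fetch and parse the page here
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory link graph standing in for real pages.
LINK_GRAPH = {
    "https://example.org/": ["https://example.org/a", "https://example.org/b"],
    "https://example.org/a": ["https://example.org/c", "https://example.org/"],
    "https://example.org/b": ["https://example.org/c"],
    "https://example.org/c": [],
}


def links_from_graph(url):
    """Return the outgoing links recorded for `url` in the toy graph."""
    return LINK_GRAPH.get(url, [])


print(bfs_collect(["https://example.org/"], links_from_graph))
```

Tracking visited URLs in a set is what makes the traversal terminate once every reachable page has been seen, matching the "until no new URL" stopping condition.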

Conclusion

In this paper, we propose HinAp, an intelligent framework for predicting attack preferences of intruders. More specifically, we first construct the AHIN with attackers, vulnerabilities, attack scripts, platforms, devices, and their interactive relations. Then, we build the attention and transductive learning based attack preference prediction models, respectively. Finally, we utilize stacking ensemble learning to further assemble the important information from different types of nodes,

CRediT authorship contribution statement

Jun Zhao: Methodology, Writing - original draft. Xudong Liu: Conceptualization. Qiben Yan: Methodology, Writing - review & editing. Bo Li: Project administration. Minglai Shao: Data curation. Hao Peng: Formal analysis. Lichao Sun: Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (2018YFB0803503), the 2018 Joint Research Foundation of the Ministry of Education and China Mobile (MCM20180507), and the Opening Project of Shanghai Trusted Industrial Control Platform (TICPSH202003020-ZC). Qiben Yan is supported in part by National Science Foundation grants CNS-1950171 and CNS-1949753.

Jun Zhao is a PhD candidate in the Department of Computer Science and Engineering, Beihang University, China. He received the MS degree in School of Information Science and Engineering from Shandong Normal University in Jinan, China. His current research interests include cyber threat intelligence, machine learning, and data mining.

References (33)

  • A.A. Ahmed et al.

    SAIRF: a similarity approach for attack intention recognition using fuzzy min-max neural network

    J. Comput. Sci.

    (2018)
  • L. Bao et al.

    Execution anomaly detection in large-scale systems through console log analysis

    J. Syst. Softw.

    (2018)
  • M. Farshchi et al.

    Metric selection and anomaly detection for cloud operations using log and metric correlation analysis

    J. Syst. Softw.

    (2018)
  • F.J. Aparicio et al.

    Using the pattern-of-life in networks to improve the effectiveness of intrusion detection systems

    2017 IEEE International ICC

    (2017)
  • G. Chin Jr. et al.

    Predicting and detecting emerging cyberattack patterns using streamworks

    Proceedings of the 9th Annual Cyber and Information Security Research Conference

    (2014)
  • Devlin, J., Chang, M.-W., Lee, K., Toutanova, K., 2018. BERT: pre-training of deep bidirectional transformers for...
  • Y. Dong et al.

    metapath2vec: scalable representation learning for heterogeneous networks

    23rd ACM SIGKDD

    (2017)
  • S. Dowling et al.

    Using analysis of temporal variances within a honeypot dataset to better predict attack type probability

    2017 12th International Conference for Internet Technology and Secured Transactions (ICITST)

    (2017)
  • H. Du et al.

    Discovering collaborative cyber attack patterns using social network analysis

    International Conference on Social Computing

    (2011)
  • M. Du et al.

    DeepLog: anomaly detection and diagnosis from system logs through deep learning

    Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security

    (2017)
  • Y. Fan et al.

    Automatic opioid user detection from twitter: transductive ensemble built on different meta-graph based similarities over heterogeneous information network

    IJCAI

    (2018)
  • A. Grover et al.

    node2vec: scalable feature learning for networks

    Proceedings of the 22nd ACM SIGKDD

    (2016)
  • P. He et al.

    Towards automated log parsing for large-scale log data analysis

    IEEE Trans. Dependable Secure Comput.

    (2017)
  • A. Hernandez-Suarez et al.

    Social sentiment sensor in twitter for predicting cyber-attacks using L1 regularization

    Sensors

    (2018)
  • S. Hou et al.

    HinDroid: An intelligent android malware detection system based on structured heterogeneous information network

    Proceedings of the 23rd ACM SIGKDD

    (2017)
  • Jabbarand et al.

    A novel intelligent ensemble classifier for network intrusion detection system

    International Conference on Soft Computing and Pattern Recognition

    (2016)
    Xudong Liu is a professor in the School of Computer Science and Engineering, Beihang University, China. His current research interests include big data and industrial information security.

    Qiben Yan is an Assistant Professor in the Department of Computer Science and Engineering at Michigan State University. He received his Ph.D. in Computer Science from Virginia Tech, and M.S. and B.S. degrees in Electronic Engineering from Fudan University in Shanghai, China. His current research interests include wireless communication, wireless network security and privacy, mobile and IoT security, and big data privacy.

    Bo Li is an Assistant Professor in the School of Computer Science and Engineering, Beihang University, China. He received the PhD degree in the School of Computer Science and Engineering from Beihang University. His current research interests include industrial information security, mobile and IoT security, and cyber threat intelligence.

    Minglai Shao is a PhD candidate in the Department of Computer Science and Engineering, Beihang University, China. He received the BS degree in computer science from Hebei Normal University in 2012 and the MS degree from Guangxi University in 2015. His research interests include anomaly detection, graph mining, event detection and forecasting, and optimization algorithms.

    Hao Peng is an Assistant Professor in the Department of Computer Science and Engineering, Beihang University, China. He received the PhD degree from the Department of Computer Science and Engineering, Beihang University, China. His research interests include data mining and cyber security.
