Identifying compromised hosts under APT using DNS request sequences

https://doi.org/10.1016/j.jpdc.2021.02.017

Highlights

  • We emphasize the time-related patterns that existing work ignores, and identify the characteristic patterns of the DNS sequences requested by hosts over time.

  • We make the most of the detection resources that are available.

  • Our method combines different unsupervised learning algorithms and achieves better performance than conventional detection methods.

  • Our approach offers a more comprehensive perspective for future defense work.

Abstract

Advanced persistent threats (APTs) have become a major cyber threat to large organizations. To steal confidential data from specific organizations, attackers adopt highly targeted intrusion schemes. Before stealing critical data, APT activities hide within legitimate activities and steadily escalate their privileges, which makes them very difficult to detect. Most existing detection methods focus on identifying malicious domains through domain name system (DNS) analysis. However, the limited number of available samples and the rapidly changing sets of malicious domain names reduce the efficacy of such approaches. By investigating numerous APT reports, we determined that DNS request activity in APT attacks exhibits clear temporal patterns that most existing schemes ignore. We can therefore analyze the DNS sequences requested by each host, together with their time-related features, to identify compromised hosts. This paper summarizes the patterns of host DNS requests and proposes several assumptions. We quantify these assumptions as feature vectors and use machine learning to identify compromised hosts. We deployed the proposed approach in large-scale network environments, and experimental evaluations demonstrated that our method efficiently detects hosts compromised by APTs, with a precision of 97.3% and a detection rate of 96.2%.

Introduction

In recent years, advanced persistent threats (APTs) have attracted significant research attention. APTs are orchestrated by attackers against specific targets using a combination of methods and tools. Victims are typically governments, financial institutions, or large corporations. Notorious attacks such as Operation Aurora [26] and Stuxnet [36] have shocked the world. APTs are characterized by explicit purposiveness, diverse intrusion methods, long and hidden latency, and significant damage to target organizations. Prior to intrusion, well-resourced attackers gather as much information as possible about their targets through social engineering and other techniques in order to design highly targeted intrusion schemes. Then, via spear phishing, watering hole attacks, or even storage media, malware is implanted into target systems, where it remains dormant for long periods. Attackers use standard operating system tools and techniques to hide malicious activities inside legitimate traffic and steadily escalate their privileges until they are able to steal confidential data. The ultimate goal of the attackers is to transmit this confidential data to specific external servers [40].

Researchers in the security field have described the life cycle of an APT as a kill chain and have proposed various detection methods and defense strategies for each stage. We regard the entities targeted by an APT as hosts. Compromised hosts exhibit behavioral characteristics that differ from those of normal hosts, as discussed in detail in Section 3. We believe that monitoring the attacked subjects themselves matches the nature of APTs more closely than monitoring other features. Inside hosts, APTs can remain dormant or operate in a seemingly legitimate manner to evade detection by defenders. From a networking perspective, however, the behavioral characteristics of APTs are recorded in network logs. Therefore, the analysis of domain name system (DNS) logs of target networks has become a research hotspot. Regardless of how malware steals confidential data, the data must eventually be sent to external servers set up by the attackers. APTs commonly use the HTTP/HTTPS protocols to mimic legitimate behavior and evade routine detection. However, every seemingly innocuous malicious communication that relies on domain name resolution is recorded in a DNS log, where the behavioral characteristics of compromised hosts are reflected at the network level. DNS logs can therefore serve as the basis for detecting communication between malware and command and control (C&C) servers. Several researchers have sought clues about the communication between compromised hosts and C&C servers by analyzing DNS logs.
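Detection based on DNS logs of this kind starts by turning raw query records into one time-ordered request sequence per host. The following is a minimal Python sketch, assuming a hypothetical record layout of (timestamp, client IP, queried domain); real resolver logs use their own formats, and behind NAT one client IP may stand for several hosts. The name `build_host_sequences` is illustrative and does not come from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (unix_timestamp, client_ip, queried_domain).
Record = Tuple[float, str, str]


def build_host_sequences(records: List[Record]) -> Dict[str, List[Tuple[float, str]]]:
    """Group DNS query records by client and sort each client's queries by time."""
    sequences: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for ts, client, domain in records:
        # DNS names are case-insensitive and may end with a root dot; normalise both.
        sequences[client].append((ts, domain.lower().rstrip(".")))
    for seq in sequences.values():
        seq.sort()  # tuples compare by timestamp first, then by domain
    return dict(sequences)


if __name__ == "__main__":
    logs = [
        (1700000060.0, "10.0.0.5", "Update.Example.com."),
        (1700000000.0, "10.0.0.5", "mail.example.org"),
        (1700000030.0, "10.0.0.7", "cdn.example.net"),
    ]
    for client, seq in build_host_sequences(logs).items():
        print(client, seq)
```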

The majority of existing detection methods also focus on the analysis of network traffic and DNS records. Specifically, analyzing DNS logs to identify malicious domains is the most widely studied detection approach. Studies [5], [21], [28], [29], [35], [41] typically extract features from the characteristics of the domains themselves and identify malicious domains using machine learning. However, the drawbacks of this approach are clear: not every feature is universally applicable, and an attacker can easily evade detection that relies on a limited set of features. A few studies [3], [13], [17], [18], [20], [31], [33] have made breakthroughs in detecting malicious domains and compromised hosts by constructing systems based on web request graphs. Such methods extract features, search for isolated nodes, or calculate trust values through reputation propagation. However, they struggle when samples of malicious domains are incomplete or missing. Other methods [25] identify suspicious hosts by analyzing high volumes of internally initiated network traffic, which requires monitoring huge volumes of network traffic.
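To make "features based on the characteristics of the domains themselves" concrete, the sketch below computes a few lexical features often used in malicious-domain detection: name length, label count, digit ratio, and character entropy. It is an illustrative selection that assumes plain ASCII domain names; it does not reproduce the feature set of any of the cited studies, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy, in bits per character, of the character distribution of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def lexical_features(domain: str) -> dict:
    """Illustrative lexical features of a domain name (not any cited paper's exact set)."""
    name = domain.lower().rstrip(".")
    chars = name.replace(".", "")  # ignore label separators for the ratio and entropy
    digits = sum(ch.isdigit() for ch in chars)
    return {
        "length": len(name),
        "num_labels": name.count(".") + 1,
        "digit_ratio": digits / len(chars) if chars else 0.0,
        "entropy": shannon_entropy(chars),
    }


if __name__ == "__main__":
    for d in ("www.example.com", "x7f3k9q2zt.example.com"):
        print(d, lexical_features(d))
```

Because the attacker chooses the domain string, features like these are exactly what can be tuned to look benign, which is the weakness described above.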

Most of the studies mentioned above neglect the temporal characteristics of APTs, regardless of whether they focus on domain-based feature extraction or web request graph construction. Although detecting communications between compromised hosts and C&C servers appears to correspond to the data exfiltration phase, not all exfiltration operations are instantaneous: when compromised hosts search for C&C servers to complete data exfiltration, every action is carefully planned by the attackers. Therefore, instead of focusing on the features of malicious domains, which are non-obvious and easy to circumvent, we concentrate on the subjects targeted by attackers, namely compromised hosts. Extensive analysis of APT reports has given us a sufficient understanding of the traces that compromised hosts leave when they communicate with C&C servers. We analyze the DNS sequences requested by each host and their time-related patterns, and identify hosts or types of hosts suspected of being compromised through horizontal comparisons with other hosts and vertical comparisons across different periods. Specifically, we attempt to identify hosts that make requests at an abnormal frequency. This paper summarizes the DNS behavior characteristics of hosts targeted by APT attacks, quantifies the patterns reflected by host request sequences over time, and presents an unsupervised learning method for detecting suspicious hosts, which can play an important role in the defense against APT attacks. We tested and evaluated the proposed approach using 70 days of DNS request records collected from a large campus network. We also validated the proposed approach on a public dataset from one of SURFnet’s authoritative DNS servers using Google’s Public DNS Resolver [7]. We used the same dataset to evaluate a similar method and compare it with the proposed method.
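The horizontal and vertical comparisons described above can be illustrated with standard scores over per-host daily request counts. This is a minimal sketch that assumes the counts are already aggregated per day and that a host is flagged when either score exceeds a hypothetical threshold of 3; the paper's actual features and thresholds are not given in this section, and the function names are illustrative.

```python
import statistics
from typing import Dict, List


def zscore(value: float, population: List[float]) -> float:
    """Standard score of value relative to population; 0.0 when there is no spread."""
    if len(population) < 2:
        return 0.0
    mu = statistics.mean(population)
    sigma = statistics.pstdev(population)
    return 0.0 if sigma == 0 else (value - mu) / sigma


def frequency_anomalies(daily_counts: Dict[str, List[int]], threshold: float = 3.0) -> Dict[str, dict]:
    """daily_counts maps host -> DNS request counts per day, same day order for all hosts.

    horizontal: the host's latest-day count against all hosts' latest-day counts.
    vertical:   the host's latest-day count against its own earlier days.
    """
    latest = [counts[-1] for counts in daily_counts.values()]
    report = {}
    for host, counts in daily_counts.items():
        h = zscore(counts[-1], latest)
        v = zscore(counts[-1], counts[:-1])
        report[host] = {
            "horizontal": h,
            "vertical": v,
            "flag": abs(h) > threshold or abs(v) > threshold,
        }
    return report


if __name__ == "__main__":
    counts = {
        "10.0.0.5": [100, 102, 98, 101, 500],  # sudden burst on the last day
        "10.0.0.7": [100, 101, 99, 100, 100],
    }
    for host, r in frequency_anomalies(counts).items():
        print(host, r)
```

With the population standard deviation, no single value among n can score more than sqrt(n - 1), so the horizontal comparison needs a reasonably large set of peer hosts before a threshold of 3 can be exceeded at all; in a two-host example like this one only the vertical score can flag the burst.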

The main contributions of this paper can be summarized as follows:

  • We identify the characteristic patterns of DNS sequences requested by hosts on a timeline and quantify them as features that can be used to identify compromised hosts.

  • Based on the identified features, we propose an approach for detecting hosts compromised by APTs. By analyzing the temporal patterns of request sequences in DNS logs, our approach detects APTs effectively without requiring samples of malicious domains.

  • We deployed the proposed approach in a real large-scale network environment and conducted comprehensive evaluations on records collected over 70 days. We also validated our approach on a public dataset.

  • We applied an existing malicious domain name detection method to the same dataset to demonstrate the limitations of existing methods.
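One way to quantify temporal patterns as features and score hosts without any malicious-domain samples is sketched below. It assumes hypothetical features (request count, mean inter-arrival gap, and the coefficient of variation of the gaps, where a value near 0 indicates regular, beacon-like spacing) and uses a k-nearest-neighbour distance as the unsupervised outlier score. Both are stand-ins chosen for illustration; they are not the features or the combination of unsupervised algorithms used in the paper.

```python
import math
import statistics
from typing import Dict, List, Sequence


def timing_features(times: Sequence[float]) -> List[float]:
    """[request count, mean inter-arrival gap, coefficient of variation of the gaps].

    A coefficient of variation near 0 means almost perfectly regular spacing.
    Needs at least 3 timestamps so that there are 2 gaps to compare.
    """
    ts = sorted(times)
    if len(ts) < 3:
        raise ValueError("need at least 3 timestamps")
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    mean_gap = statistics.mean(gaps)
    cv = statistics.pstdev(gaps) / mean_gap if mean_gap > 0 else 0.0
    return [float(len(ts)), float(mean_gap), cv]


def knn_outlier_scores(vectors: Dict[str, List[float]], k: int = 2) -> Dict[str, float]:
    """Mean Euclidean distance to the k nearest other hosts after z-normalising each feature.

    Larger scores mean a more isolated, i.e. more anomalous, host. Needs at least 2 hosts.
    """
    hosts = list(vectors)
    dims = len(vectors[hosts[0]])
    stats = []
    for d in range(dims):
        column = [vectors[h][d] for h in hosts]
        # A constant feature gets a divisor of 1.0, so it normalises to 0 for every host.
        stats.append((statistics.mean(column), statistics.pstdev(column) or 1.0))
    norm = {h: [(vectors[h][d] - m) / s for d, (m, s) in enumerate(stats)] for h in hosts}
    scores = {}
    for h in hosts:
        dists = sorted(math.dist(norm[h], norm[o]) for o in hosts if o != h)
        nearest = dists[:k]
        scores[h] = sum(nearest) / len(nearest)
    return scores


if __name__ == "__main__":
    request_times = {
        "h1": [0, 50, 400, 420, 1500, 1600],
        "h2": [10, 300, 310, 900, 1000, 2500],
        "h3": [5, 100, 700, 720, 1800, 1900],
        "beacon": [0, 300, 600, 900, 1200, 1500],  # one request every 300 s
    }
    vectors = {h: timing_features(t) for h, t in request_times.items()}
    scores = knn_outlier_scores(vectors, k=2)
    for h in sorted(scores, key=scores.get, reverse=True):
        print(f"{h}: {scores[h]:.2f}")
```

Real implants often add random jitter to their callback interval, so a regularity feature of this kind would need tolerance for noise rather than relying on values close to exactly zero.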

The remainder of this paper is organized as follows. Section 2 describes the APT reports and surveys we used, as well as previous APT detection work closely related to ours and its notable characteristics. Section 3 describes existing APT life cycle models and the assumptions and solutions we propose based on this information. Section 4 discusses our approach in detail. Section 5 describes our experimental environment, results, and analysis, and presents the results of a comparative experiment. Finally, we conclude this paper in Section 6.


Related work

Thus far, significant effort has been devoted to APT research, including reports published by major security vendors such as Kaspersky and FireEye, as well as researchers' summaries of such reports. Reports are the most fundamental and essential resource for researchers who wish to study APTs. We determine the characteristic patterns of DNS sequences requested by hosts in a timeline and quantify these patterns into features to identify compromised hosts by studying APT report surveys and the latest

Scenario & heuristics

In this section, we first describe the life cycle of APTs. We then discuss the motivation for this study and propose assumptions based on the study of reports. Finally, we summarize the proposed scheme.

Methodology

Our goal is to develop an effective method for detecting APTs. In addition to effectiveness and performance, the operability and portability of the method must also be considered. Therefore, the proposed method needs to be not only efficient and accurate, but also easy to implement across platforms and data types. Note that network address translation (NAT) is common in real-world network environments. Because of the strong regularity of the features proposed

Experimental evaluation

In Section 4, we described the proposed detection method in detail. No stage of the proposed method depends on any specific device or platform. All experiments described in this article were run on a Dell OptiPlex 7040 PC with a Core i7-6700 3.40 GHz quad-core CPU, an Nvidia GeForce GTX 745 GPU, and 16 GB of RAM.
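For reference, the two metrics reported in the abstract can be written with the usual confusion-matrix counts, where TP is the number of compromised hosts correctly flagged, FP the number of benign hosts wrongly flagged, and FN the number of compromised hosts missed. Reading "detection rate" as recall (the true positive rate) is an assumption, since the evaluation section itself is not shown here.

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Detection rate} = \frac{TP}{TP + FN}
```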

Conclusion

APT detection is an emerging topic in the field of network security. DNS logs provide important references for defenders. Significant progress has been made in detecting malicious domain names. However, this progress has prompted attackers to develop more elaborate designs and modifications to circumvent security methods, gradually reducing the effectiveness of malicious domain name detection.

By monitoring the entities targeted by APT attacks, namely hosts, we developed a

CRediT authorship contribution statement

Ming Li: Writing - original draft, Data curation, Investigation, Validation, Software. Qiang Li: Funding acquisition, Project administration, Supervision, Writing - review & editing, Resources, Methodology. Guangzhe Xuan: Writing - review & editing, Data curation. Dong Guo: Writing - review & editing, Term, Conceptualization, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61772229 and 62072208.

Ming Li received his B.Sc. degree in Computer Science from Jilin University, China in 2017. Currently, he is an M.Sc. candidate in the Computer Science Department at Jilin University, China. His main research focuses on the detection of APT attacks.

References (41)

  • B. Dong, Z. Chen, H. Wang, L.-A. Tang, K. Zhang, Y. Lin, Z. Li, H. Chen, Efficient discovery of abnormal event...
  • FireEye, APT30 and the mechanics of a long-running cyber espionage operation (2018).
  • FireEye, Operation poisoned hurricane (2018).
  • Freebuf, What's APT (2018).
  • C. Hajaj et al., Less is more: Robust and novel features for malicious domain detection (2020).
  • X. Hu et al., MUSE: asset risk scoring in enterprise network with mutually reinforced reputation propagation, EURASIP J. Inf. Secur. (2014).
  • E.M. Hutchins et al., Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains, Lead. Issues Inf. Warfare Secur. Res. (2011).
  • IXESHE: An APT campaign (2018).
  • Kaspersky, The roof is on fire: Tackling Flame’s C&C servers (2018).
  • I. Khalil et al., Discovering malicious domains through passive DNS data graph analysis.


Qiang Li is currently a professor in Computer Science at Jilin University, China. He received his B.Sc., M.Sc., and Ph.D. degrees from Jilin University in 1998, 2001, and 2005, respectively. His main research interests are in network security and AI security.

Guangzhe Xuan is currently an associate professor in Computer Science at Jilin University, China. He received his B.Sc. in Chemistry from Peking University, China in 1986. His main research interests include network security and network management.

Dong Guo is currently an associate professor in Computer Science at Jilin University, China. He received his B.Sc., M.Sc., and Ph.D. degrees from Jilin University in 1999, 2005, and 2009, respectively. His main research interest is cloud storage security.
