MEGDroid: A model-driven event generation framework for dynamic android malware analysis
Introduction
Android is the most common operating system on mobile devices, with a market share of 87% in 2020 [1]. As malware is a major threat to Android, it is increasingly necessary to find ways to analyze Android malware in order to understand its behavior and improve the ability to detect it. Dynamic analysis is a prominent approach for analyzing the behavior of Android apps. It involves running the code in a virtual environment (or, in some cases, on a real device) to observe its actual behavior [2], [3], [4], [5], [6], [7], [8], [9]. Event generation is an essential technique for analyzing the behavior of Android apps in general, because it is the first step of the analysis process and the generated events guide the dynamic analysis of the samples under test. It is even more critical for malware analysis, because malware generally needs a combination of complex UI and system events to reach its malicious payloads and to exercise as much of the malware code as possible. To achieve such complex interaction, event generators need to extract information (which can be regarded as the sources of these events) from the malware, either statically or dynamically. However, the anti-static and anti-dynamic analysis techniques that malware commonly uses to hide its information limit how much can be extracted. As a result, the generated events are often insufficient to explore the real behavior of the sample under test. Therefore, there is still a strong demand for techniques that effectively generate events to exercise more code and activate more malicious payloads in Android malware.
In this paper, we propose a novel approach based on Model-Driven Engineering (MDE) [10], [11] for automatically generating events, specifically for dynamic malware analysis. In the proposed framework, called MEGDroid, we first use a Model-Driven Reverse Engineering (MDRE) approach to identify the sources of potential events in the code. MEGDroid automatically extracts the available information about the event sources from the malware code and represents that information as a domain-specific model named the Event Source Model (ESM). ESM represents the sources of the events in the malware code that are involved in generating events, such as the used views, the requested permissions, and the registered receivers. ESM is then analyzed and transformed into another model, called the Event Production Model (EPM), using model-to-model transformations. EPM represents the events that will be generated. Finally, MEGDroid uses model-to-code transformations to automatically generate the final working events from EPM. The proposed framework considers all necessary components of the malware app when extracting the event sources from the code, and it generates both UI and system events. By UI events, we mean interactions with the activity components of the app under test, which, in the case of malware analysis, should resemble human interaction as closely as possible. System events are any other events that the malware sample expects, such as environment conditions.
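The three stages described above (event sources, event production, generated events) can be illustrated with a minimal Python sketch. It is not the MEGDroid implementation, which is built on MDE tooling. The class names, the `kind` and `action` labels, the view id `btn_ok`, and the intent action `com.example.ACTION_PING` are all hypothetical. The only external facts assumed are the standard `adb shell input tap` and `adb shell am broadcast -a` command forms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventSource:
    """One entry of a simplified, hypothetical Event Source Model (ESM)."""
    kind: str      # "view", "permission", or "receiver" (illustrative labels)
    name: str      # a view id, a permission name, or an intent action
    x: int = 0     # screen coordinates, only meaningful for views
    y: int = 0


@dataclass
class Event:
    """One entry of a simplified, hypothetical Event Production Model (EPM)."""
    kind: str      # "ui" or "system"
    action: str    # "tap" or "broadcast" (illustrative labels)
    target: str
    args: List[str] = field(default_factory=list)


def esm_to_epm(esm: List[EventSource]) -> List[Event]:
    """Model-to-model step: map each event source to an event that answers it."""
    epm = []
    for src in esm:
        if src.kind == "view":
            epm.append(Event("ui", "tap", src.name, [str(src.x), str(src.y)]))
        elif src.kind == "receiver":
            epm.append(Event("system", "broadcast", src.name))
        # Permissions are event sources too, but this sketch has no rule for them.
    return epm


def epm_to_commands(epm: List[Event]) -> List[str]:
    """Model-to-code step: render each event as an adb command line."""
    commands = []
    for ev in epm:
        if ev.action == "tap":
            commands.append("adb shell input tap " + " ".join(ev.args))
        elif ev.action == "broadcast":
            commands.append("adb shell am broadcast -a " + ev.target)
    return commands


if __name__ == "__main__":
    esm = [
        EventSource("view", "btn_ok", 540, 960),
        EventSource("receiver", "com.example.ACTION_PING"),
    ]
    for cmd in epm_to_commands(esm_to_epm(esm)):
        print(cmd)
```

Keeping the intermediate EPM as plain data, rather than emitting commands directly from the sources, is what leaves room for inspecting or adjusting the events before they are generated.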
We believe that a human-in-the-loop approach [12] can be helpful in complex interaction scenarios such as event generation for dynamic analysis of Android malware. Indeed, human knowledge should be incorporated into the event generation process to make dynamic malware analysis more effective. However, since manual event generation is time-consuming and requires considerable effort, it has not yet been considered in previous work. One important advantage of using the MDE approach in the proposed framework is that it provides the facilities needed to involve the analyzer (i.e., the user during the analysis process), who can perform dynamic malware analysis tasks as effectively as possible by adjusting and directing the event generation process. In fact, due to their high level of abstraction, the proposed model-driven artifacts provide a suitable representation of the generated events, which enables the analyzer to easily modify the event generation process and reach more events despite the limited information that can be extracted. Moreover, since EPM is built automatically from ESM (i.e., EPM is not built from scratch by the analyzer), the analyzer's involvement is restricted to modifying EPM according to his or her knowledge of the sample under test. This greatly reduces the time required from the analyzer and makes the involvement efficient. MEGDroid can give the impression that the generated events come from human users rather than automatic tools, which is very important in malware analysis, where relevant and complex events rather than random events are needed to trigger the malicious payloads.
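The analyzer's role, editing an automatically built EPM instead of writing one from scratch, can be sketched as a list of small edits applied to generated events. This is only an illustration under stated assumptions: the `(action, target)` tuple format, the edit helpers `drop` and `insert_before`, and the function `refine` are hypothetical and are not part of MEGDroid.

```python
from typing import Callable, List, Tuple

# (action, target): a hypothetical, simplified EPM entry.
Event = Tuple[str, str]
Edit = Callable[[List[Event]], List[Event]]


def drop(target: str) -> Edit:
    """Edit that removes every event aimed at the given target."""
    return lambda events: [e for e in events if e[1] != target]


def insert_before(target: str, new_event: Event) -> Edit:
    """Edit that places new_event before the first event aimed at target."""
    def apply(events: List[Event]) -> List[Event]:
        for i, e in enumerate(events):
            if e[1] == target:
                return events[:i] + [new_event] + events[i:]
        return events + [new_event]  # target not found: append at the end
    return apply


def refine(generated: List[Event], edits: List[Edit]) -> List[Event]:
    """Apply the analyzer's edits, in order, to the automatically built EPM."""
    result = list(generated)  # work on a copy; the generated EPM is untouched
    for edit in edits:
        result = edit(result)
    return result
```

For example, `refine(generated, [drop("btn_cancel"), insert_before("btn_ok", ("text", "field_code"))])` would remove a tap the analyzer considers irrelevant and add a text-entry step before a tap on `btn_ok`, with all names here being placeholders.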
To evaluate the proposed framework, we performed an extensive experimental analysis using the AMD malware dataset [13], [14]. We chose 200 samples from 20 different malware families. The families were selected to cover a diverse set of characteristics (from the event generation perspective), including a large or small number of activities, the presence or absence of activities, complex or simple views in each activity, the presence or absence of launchers, and anti-reverse-engineering functionality such as obfuscation and dynamic code loading. In addition, thanks to the behavioral classification of malware in the AMD dataset [13], we selected families that exhibit common malicious behaviors such as stealing device information, stealing personal information, and connecting to C&C servers. All of these families use events to trigger their malicious payloads.
The proposed approach is evaluated and compared with Monkey [15] and DroidBot [16], which are state-of-the-art general-purpose and malware-specific event generators, respectively. Note that DroidBot is the only open-source tool that was available to us and has prominent features and objectives comparable with those of MEGDroid. We consider code coverage, event generation performance, and the number of logged sensitive API calls to be three important criteria for generating events for malware analysis. The experimental results show that MEGDroid provides better results than similar tools with respect to these criteria. Compared with the other tools, MEGDroid generates a smaller number of events to reach its results, which shows the effectiveness and efficiency of the proposed approach.
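To make the relation between coverage and the number of generated events concrete, the following sketch computes two simple quantities. It rests on assumptions not stated in this section: coverage is taken at method granularity, and `events_per_covered_method` is a hypothetical cost measure introduced only for illustration, not a metric reported by the paper.

```python
from typing import Set


def coverage(executed: Set[str], total: Set[str]) -> float:
    """Fraction of the app's methods that were executed at least once."""
    if not total:
        return 0.0
    return len(executed & total) / len(total)


def events_per_covered_method(num_events: int,
                              executed: Set[str],
                              total: Set[str]) -> float:
    """Average number of generated events spent per covered method.

    Lower is better: a tool that reaches the same coverage with fewer
    events is cheaper to run.
    """
    covered = len(executed & total)
    if covered == 0:
        return float("inf")
    return num_events / covered
```

Intersecting `executed` with `total` ignores framework or library methods that appear in a trace but do not belong to the app under test.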
The main contributions of this paper are as follows:
- A framework based on the MDE approach that facilitates realizing the human-in-the-loop idea and uses the human analyzer's knowledge to efficiently direct the complex event generation process, thereby increasing the effectiveness of the generated events.
- The Event Source Meta-model, a domain-specific modeling language that enables both modeling and extracting every possible event source from the malware code.
- The Event Production Meta-model, a domain-specific modeling language that enables both modeling and generating different types of events in response to the extracted sources.
- An implementation of the framework as an Eclipse plugin to show the feasibility and pertinence of the proposed approach. The plugin has been applied to the practical analysis of 200 real-world Android malware samples in our experimental analysis.
The paper is organized as follows: Section 2 presents the related work. Section 3 introduces a motivating example that we selected to demonstrate the research problem we address. Section 4 presents the proposed approach in detail. Section 5 presents the evaluation and the results of the comparison with related work. Section 6 discusses the limitations of the proposed approach. Finally, Section 7 concludes the paper.
Related work
Several tools have been proposed to generate events for dynamic analysis of Android apps. We discuss them in two categories: general-purpose and malware-specific event generation tools. We first review the general-purpose tools and show that, although these tools have many capabilities and some dynamic malware analysis frameworks use them, they have several shortcomings that motivated some researchers to develop malware-specific event generators. Then we review the malware specific tools and
Motivation example
To better demonstrate our research problem, in this section we introduce a real-world malware sample and discuss the challenges it poses for event generation. The sample is an instance of a well-known Android malware family named Koler. Koler is a family of Android ransomware that locks the device until a ransom is paid. It shows a screen that appears to come from a law enforcement agency, selected according to the location obtained from the device, and asks the user to pay for illegal use of the device. This
The proposed framework
MEGDroid uses an MDE approach for automatically generating events, specifically for dynamic malware analysis. In MDE, models play the most important role. Hence, it aims at abstract representations of the knowledge and activities that govern a particular application domain rather than at algorithmic concepts [10]. MEGDroid first extracts a domain-specific model named the Event Source Model (ESM) from the given code using the Model-Driven Reverse Engineering (MDRE) approach. ESM is an abstract
Experimental evaluation and comparison
To evaluate MEGDroid, we implemented the framework as an Eclipse plugin. We used MoDisco to extract the model from the code, ATL for model-to-model transformations, and Acceleo for model-to-code transformations. In this section, we perform extensive experimental evaluations to address the following research questions:
RQ1) How effective is MEGDroid?
RQ2) How efficient is MEGDroid?
RQ3) How does MEGDroid compare with other Android event generator tools?
RQ4) What is the impact of
Limitations
Since MEGDroid depends on static analysis to extract information from the code in order to generate appropriate events, it has some limitations, as follows:
1. We may not be able to extract any information if the sample under test is heavily or completely obfuscated or encrypted. In this case, we depend entirely on the analyzer's knowledge of the sample under test to generate events that are as appropriate as possible for this sample. However, the analyzer knowledge/information still may not be enough
Conclusion
In this paper, we introduced MEGDroid, a novel model-driven framework for generating events for dynamic analysis of Android malware. The framework includes two meta-models, one for defining the sources of the events and the other for defining the generated events. MEGDroid can generate both system and UI events, and it gives the analyzer the option to modify the generated events to make them more realistic and human-like. The proposed framework has been realized as
CRediT authorship contribution statement
Hayyan Hasan: Software, Validation, Investigation, Data curation, Writing – original draft, Visualization. Behrouz Tork Ladani: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision, Project administration. Bahman Zamani: Conceptualization, Methodology, Writing – review & editing, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (47)
- et al. CANDYMAN: classifying Android malware families by modelling dynamic traces with Markov chains. Eng. Appl. Artif. Intell. (2018)
- et al. Effective detection of android malware based on the usage of data flow APIs and machine learning. Inf. Softw. Technol. (2016)
- et al. MoDisco: a model driven reverse engineering framework. Inf. Softw. Technol. (2014)
- et al. ATL: a model transformation tool. Sci. Comput. Program (2008)
- Android Dominating Mobile Market (2020)
- et al. A study of run-time behavioral evolution of benign versus malicious apps in android. Inf. Softw. Technol. (2020)
- et al. DynaLog: an automated dynamic analysis framework for characterizing Android applications
- et al. MADAM: effective and Efficient Behavior-based Android Malware Detection and Prevention. IEEE Trans. Dependable Secure Comput. (2016)
- et al. A Novel Dynamic Android Malware Detection System with Ensemble Learning. IEEE Access (2018)
- et al. Dynamic malware detection and phylogeny analysis using process mining. Int. J. Inf. Security (2018)
- Model-Driven Software Engineering in Practice
- VAnDroid: a framework for vulnerability analysis of Android applications using a model-driven reverse engineering technique. Softw.
- A “Human-in-the-loop” approach for resolving complex software anomalies
- Deep Ground Truth Analysis of Current Android Malware
- Android Malware Clustering Through Malicious Payload Mining
- DroidBot: a Lightweight UI-Guided Test Input Generator for Android
- Dynodroid: an Input Generation System for Android Apps
- EHBDroid: beyond GUI Testing for Android Applications
- Exploiting the Saturation Effect in Automatic Random Testing of Android Applications
- Effective testing of Android apps using extended IFML models. J. Syst. Softw.
- Guided, Stochastic Model-Based GUI Testing of Android Apps