A TAN based hybrid model for android malware detection
Introduction
The Android operating system has dominated the smartphone industry for the past 10 years. The Android API framework contains functions for accessing sensitive resources in the system. This has enabled attackers to create malicious applications and distribute them through third-party app stores or through advertisements on social networks. Further, an adversary can inject malicious payloads into existing applications. Malicious apps enable an attacker to perform operations such as stealing information, sending SMS messages, and remotely controlling the device [19], [21], [43]. Hence, it is necessary to protect smartphones from these malicious applications.
Existing malware detection mechanisms are mainly classified into static, dynamic and hybrid analysis. Static analysis captures malicious behavior from an application’s source code without executing it [46]. Dynamic analysis identifies malicious behavior from an application’s runtime information, such as the system calls produced during its execution [32]. The advantage of static analysis is that it can locate the malicious component in the source code (high code coverage) [23], whereas the advantage of dynamic analysis is that it can identify exploits in the runtime environment [36]. Hence, the advantages of both can be combined in a hybrid analysis mechanism to achieve better accuracy in malware detection [56], [57]. However, the existing hybrid mechanisms do not check the interdependency between the static and dynamic features used in their machine learning classifiers. This interdependency leads to the multicollinearity problem [2]. Multicollinearity occurs when the correlation between two or more features in a machine learning model is high, and it can degrade the performance of a machine learning classifier.
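To make the multicollinearity concern concrete, the following is a minimal sketch, not the paper's procedure, of two common diagnostics: pairwise Pearson correlation and the variance inflation factor (VIF). The feature names (`api_score`, `syscall_score`, `permission_score`), the synthetic data, and the 0.8 threshold are all assumptions made for illustration.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.8):
    """Return (name_a, name_b, r) for feature pairs with |Pearson r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    n_samples, n_features = X.shape
    vifs = []
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / max(1.0 - r2, 1e-12))
    return vifs

# Synthetic data (hypothetical): a dynamic feature that largely mirrors a static one.
rng = np.random.default_rng(0)
api_score = rng.normal(size=500)
syscall_score = 0.95 * api_score + 0.05 * rng.normal(size=500)
permission_score = rng.normal(size=500)
X = np.column_stack([api_score, syscall_score, permission_score])
names = ["api_score", "syscall_score", "permission_score"]

pairs = correlated_pairs(X, names)
vifs = variance_inflation_factors(X)
print(pairs)
print(vifs)
```

In this synthetic setting the static and dynamic scores are nearly collinear, so their VIFs are large while the independent feature stays close to 1. Feeding such features side by side into a single classifier is the situation the paragraph above warns about.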
According to Zhang et al. [55], source code level API calls can determine the underlying semantics of an application. These APIs are protected by permissions that need to be declared in the manifest file [20]. A developer can declare permissions without the corresponding API calls, or vice versa. Hence, using either permissions or API calls alone as features is not enough to detect the malicious behavior of an application, and both must be combined to detect malware applications accurately. Unlike goodware, malware applications do not require user triggers to invoke sensitive API calls [18]. This automated invocation of API calls is reflected in the system call sequence [49]. It is known that an application generates system calls in accordance with the execution of API calls during runtime [41]. However, it is difficult to infer malicious behavior from the system call sequence alone in a short time, due to the limited code coverage an application achieves during its execution. Hence, we can conclude that static features such as API calls and permissions and dynamic features such as system calls are all relevant for detecting malicious applications, and that conditional dependencies exist among these features. However, using conditionally dependent static and dynamic features as a feature vector in a machine learning classifier for hybrid analysis can lead to the multicollinearity problem [2]. Tree Augmented Naive Bayes (TAN) is a special kind of Bayesian network that models the conditional dependencies between random variables in the form of a tree. In this paper, we employ a TAN model to combine the classifier output variables corresponding to the static features (API calls and permissions) and the dynamic features (system calls), based on their conditional dependencies, for predicting malicious behavior.
This TAN based model can capture the interdependence between static and dynamic features for predicting the malicious behavior. The experimental results show that the proposed mechanism can detect malicious applications over a long period with an accuracy of 0.97.
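The idea of a TAN over classifier outputs can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the three binary variables (`api`, `perm`, `sys_`) stand in for binarised classifier outputs, the data are synthetic, and the tree is learned with the standard construction for TAN (a maximum spanning tree over conditional mutual information), which may differ from how the paper fixes its tree.

```python
import numpy as np
from itertools import product

def cond_mutual_info(xi, xj, c):
    """Empirical conditional mutual information I(Xi; Xj | C) for discrete arrays."""
    cmi = 0.0
    for cv in np.unique(c):
        mask = c == cv
        p_c = mask.mean()
        for a, b in product(np.unique(xi), np.unique(xj)):
            p_ab = np.mean((xi[mask] == a) & (xj[mask] == b))
            p_a = np.mean(xi[mask] == a)
            p_b = np.mean(xj[mask] == b)
            if p_ab > 0:
                cmi += p_c * p_ab * np.log(p_ab / (p_a * p_b))
    return cmi

def tan_parents(X, c, root=0):
    """Maximum spanning tree over features (Prim's algorithm), weighted by CMI and
    directed away from `root`. Returns {feature_index: parent_index or None}."""
    n = X.shape[1]
    weight = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            weight[i, j] = weight[j, i] = cond_mutual_info(X[:, i], X[:, j], c)
    parents = {root: None}
    while len(parents) < n:
        best = max(
            ((i, j) for i in parents for j in range(n) if j not in parents),
            key=lambda e: weight[e],
        )
        parents[best[1]] = best[0]
    return parents

def fit_tan(X, c, parents, alpha=1.0):
    """Estimate P(C) and P(Xj = 1 | parent value, C) for binary data (Laplace smoothing)."""
    classes = [int(v) for v in np.unique(c)]
    prior = {cv: float(np.mean(c == cv)) for cv in classes}
    cpt = {}
    for j, p in parents.items():
        for cv in classes:
            for pv in ([None] if p is None else [0, 1]):
                mask = c == cv
                if p is not None:
                    mask = mask & (X[:, p] == pv)
                cpt[(j, pv, cv)] = (X[mask, j].sum() + alpha) / (mask.sum() + 2 * alpha)
    return prior, cpt

def predict_proba(x, prior, cpt, parents):
    """Posterior P(C | x) under the TAN factorisation."""
    scores = {}
    for cv, p_c in prior.items():
        log_p = np.log(p_c)
        for j, p in parents.items():
            p1 = cpt[(j, None if p is None else int(x[p]), cv)]
            log_p += np.log(p1 if x[j] == 1 else 1.0 - p1)
        scores[cv] = log_p
    m = max(scores.values())
    z = sum(np.exp(s - m) for s in scores.values())
    return {cv: float(np.exp(s - m) / z) for cv, s in scores.items()}

# Synthetic, hypothetical classifier outputs: 1 = flagged as malicious.
rng = np.random.default_rng(1)
n = 4000
c = rng.integers(0, 2, size=n)  # class label: 1 = malware, 0 = goodware
api = (rng.random(n) < np.where(c == 1, 0.8, 0.2)).astype(int)
perm = np.where(rng.random(n) < 0.9, api, 1 - api)  # agrees with api 90% of the time
sys_ = (rng.random(n) < np.where(c == 1, 0.7, 0.3)).astype(int)
X = np.column_stack([api, perm, sys_])

parents = tan_parents(X, c, root=0)
prior, cpt = fit_tan(X, c, parents)
posterior = predict_proba(np.array([1, 1, 1]), prior, cpt, parents)
print(parents, posterior)
```

Because `perm` is generated as a noisy copy of `api`, the learned tree links those two variables, so their dependence is modeled explicitly instead of being counted twice as independent evidence, which is what a plain Naive Bayes combination would do.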
The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 briefly describes the ridge regularized logistic regression (LR) classifier. Section 4 presents the proposed detection mechanism. Section 5 reports the experimental results. Section 6 discusses classifier retraining for detecting evolving malware. Section 7 gives conclusions and future directions for research.
Section snippets
Static analysis
Talha et al. [44] suggested a permission based malware identification system called Apkauditor. It analyzes the permissions requested by the application for identifying whether it is malicious or not. Arp et al. [6] suggested a static malware detection mechanism called Drebin. In Drebin, the static features such as hardware components, requested permissions, app components, intent filters etc. are given as input to an SVM classifier for identifying whether it is malicious or not. In [12], Cen
Ridge regularized logistic regression classifier
Let D = {(Xi, Yi)} be a labeled dataset, where Xi is the n-dimensional feature vector corresponding to the ith element and Yi ∈ {0, 1} denotes its label. Let X be any data element and Y denote its label. Let β be the regression parameters. Then, the probability P(Y = 1 | X) can be estimated using ridge regularized logistic regression [33] as given below.
Let . Then,β is estimated as
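As a hedged reference point, the standard textbook form of ridge regularized logistic regression is written out below. The symbols $m$ (number of training elements), $\beta_0$ (intercept), $x_j$ (the jth component of a data element $X$), $p_i$ and $\lambda$ (penalty weight) are assumptions introduced here and may differ from the paper's own notation.

```latex
P(Y = 1 \mid X) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{n} \beta_j x_j\right)\right)}

\hat{\beta} = \arg\min_{\beta} \left\{ -\sum_{i=1}^{m} \left[ Y_i \log p_i + (1 - Y_i) \log (1 - p_i) \right] + \lambda \sum_{j=1}^{n} \beta_j^{2} \right\},
\qquad p_i = P(Y_i = 1 \mid X_i)
```

The first term is the negative log-likelihood of the labels, and the second is the ridge (squared L2) penalty, which shrinks the coefficients and stabilizes the estimate when features are correlated.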
Methodology
There is a set of API calls defined in the source code of an application. These API calls need some permissions which are declared in the manifest file [20]. Further, there is a set of system call sequences indirectly specified for every application [22]. In the case of malware applications, this set will be very large. The application will generate a system call sequence from these predefined sequences in accordance with the execution of API calls in the source code. Therefore, there exist
Results and discussions
We took 1650 malware applications from Drebin [6], AMD [48], AndroZoo (AZ) [3] and external repositories (GitHub) [26], and 1650 goodware applications from AndroZoo (AZ) [3] and Google Play (GP), to evaluate the performance of our approach. The Drebin dataset contains malware samples from 2010 to 2012, the AMD dataset contains malware samples from 2010 to 2016, and the AndroZoo dataset contains malware/goodware samples from 2010 to 2019. AndroZoo is a
Retraining the classifiers for detecting evolving malwares
It is known that Android revises its API calls from time to time. Further, Google has created many new API calls in recent years, some of which overlap in functionality with others. Hence, evolving malware applications can use these new API calls to perform malicious activities, and machine learning models trained on the API calls and permissions of older apps may fail to detect new malware. In malware detection, concept drift
Conclusion
In this paper, we proposed a novel mechanism for detecting Android malware applications that combines the static and dynamic features influencing malicious activity by exploring their conditional dependencies. The proposed mechanism can capture malicious behavior more accurately than existing static and dynamic analysis mechanisms. However, a few malware applications can escape the detection mechanism by employing adversarial techniques [54]. Therefore, a future direction for the research is
Declaration of Competing Interest
All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.
This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue.
The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.
Acknowledgement
This work was done as part of the Center for Research and Innovation in Cyber Threat Resilience project (CRICTR 2018-19), which is funded by the Kerala State Planning Board.
References (57)
- et al. Toward a more dependable hybrid analysis of android malware using aspect-oriented programming. Comput Secur (2018)
- et al. Android permissions demystified. Proc. 18th ACM Conf. Comput. Commun. Secur. (2011)
- et al. A family of droids-android malware detection via behavioral modeling: Static vs dynamic analysis. 2018 16th Annual Conference on Privacy, Security and Trust (PST) (2018)
- et al. Detecting software theft via system call based birthmarks. Comput. Secur. Appl. Conf. 2009. ACSAC’09. Annu. (2009)
- Multicollinearity. Wiley Interdisc Rev Comput Stat (2010)
- et al. Androzoo: collecting millions of android apps for the research community. 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) (2016)
- et al. Generative versus discriminative classifiers for android anomaly-based detection system using system calls filtering and abstraction process. Secur Commun Netw (2016)
- et al. Ntpdroid: a hybrid android malware detector using network traffic and system permissions. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE) (2018)
- et al. Drebin: effective and explainable detection of android malware in your pocket. Proc. 2014 Netw. Distrib. Syst. Secur. Symp. (2014)
- et al. SAMADroid: a novel 3-Level hybrid malware detection model for android operating system. IEEE Access (2018)
- An android application sandbox system for suspicious software detection. 2010 5th International Conference on Malicious and Unwanted Software (MALWARE 2010)
- Crowdroid: behavior-based malware detection system for android. Proc. 1st ACM Work. Secur. Priv. smartphones Mob. devices
- Detection of malicious web pages using system calls sequences. International Conference on Availability, Reliability, and Security
- Detecting android malware using sequences of system calls. Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile
- A probabilistic discriminative model for android malware detection with decompiled source code. IEEE Trans Dependable Secur Comput
- Stormdroid: a streaminglized machine learning-based system for detecting android malware. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security
- Approximating discrete probability distributions with dependence trees. IEEE Trans Inf Theory
- Andarwin: Scalable detection of semantically similar android applications. European Symposium on Research in Computer Security
- Droidscribe: classifying android malware based on runtime behavior. 2016 IEEE Security and Privacy Workshops (SPW)
- Evaluation of android malware detection based on system calls. Proc. 2016 ACM Int. Work. Secur. Priv. Anal. IWSPA ’16
- Profiling user-trigger dependence for android malware detection. Comput Secur
- Android security: a survey of issues, malware penetration, and defenses. IEEE Commun Surv Tutor
- A survey of mobile malware in the wild. Proc. 1st ACM Work. Secur. Priv. smartphones Mob. devices
- A sense of self for unix processes. Secur. Privacy, 1996. Proceedings., 1996 IEEE Symp.
- Automated test generation for java generics. Int. Conf. Softw. Qual.
- Bayesian network classifiers. Mach Learn
- Should you consider adware as malware in your study? 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)