A survey of malware detection in Android apps: Recommendations and perspectives for future research

doi:10.1016/j.cosrev.2020.100358

Computer Science Review

Volume 39, February 2021, 100358

https://doi.org/10.1016/j.cosrev.2020.100358

Abstract

Android has dominated the smartphone market and has become the most popular operating system for mobile devices. However, security threats in Android applications have also increased in lockstep with Android’s success. More than 3 million new malware samples targeting the Android operating system were discovered in 2017. Although persistent research efforts have been made to address these threats, and several detection techniques and tools have been developed as a result, they all exhibit distinct limitations such that no single solution can claim to solve the Android malware problem. In this paper, we survey the main mechanisms and approaches for malware detection in Android applications. We identify the advantages and limitations of each and suggest avenues of research to advance knowledge in this regard.

Introduction

It is a common truism of computer security that users often inadvertently abet the malware running on their devices. To aid in protecting users against themselves, Android’s architecture is largely concordant with the principle of least privilege, stated by Saltzer and Schroeder in their seminal 1975 paper [1], and requires that an application possess only the most restrictive set of permissions that still allows it to perform its intended task. However, it is ultimately up to each individual user to decide whether or not to install an application, and to determine which permissions will be granted to each application. Google’s commentary on this issue is as follows [2]:

“When installing an application, users see a screen that explains clearly what information and system resources the application has permission to access, such as a phone’s GPS location. Users must explicitly approve this access in order to continue with the installation, and they may uninstall applications at any time. They can also view ratings and reviews to help decide which applications they choose to install. We consistently advise users to only install apps they trust.”

Nonetheless, when choosing to install an app on their device, users are often constrained to act more on intuition than on a fact-based decision process. While users do have access to the permissions requested by the app, they may not be cognizant of the myriad ways in which permissions could be misused to compromise the confidentiality, integrity and availability of their data. The signature, which identifies the publisher of the code, is of little use if this publisher is unknown to the end user. Users who opt to obtain apps from third-party stores are exposed to even more risks, since these stores frequently contain repackaged apps.

Antivirus software remains the first line of defense for most users. The Austrian security firm AV-Comparatives¹ periodically evaluates antivirus software for Windows, Mac OS, Android and Linux. In January 2019, AV-Comparatives tested 250 anti-virus tools on more than two thousand Android apps, with bleak results [3]. Only 80 of them detected more than a paltry 30% of the malicious apps. Over sixty others relied upon a pre-set whitelist of permitted app names, and did not even perform an elementary scan of the app beyond checking its name against that list. In fact, some of the anti-viruses tested failed to block a single malicious app from the testing dataset.

Even worse, anti-viruses themselves can carry malware or exploitable vulnerabilities. Indeed, two entries in the Drebin malware database are antivirus apps [4]. In this context, it is useful to remember that anti-viruses run continuously on the user’s devices, often with elevated privileges. They thus form an ideal vector to gather data about a user surreptitiously.

Even among tools that do perform as intended, limitations remain: only 23 of the 250 tools AV-Comparatives examined achieved a 100% detection rate. More particularly, any approach based on signatures is inherently reactive, and cannot provide proactive protection against emergent threats. Note that the dataset used by AV-Comparatives consisted of the “2000 most common Android malware threats of 2018”. As the authors themselves observe, with such a benchmark, detection rates between 90% and 100% “should be easily achieved”.

Because of their limitations, anti-viruses should be supplemented by methods based on static analysis and dynamic monitoring of the code. These are tools that analyze an app to determine whether its behavior conforms to a security policy, rather than relying on a signature or a blacklist. The development of these methods is an important current topic of academic and industrial research.
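The contrast between signature matching and policy checking can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the byte strings are toy stand-ins for app packages, `SIGNATURE_DB` is a hypothetical digest blacklist, and `FORBIDDEN_COMBINATIONS` is a hypothetical policy over capability names. It is not the mechanism of any tool surveyed here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used here as a stand-in for an anti-virus signature."""
    return hashlib.sha256(data).hexdigest()


# Toy byte strings standing in for app packages (not real malware).
KNOWN_SAMPLE = b"toy-payload-v1"
VARIANT_SAMPLE = KNOWN_SAMPLE + b"\x00"  # same behavior, one padding byte added

# Hypothetical signature database: digests of samples that were already seen.
SIGNATURE_DB = {sha256_hex(KNOWN_SAMPLE)}

# Hypothetical policy: capability combinations an app must not hold together.
FORBIDDEN_COMBINATIONS = [
    frozenset({"READ_SMS", "INTERNET"}),
    frozenset({"RECORD_AUDIO", "INTERNET"}),
]


def signature_match(package: bytes) -> bool:
    """Reactive check: flags only packages whose exact digest is already known."""
    return sha256_hex(package) in SIGNATURE_DB


def violates_policy(capabilities: set) -> bool:
    """Policy check: flags any app holding a forbidden capability combination."""
    return any(combo <= capabilities for combo in FORBIDDEN_COMBINATIONS)


if __name__ == "__main__":
    capabilities = {"READ_SMS", "INTERNET", "VIBRATE"}
    print("known sample matches signature:", signature_match(KNOWN_SAMPLE))
    print("variant matches signature:", signature_match(VARIANT_SAMPLE))
    print("variant violates policy:", violates_policy(capabilities))
```

In this sketch, a single added byte is enough to evade the digest lookup, whereas the policy check depends only on what the app is able to do. The price is that a policy may also flag benign apps that legitimately hold the same capabilities.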

In this paper, we survey the current state of the art of academic research on the topic of malware detection in Android apps, focusing particularly on the more recent developments. Most techniques can broadly be categorized as either static analysis, which endeavors to detect malware before it is executed, or dynamic analysis, which observes the execution of a potentially malicious application and reacts to a violation of a security policy, usually by terminating the execution. Because of its ubiquity, we adopt this classification here.

The large volume of security apps allows neither a comprehensive listing nor a meaningful discussion of each. To illustrate, AV-Comparatives counted more than 200 security apps in a single category, namely anti-virus software. In order to have a meaningful analysis, we will limit the scope of the paper to the developments of the last ten years, as they are more likely to impact the immediate future. Within this time limit, rather than present an exhaustive survey, we have opted for a sample of models that span a cross-section of current thought on the topic. In doing so, we seek to capture the range of variability that exists with respect to the following questions:

• Which methods are employed to perform malware detection on Android systems?
• On which features or aspects of the Android app is the detection process based?
• On which dataset is the method tested?

Particular attention was given to the way in which the accuracy of methods is tested and measured. We found that datasets and metrics differed widely, making comparison on an equal footing difficult. We briefly describe each sampled method, focusing on its advantages and drawbacks, and put forth recommendations to guide further research on the topic.
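To make concrete why differing metrics hinder comparison, the following sketch computes four common metrics from a confusion matrix. The counts are invented for illustration (a hypothetical test set of 950 benign and 50 malicious apps) and are not taken from any surveyed paper.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp: malware flagged as malware      fp: benign flagged as malware
    tn: benign accepted as benign       fn: malware accepted as benign
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical detector A: labels every app as benign.
always_benign = detection_metrics(tp=0, fp=0, tn=950, fn=50)

# Hypothetical detector B: catches 40 of the 50 malicious apps,
# at the cost of 30 false alarms.
detector_b = detection_metrics(tp=40, fp=30, tn=920, fn=10)

if __name__ == "__main__":
    for name, result in (("A", always_benign), ("B", detector_b)):
        print(name, {key: round(value, 3) for key, value in result.items()})
```

On this imbalanced toy dataset, detector A reaches 95% accuracy while detecting no malware at all, and detector B reaches 96% accuracy with a recall of 80%. A paper reporting only accuracy would make the two look nearly equivalent, which is one reason why results measured with different metrics or class balances cannot be compared directly.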

We will only review academic research, published in peer-reviewed journals and conferences, to the exclusion of non-peer-reviewed industrial works. Furthermore, we focus exclusively on studies that tackle the problem of malware in Android apps, and exclude studies on the broader problem of malware in general. We also limit ourselves to a ten-year horizon, and exclude any work that predates 2009. Since our objective is to present a sample of current thought on the topic, we excluded papers whose method was closely similar to that of other, already included papers.

The papers were drawn from the following four online libraries, which provide comprehensive coverage of the major academic publication venues in the field of software security:

1. IEEE (https://www.ieee.org/).
2. USENIX (https://www.usenix.org/).
3. ACM Digital Library (https://dl.acm.org/).
4. Springer Link (https://link.springer.com/).

Several researchers have recently surveyed malware detection methods and techniques for Android applications. Naway and Li [5] presented a survey of static, dynamic and hybrid analysis methods that use deep learning techniques to detect Android malware. They detailed the techniques and specified their strengths and weaknesses. Alzahrani and Alghazzawi [6] conducted another survey on the detection of Android malware, studying eight research papers focused on deep learning for Android ransomware² detection and for Android malware detection. Rubiya and Radhamani [7] analyzed nine Android malware detection approaches, specifying their strengths and weaknesses. Yan and Zheng [8] surveyed dynamic mobile malware detection; they analyzed, synthesized and compared previous studies on detecting malware in smartphones. In another survey, Arshad et al. [9] analyzed static and dynamic techniques for the detection of, and protection from, Android malware. The techniques analyzed are classified according to the detection mechanism used.

As can be seen in Table 1, our survey focuses on static, dynamic and hybrid Android malware detection methods. It tracks the evolution of malware detection over an eleven-year time span (between 2009 and 2020), which is considerably longer than that considered by other similar studies [5], [6], [7], [8], [9]. In addition to describing the strengths and weaknesses of the methods, we propose recommendations to guide future research on this topic.

The remainder of this paper is organized as follows: in Sections 2 and 3, we examine malware detection mechanisms based on static analysis and dynamic analysis, respectively. We also make several recommendations based on our observations of the current state of the art. In Section 4, we review these recommendations and discuss avenues for future research. Concluding remarks are given in Section 5, and a recapitulative table of all surveyed methods is provided in a closing Appendix.

Section snippets

Static analysis

Static analysis encompasses a broad range of methods that seek to discern the runtime behavior of a piece of software prior to its execution. In a security context, the purpose is naturally to weed out potentially malicious apps before they are installed and executed. Static analysis is considered coarse, since it flags an app as malicious according to an over-approximation of its possible runtime behavior. As a consequence, any static analysis method must maximize effective detection while

Dynamic analysis

Dynamic analysis is an alternative approach to malware detection, which requires running the program to study its behavior and its effects on its environment. Unlike static analysis, it is late in that it only detects a violation right at the moment when it is about to occur. It also suffers from coverage limitations, since it only considers a single execution, rather than all possible program executions.
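Both properties, late detection and limited coverage, can be seen in a minimal sketch of a runtime monitor. The event names, the policy (no network transmission after contacts have been read) and the two traces are hypothetical, chosen only for illustration; real monitors intercept system or API calls rather than strings.

```python
class PolicyViolation(Exception):
    """Raised when the next event would violate the security policy."""


def monitor(events):
    """Replay one execution trace, halting just before a violating event.

    Hypothetical policy: once contacts have been read, no data may be
    sent over the network.
    """
    contacts_read = False
    executed = []
    for event in events:
        if event == "read_contacts":
            contacts_read = True
        elif event == "send_network" and contacts_read:
            # The violation is caught only at the moment it is about to occur.
            raise PolicyViolation(
                f"blocked {event!r} after {len(executed)} executed events"
            )
        executed.append(event)
    return executed


# Two hypothetical executions of the same app.
BENIGN_RUN = ["start", "send_network", "stop"]
MALICIOUS_RUN = ["start", "read_contacts", "send_network", "stop"]

if __name__ == "__main__":
    print("benign run completed:", monitor(BENIGN_RUN))
    try:
        monitor(MALICIOUS_RUN)
    except PolicyViolation as violation:
        print("malicious run terminated:", violation)
```

If only the first trace is ever observed, the monitor reports nothing, even though the same app can produce the second one. This is the coverage limitation in miniature: the verdict holds for the executions that were seen, not for the program as a whole.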

As we did in the previous section, we organize dynamic tools in four broad categories,

Discussion

Researchers have long realized that traditional malware detection techniques, such as signature-based anti-viruses, are inadequate to provide effective protection against new malware. Consequently, in recent years, several techniques and tools based on behavioral analysis (static or dynamic) have been at the core of malware identification. Table 5 summarizes the existing approaches surveyed in the previous two sections, and Table 2 gives a summary of the recommendations listed throughout the

Conclusion

In this paper, we surveyed malware detection methods for Android, focusing on the advantages and drawbacks of each, and made recommendations for future research on the topic.

Despite the large number of solutions that have been proposed, several challenges remain to be addressed, especially because of the rapidly evolving nature of malware. We cite difficulties related to code obfuscation, the unavailability of source code and the emerging problem of malware collusion as problems that

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (66)

Maiorca D. et al., Stealth attacks: An extended insight into the obfuscation effects on Android malware, Comput. Secur. (2015)
Saltzer J.H. et al., The protection of information in computer systems, Commun. ACM (1974)
Perez S., Tap snake game in Android market is actually spy app (update) (2010)
AV-Comparatives, Android test 2019 – 250 apps (2019)
Arp D. et al., DREBIN: Effective and explainable detection of Android malware in your pocket
Naway A. et al., A review on the use of deep learning in android malware detection (2018)
N. Alzahrani, D. Alghazzawi, A review on Android ransomware detection using deep learning techniques, in: Proceedings...
Sweetlin R. et al., Survey on detection of malware in Android, Int. J. Latest Trends Eng. Technol. (2016)
Yan P. et al., A survey on dynamic mobile malware detection, Softw. Qual. J. (2018)
Arshad S. et al., Android malware detection & protection: A survey, Int. J. Adv. Comput. Sci. Appl. (2016)
Chen T. et al., TinyDroid: A lightweight and efficient model for Android malware detection and classification, Mob. Inf. Syst. (2018)
Chen J. et al., Detecting Android malware using clone detection, J. Comput. Sci. Technol. (2015)
R. Potharaju, A. Newell, C. Nita-Rotaru, X. Zhang, Plagiarizing smartphone applications: Attack strategies and defense...
Liu P. et al., NSDroid: Efficient multi-classification of android malware using neighborhood signature in local function call graphs, Int. J. Inf. Secur. (2020)
Wang W. et al., DroidEnsemble: Detecting Android malicious applications with ensemble of string and structural static features, IEEE Access (2018)
Zhou W. et al., Detecting repackaged smartphone applications in third-party android marketplaces
Bathelot B., Définition: SDK publicitaire (2018)
Khanmohammadi K. et al., Empirical study of android repackaged applications, Empir. Softw. Eng. (2019)
Suarez-Tangil G. et al., Droidsieve: Fast and accurate classification of obfuscated Android malware
Geurts P. et al., Extremely randomized trees, Mach. Learn. (2006)
Tong S. et al., Support vector machine active learning for image retrieval
Tianqi Chen, T.H., Xgboost: Extreme gradient boosting (2019)
Qiao M. et al., Merging permission and api features for Android malware detection
Breiman L., Random forests, Mach. Learn. (2001)
Goodfellow I. et al., Deep Learning (2016)
Wu D. et al., DroidMat: Android malware detection through manifest and API calls tracing
K. Wagstaff, C. Cardie, S. Rogers, S. Schrödl, Constrained K-means clustering with background knowledge, in:...
Altman N.S., An introduction to kernel and nearest-neighbor nonparametric regression, J. Am. Stat. (1992)
K. Khanmohammadi, R. Khoury, A. Hamou-Lhadj, On the use of API calls to detect repackaged malware apps: Challenges and...
Sarma B.P. et al., Android permissions: A perspective combining risks and benefits
Peng H. et al., Using probabilistic generative models for ranking risks of Android apps
Enck W. et al., On lightweight mobile phone application certification
Y. Aafer, W. Du, H. Yin, DroidAPIMiner: Mining API-Level features for robust malware detection in Android, in: Security...

