Elsevier

Computer Science Review

Volume 39, February 2021, 100358
Review article
A survey of malware detection in Android apps: Recommendations and perspectives for future research

https://doi.org/10.1016/j.cosrev.2020.100358

Abstract

Android has dominated the smartphone market and has become the most popular operating system for mobile devices. However, security threats in Android applications have also increased in lockstep with Android’s success. More than 3 million new malware samples targeting the Android operating system were discovered in 2017. Although persistent research efforts have been made to address these threats, and several detection techniques and tools have been developed as a result, they all exhibit distinct limitations, such that no single solution can claim to solve the Android malware problem. In this paper, we survey the main mechanisms and approaches for malware detection in Android applications. We identify the advantages and limitations of each and suggest avenues of research to advance knowledge in this regard.

Introduction

It is a common truism of computer security that the user often inadvertently abets the malware running on his device. To aid in protecting the user against himself, Android’s architecture is largely concordant with the principle of least privilege, stated by Saltzer and Schroeder in their seminal 1975 paper [1], and requires that an application possess only the most restrictive set of permissions that still allows it to perform its intended task. However, it is ultimately up to each individual user to decide whether or not to install an application, and to determine which permissions will be granted to each application. Google’s commentary on this issue is as follows [2]:

“When installing an application, users see a screen that explains clearly what information and system resources the application has permission to access, such as a phone’s GPS location. Users must explicitly approve this access in order to continue with the installation, and they may uninstall applications at any time. They can also view ratings and reviews to help decide which applications they choose to install. We consistently advise users to only install apps they trust.”

Nonetheless, when choosing to install an app on their device, the user is often constrained to act more on intuition than on a fact-based decision process. While the user does have access to the permissions requested by the app, he may not be cognizant of the myriad ways in which permissions could be misused to compromise the confidentiality, integrity and availability of his data. The signature, which identifies the publisher of the code, is of little use if this publisher is unknown to the end-user. The user who opts to obtain apps from third-party stores is exposed to even more risks, since these sites frequently contain repackaged apps.

Antivirus software remains the first line of defense for most users. The German security firm AV-Comparatives periodically evaluates antivirus software for Windows, Mac OS, Android and Linux. In January 2019, AV-Comparatives tested 250 anti-virus tools on more than two thousand Android apps, with bleak results [3]. Only 80 of them detected more than 30% of the malicious apps. Over sixty others relied upon a pre-set whitelist of permitted app names, and did not even perform an elementary scan of the app beyond checking its name against that list. In fact, some of the anti-viruses tested failed to block a single malicious app from the testing dataset.

Even worse, anti-viruses themselves can carry malware or exploitable vulnerabilities. Indeed, two entries in the Drebin malware database are antivirus apps [4]. In this context, it is useful to remember that anti-viruses run continuously on the user’s devices, often with elevated privileges. They thus form an ideal vector to gather data about a user surreptitiously.

Even when they do perform as intended, anti-viruses have limitations: only 23 of the 250 tools AV-Comparatives examined achieved a 100% detection rate. More particularly, any approach based on signatures is inherently reactive, and cannot provide proactive protection against emerging threats. Note that the dataset used by AV-Comparatives consisted of the “2000 most common Android malware threats of 2018”. As the authors themselves observe, with such a benchmark, detection rates between 90% and 100% “should be easily achieved”.
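The reactive nature of signature matching can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is the SHA-256 digest of an app's bytes. The payloads and the `KNOWN_SIGNATURES` set are hypothetical, and commercial engines generally use richer signatures than a whole-file hash.

```python
import hashlib


def signature(app_bytes: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in for a malware signature."""
    return hashlib.sha256(app_bytes).hexdigest()


# Hypothetical sample that has already been analysed and blacklisted.
known_malware = b"payload: send_premium_sms()"
KNOWN_SIGNATURES = {signature(known_malware)}


def is_flagged(app_bytes: bytes) -> bool:
    """Flag an app only if its signature is already in the blacklist."""
    return signature(app_bytes) in KNOWN_SIGNATURES


# A trivially modified variant (one extra byte) behaves the same way,
# but its digest differs, so the blacklist no longer matches it.
variant = known_malware + b" "
```

Under these assumptions, `is_flagged(known_malware)` holds while `is_flagged(variant)` does not: the variant goes undetected until someone analyses it and adds its signature to the list.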

Because of their limitations, anti-viruses should be supplemented by methods based on static analysis and dynamic monitoring of the code. These are tools that will analyze an app to determine if its behavior conforms with a security policy, rather than rely on a signature or a blacklist. The development of these methods is an important current topic of academic and industrial research.

In this paper, we survey the current state of the art of academic research on the topic of malware detection in Android apps, focusing particularly on the more recent developments. Most techniques can broadly be categorized as static methods, which endeavor to detect malware before it is executed, or dynamic analysis that observes the execution of a potentially malicious application and reacts to a violation of a security policy — usually by terminating the execution. Because of its ubiquity, we adopt this classification here.
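The static side of this classification can be sketched as a check performed on an app's declared permissions before it ever runs. The example below is only an illustration: the permission names are real Android permissions, but the policy in `SUSPICIOUS_COMBINATIONS`, the sample manifest and the function names are assumptions, and the manifest is taken to be available in decoded text form.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical policy: permission sets that together could allow data exfiltration.
SUSPICIOUS_COMBINATIONS = [
    {"android.permission.READ_CONTACTS", "android.permission.INTERNET"},
    {"android.permission.READ_SMS", "android.permission.INTERNET"},
]


def requested_permissions(manifest_xml: str) -> set:
    """Collect the permission names declared in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    names = (elem.get(ANDROID_NS + "name") for elem in root.iter("uses-permission"))
    return {name for name in names if name is not None}


def static_verdict(manifest_xml: str) -> bool:
    """Return True if the declared permissions cover a suspicious combination."""
    perms = requested_permissions(manifest_xml)
    return any(combo <= perms for combo in SUSPICIOUS_COMBINATIONS)


# Hypothetical manifest of an app that declares both contacts and network access.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.READ_CONTACTS"/>
  <uses-permission android:name="android.permission.INTERNET"/>
</manifest>"""
```

Here `static_verdict(SAMPLE_MANIFEST)` flags the app without executing it. The check reasons only about what the app could do, so it would flag the app even if no execution ever sends contact data over the network.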

The sheer number of these tools allows for neither a comprehensive listing nor a reasoned discussion. To illustrate, AV-Comparatives catalogued more than 200 security apps in a single category, namely anti-virus software. In order to have a meaningful analysis, we will limit the scope of the paper to the developments of the last ten years, as they are more likely to impact the immediate future. Within this time limit, rather than present an exhaustive survey, we have opted for a sample of models that span a cross-section of current thought on the topic. In doing so, we seek to capture the range of variability that exists with respect to the following questions:

  • Which methods are employed to perform malware detection on Android systems?

  • On which features or aspects of the Android app is the detection process based?

  • On which dataset is the method tested?

Particular attention was given to the way in which the accuracy of methods is tested and measured. We found that datasets and metrics differed widely, making comparison on an equal footing difficult. We briefly describe each sampled method, focusing on its advantages and drawbacks, and put forth recommendations to guide further research on the topic.
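A small worked example suggests why differing datasets make results hard to compare. The sketch below applies the standard definitions of accuracy, precision, recall and F1 to two hypothetical test sets. The confusion-matrix counts are invented for illustration and describe a single imaginary detector that catches 60% of malware and wrongly flags 10% of benign apps.

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard detection metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical balanced dataset: 500 malicious apps, 500 benign apps.
balanced = metrics(tp=300, fp=50, tn=450, fn=200)

# Hypothetical imbalanced dataset: 50 malicious apps, 950 benign apps.
imbalanced = metrics(tp=30, fp=95, tn=855, fn=20)
```

With these counts, the same detector obtains an accuracy of 0.75 on the balanced set and 0.885 on the imbalanced one, while its precision falls from roughly 0.86 to 0.24 and its recall stays at 0.6. A single reported figure therefore says little unless the dataset composition and the metric are both stated.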

We review only academic research published in peer-reviewed journals and conferences, to the exclusion of non-peer-reviewed industrial works. Furthermore, we focus exclusively on studies that tackle the problem of malware in Android apps, and exclude studies on the broader problem of malware in general. We also limit ourselves to a ten-year horizon, and exclude any work that predates 2009. Since our objective is to present a sample of current thought on the topic, we excluded papers whose method was closely similar to that of other, already included papers.

The papers were drawn from the following 4 online libraries, which provide a comprehensive coverage of major academic publication venues in the field of software security.

Several researchers have recently surveyed malware detection methods and techniques for Android applications. Naway and Li [5] presented a survey of static, dynamic and hybrid analysis using deep learning techniques to detect Android malware. They detailed the techniques and specified their strengths and weaknesses. Another survey, by Alzahrani and Alghazzawi [6], addresses the detection of Android malware. They studied eight research papers focused on deep learning for Android ransomware detection and deep learning for Android malware detection. Rubiya and Radhamani [7] analyzed nine Android malware detection approaches, specifying their strengths and weaknesses. Yan and Zheng [8] surveyed dynamic mobile malware detection. They analyzed, synthesized and compared previous studies on detecting malware in smartphones. In another survey, Arshad et al. [9] analyzed static and dynamic techniques for the detection of, and protection against, Android malware. The techniques analyzed are classified according to the detection mechanism used.

As can be seen in Table 1, our survey focuses on static, dynamic and hybrid Android malware detection methods. It tracks the evolution of malware detection over an eleven-year time span (between 2009 and 2020), which is considerably longer than that considered by other similar studies [5], [6], [7], [8], [9]. In addition to describing the strengths and weaknesses of the methods, we propose recommendations to guide future research on this topic.

The remainder of this paper is organized as follows: in Sections 2 and 3, we examine malware detection mechanisms based on static analysis and dynamic analysis, respectively. We also make several recommendations based on our observations of the current state of the art. In Section 4, we review these recommendations and discuss avenues for future research. Concluding remarks are given in Section 5, and a recapitulative table of all surveyed methods is provided in a closing Appendix.

Static analysis

Static analysis encompasses a broad range of methods that seek to discern the runtime behavior of a piece of software prior to its execution. In a security context, the purpose is naturally to weed out potentially malicious apps before they are installed and executed. Static analysis is considered coarse, since it flags an app as malicious according to an over-approximation of its possible runtime behavior. As a consequence, any static analysis method must maximize effective detection while

Dynamic analysis

Dynamic analysis is an alternative approach to malware detection, which requires running the program to study its behavior and its effects on its environment. Unlike static analysis, it is late in that it only detects a violation right at the moment when it is about to occur. It also suffers from coverage limitations, since it only considers a single execution, rather than all possible program executions.
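The two properties noted above, late detection and limited coverage, can be seen in a minimal monitor sketch. The event names, the policy (no network transmission once contacts have been read) and the `monitor` function are hypothetical and stand in for whatever instrumentation an actual tool would use.

```python
from typing import Iterable, List


class PolicyViolation(Exception):
    """Raised when the next event would violate the security policy."""


def monitor(trace: Iterable[str]) -> List[str]:
    """Replay one execution, aborting just before an event that violates the policy."""
    executed: List[str] = []
    contacts_read = False
    for event in trace:
        # The violation is caught only at the moment it is about to occur.
        if event == "network_send" and contacts_read:
            raise PolicyViolation(f"blocked {event!r} after {len(executed)} events")
        if event == "read_contacts":
            contacts_read = True
        executed.append(event)
    return executed


# Two hypothetical executions of the same app.
benign_run = ["open_ui", "network_send", "close_ui"]
leaking_run = ["open_ui", "read_contacts", "network_send"]
```

Calling `monitor(benign_run)` completes normally, while `monitor(leaking_run)` raises `PolicyViolation` at the final event. An analyst who only ever observed the first execution would have no evidence that the app can leak data.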

As we did in the previous section, we organize dynamic tools in four broad categories,

Discussion

Researchers have long realized that traditional malware detection techniques, such as signature-based anti-viruses, are inadequate to provide effective protection against new malware. Consequently, in recent years, several techniques and tools based on behavioral analysis (static or dynamic) have been at the core of malware identification. Table 5 summarizes the existing approaches surveyed in the previous two sections, and Table 2 gives a summary of the recommendations listed throughout the

Conclusion

In this paper, we surveyed malware detection methods for Android, focusing on the advantages and drawbacks of each, and made recommendations for future research on the topic.

Despite the large number of solutions that have been proposed, several challenges remain to be addressed, especially because of the rapidly evolving nature of malware. We cite difficulties related to code obfuscation, the unavailability of source code and the emerging problem of malware collusion as problems that

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (66)

  • Maiorca D. et al., Stealth attacks: An extended insight into the obfuscation effects on Android malware, Comput. Secur. (2015)
  • Saltzer J.H. et al., The protection of information in computer systems, Commun. ACM (1974)
  • Perez S., Tap snake game in Android market is actually spy app (update) (2010)
  • AV-Comparatives, Android test 2019 – 250 apps (2019)
  • Arp D. et al., DREBIN: Effective and explainable detection of Android malware in your pocket
  • Naway A. et al., A review on the use of deep learning in android malware detection (2018)
  • N. Alzahrani, D. Alghazzawi, A review on Android ransomware detection using deep learning techniques, in: Proceedings...
  • Sweetlin R. et al., Survey on detection of malware in Android, Int. J. Latest Trends Eng. Technol. (2016)
  • Yan P. et al., A survey on dynamic mobile malware detection, Softw. Qual. J. (2018)
  • Arshad S. et al., Android malware detection & protection: A survey, Int. J. Adv. Comput. Sci. Appl. (2016)
  • Chen T. et al., TinyDroid: A lightweight and efficient model for Android malware detection and classification, Mob. Inf. Syst. (2018)
  • Chen J. et al., Detecting Android malware using clone detection, J. Comput. Sci. Technol. (2015)
  • R. Potharaju, A. Newell, C. Nita-Rotaru, X. Zhang, Plagiarizing smartphone applications: Attack strategies and defense...
  • Liu P. et al., NSDroid: Efficient multi-classification of android malware using neighborhood signature in local function call graphs, Int. J. Inf. Secur. (2020)
  • Wang W. et al., DroidEnsemble: Detecting Android malicious applications with ensemble of string and structural static features, IEEE Access (2018)
  • Zhou W. et al., Detecting repackaged smartphone applications in third-party android marketplaces
  • Bathelot B., Définition: SDK publicitaire (2018)
  • Khanmohammadi K. et al., Empirical study of android repackaged applications, Empir. Softw. Eng. (2019)
  • Suarez-Tangil G. et al., Droidsieve: Fast and accurate classification of obfuscated Android malware
  • Geurts P. et al., Extremely randomized trees, Mach. Learn. (2006)
  • Tong S. et al., Support vector machine active learning for image retrieval
  • Tianqi Chen T.H., Xgboost: Extreme gradient boosting (2019)
  • Qiao M. et al., Merging permission and api features for Android malware detection
  • Breiman L., Random forests, Mach. Learn. (2001)
  • Goodfellow I. et al., Deep Learning (2016)
  • Wu D. et al., DroidMat: Android malware detection through manifest and API calls tracing
  • K. Wagstaff, C. Cardie, S. Rogers, S. Schrödl, Constrained K-means clustering with background knowledge, in:...
  • Altman N.S., An introduction to kernel and nearest-neighbor nonparametric regression, J. Am. Stat. (1992)
  • K. Khanmohammadi, R. Khoury, A. Hamou-Lhadj, On the use of API calls to detect repackaged malware apps: Challenges and...
  • Sarma B.P. et al., Android permissions: A perspective combining risks and benefits
  • Peng H. et al., Using probabilistic generative models for ranking risks of Android apps
  • Enck W. et al., On lightweight mobile phone application certification
  • Y. Aafer, W. Du, H. Yin, DroidAPIMiner: Mining API-Level features for robust malware detection in Android, in: Security...