Elsevier

Computer Networks

Volume 179, 9 October 2020, 107372

App trajectory recognition over encrypted internet traffic based on deep neural network

https://doi.org/10.1016/j.comnet.2020.107372

Abstract

Despite the increasing popularity of mobile applications and the widespread adoption of encryption techniques, mobile devices are still susceptible to security and privacy risks. In this paper, we propose ActiveTracker, a new type of sniffing attack that can reveal the fine-grained trajectory of a user’s mobile app usage from a sniffed encrypted Internet traffic stream. It first adopts a sliding window based approach to divide the encrypted traffic stream into a sequence of segments corresponding to different app activities. Each traffic segment is then represented by a normalized temporal-spacial traffic matrix and a traffic spectrum vector. Based on the normalized representation, a deep neural network (DNN) model, which consists of an app filter and an activity classifier, is developed to extract comprehensive features from the input and uncover the user’s app usage trajectory. Through extensive experiments on real-world app usage traffic collected from volunteers and on our synthetic traffic data, we show that the proposed approach achieves up to 79.65% accuracy in recognizing app trajectory over encrypted traffic streams.

Introduction

The popularity of mobile applications (apps) has increased dramatically over the past few years. People frequently use mobile apps for social interaction, online shopping, gaming, route navigation, enjoying music, watching videos, etc. According to a report [1], mobile apps accounted for 58% of worldwide Internet traffic in 2018 and are expected to continue growing rapidly.

Due to the broadcast nature of wireless communications, mobile devices are susceptible to security and privacy risks. Malicious attacks such as sniffing may reveal users’ sensitive information [2], [3], [4], [5], [6]. For example, traffic classification techniques [7], [8] can infer the application types and the corresponding protocols (e.g., email, news, VoIP, etc.) by inspecting the headers (e.g., protocol type, IP address, port, etc.) and the payloads of IP packets. To enhance security, encryption techniques have been applied at different levels of the communication process [9]. For example, the Transport Layer Security (TLS) protocol has been widely used by many mobile apps to encrypt the application data and so prevent payload inspection [10], [11]. The Internet Protocol Security (IPsec) protocol can be used to encrypt data flows between a pair of hosts. The Wired Equivalent Privacy (WEP) and Wi-Fi Protected Access (WPA) standards have been widely applied in wireless local area networks (WLANs) to prevent unauthorized access to the network. However, recent studies [12], [13], [14] showed that information about mobile app usage can be inferred by examining the temporal-spacial patterns of the encrypted Internet traffic packets.

Work on app fingerprinting [15], [16], [17], [18] sought to establish unique features for distinguishing apps. The features were extracted at the traffic level, the code level, and the system level. For example, NetworkProfiler [15] automatically generated network profiles for identifying Android apps according to the HTTP headers at the traffic level. AppDNA [19] inspected the function-call graph to form an app fingerprint at the code level. POWERFUL [17] fingerprinted mobile apps by analyzing their power consumption patterns at the system level.

Recently, several works focused on in-app activity classification [12], [13], [20], which aims to recognize the usage of different services within a particular app such as WhatsApp. Fu et al. proposed a system for classifying service usages of mobile messaging apps by jointly modeling user behavioral patterns, network traffic characteristics, and temporal dependencies [12], and developed an online analyzer to improve feature extraction and achieve in-app activity classification in real time [13]. However, the existing works focused only on identifying the activity within a particular app, and they lacked the ability to recognize both the app and the in-app activity at a fine-grained level.

In this paper, we address a more challenging task: uncovering the trajectory of a user’s mobile app usage from a continuous encrypted Internet traffic stream. Specifically, we focus on the app trajectory recognition problem: inferring which apps are used to conduct what activities by analyzing the encrypted Internet traffic stream sniffed from a user. Fig. 1 illustrates an example in which a malicious attacker sniffs the encrypted traffic of a user via a public access point. As shown in the figure, there is a clear pattern (e.g., the packet size, the packet interval, etc.) in the encrypted traffic stream when the user conducts different activities with different apps. By exploring these patterns, a well-designed algorithm can uncover the trajectory of mobile app usage at a fine-grained level. In other words, the described technology can be considered a new form of attack: an adversary can sniff the encrypted traffic and infer a user’s sensitive information such as “sending pictures with Skype and transferring money with PayPal”.

Conventional approaches to app fingerprinting and in-app activity classification cannot solve the app trajectory recognition problem directly. The reasons and technical challenges are as follows. First, the conventional approaches were designed to recognize either the app or the activity, but not both. The combination of apps and activities forms a more complicated classification task, on which the conventional approaches yield low recognition accuracy (as shown in the performance analysis in Section 6). Second, the conventional approaches used hand-crafted features for classification. The extraction of features relies heavily on human experience, and the hand-crafted features are not thorough enough to differentiate similar activities on different apps (e.g., text messaging with Skype and text messaging with WeChat), which leads to poor performance as shown in Section 6. Third, to uncover the trajectory of app usage from a continuous encrypted traffic stream, a method is needed to correctly partition the traffic stream into segments representing different activities, which has not been well studied in the past.

To address these challenges, we propose ActiveTracker, a novel deep learning framework to uncover the trajectory of app usage from the encrypted Internet traffic stream. Inspired by the great success of deep learning techniques in different research areas such as street-level imagery [22], [23], [24], [25] and trajectory prediction [26], [27], [28], we exploit the ability of deep learning to extract features automatically from complex high-dimensional input signals and to capture non-linear dependencies through end-to-end learning. The proposed framework first adopts a sliding window based approach to divide the traffic stream into a sequence of segments, where each segment corresponds to an app activity. Each traffic segment is then normalized and represented by a temporal-spacial traffic matrix and a traffic spectrum vector. Using the normalized data as input, a deep neural network (DNN) model is proposed for activity recognition. Combining the recognition results of the sequence of traffic segments, the trajectory of app usage can be uncovered, which may lead to the leakage of sensitive personal information of the mobile users.
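The segmentation step can be illustrated with a minimal sketch. This excerpt does not specify the actual sliding window criterion, so everything below is an assumption made for illustration: packets are (timestamp, size) pairs sorted by time, the change score compares the mean packet size in the windows before and after a candidate boundary, and the function names, window length, step, and threshold are all hypothetical.

```python
def window_mean_size(packets, start, end):
    """Mean packet size (bytes) of packets with start <= timestamp < end."""
    sizes = [size for ts, size in packets if start <= ts < end]
    return sum(sizes) / len(sizes) if sizes else 0.0


def change_scores(packets, window, step):
    """Slide a candidate boundary t over the stream and score how much the
    window before t differs from the window after t (0 = identical)."""
    scores = []
    t = packets[0][0] + window
    t_end = packets[-1][0] - window
    while t <= t_end:
        left = window_mean_size(packets, t - window, t)
        right = window_mean_size(packets, t, t + window)
        scale = max(left, right)
        scores.append((t, abs(left - right) / scale if scale else 0.0))
        t += step
    return scores


def find_boundaries(packets, window=2.0, step=1.0, threshold=0.5):
    """Boundary timestamps: local peaks of the change score above threshold.
    The scoring rule and defaults are illustrative assumptions."""
    if not packets:
        return []
    scores = change_scores(packets, window, step)
    boundaries = []
    for t, score in scores:
        nearby = [s for t2, s in scores if abs(t2 - t) < window]
        far_enough = not boundaries or t - boundaries[-1] >= window
        if score > threshold and score >= max(nearby) and far_enough:
            boundaries.append(t)
    return boundaries


def split_stream(packets, boundaries):
    """Cut the packet list into consecutive segments at the boundaries."""
    segments, current, cuts = [], [], list(boundaries)
    for ts, size in packets:
        while cuts and ts >= cuts[0]:
            cuts.pop(0)
            if current:
                segments.append(current)
                current = []
        current.append((ts, size))
    if current:
        segments.append(current)
    return segments


# Synthetic stream: small packets for 5 s, then large packets for 5 s.
stream = [(i * 0.5, 100 if i < 10 else 1400) for i in range(20)]
segments = split_stream(stream, find_boundaries(stream))
print([len(s) for s in segments])  # two segments of 10 packets each
```

Picking the local peak of the score, rather than the first point above the threshold, keeps the cut at the point where the two windows differ most. A real traffic stream would likely need richer statistics than mean packet size.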

Table 1 highlights the differences between ActiveTracker and the existing works on app fingerprinting and in-app activity classification. The main results and contributions of this paper are summarized as follows:

  • We design a novel sliding window based approach for encrypted traffic stream segmentation, which is able to accurately partition an encrypted Internet traffic stream into multiple single-activity sub-streams.

  • We propose a DNN-based classification model for activity recognition from traffic segments. The proposed model uses a convolutional neural network to extract features from the traffic segments automatically, and achieves high accuracy in activity recognition. To the best of our knowledge, we are the first to solve the problem of app trajectory recognition over encrypted Internet traffic streams.

  • We conduct extensive experiments based on real-world Internet traffic collected from volunteers. The results show that the proposed approach achieves up to 79.65% accuracy in uncovering app trajectory from a long traffic stream. Our work will draw people’s attention to privacy protection of mobile app communications.

The rest of the paper is organized as follows. Section 2 summarizes the related works. Section 3 formulates the research problem of app trajectory recognition over encrypted traffic streams. Section 4 introduces our proposed solution in detail. Section 5 introduces our data collection method and the statistics of the collected traffic. Section 6 evaluates our solution framework with extensive experiments. Finally, we conclude our paper in Section 7.

Related work

We summarize the related work into three categories: Internet traffic classification, app fingerprinting, and in-app activity classification.

Problem formulation

In this section, we introduce the adversary model for recognizing app trajectory from encrypted traffic and formulate the problem of mobile app traffic segmentation and classification.
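As a rough illustration of the output of this problem, an app trajectory can be viewed as a time-ordered list of (app, activity) steps. The sketch below is not the paper's formulation: the `Step` type, the `build_trajectory` helper, and the rule of merging consecutive segments that share a label are assumptions introduced here for illustration.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    """One recognized traffic segment (hypothetical record layout)."""
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    app: str       # recognized app, e.g. "Skype"
    activity: str  # recognized in-app activity, e.g. "send picture"


def build_trajectory(predictions: List[Step]) -> List[Step]:
    """Order per-segment predictions by time and merge neighbours that
    carry the same (app, activity) label into a single trajectory step."""
    trajectory: List[Step] = []
    for step in sorted(predictions, key=lambda s: s.start):
        last = trajectory[-1] if trajectory else None
        if last and (last.app, last.activity) == (step.app, step.activity):
            trajectory[-1] = last._replace(end=max(last.end, step.end))
        else:
            trajectory.append(step)
    return trajectory


predictions = [
    Step(0.0, 5.0, "Skype", "send picture"),
    Step(5.0, 9.0, "Skype", "send picture"),
    Step(9.0, 14.0, "PayPal", "transfer money"),
]
for step in build_trajectory(predictions):
    print(f"{step.start:>5.1f}-{step.end:<5.1f} {step.app}: {step.activity}")
```

Here the two adjacent Skype segments collapse into one step, followed by the PayPal step.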

App trajectory recognition based on DNN

In this section, we first give an overview of the framework of app trajectory recognition based on DNN. Then, we introduce in detail the three major components of our framework: a novel sliding window based approach for encrypted traffic segmentation, a method for data representation, and a DNN classification model for app activity recognition.
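The data representation component can be sketched as follows. This excerpt does not define the temporal-spacial traffic matrix or the traffic spectrum vector, so the sketch makes assumptions: the matrix is taken to be a 2-D histogram over time bins and packet-size bins scaled to [0, 1], and the spectrum vector is taken to be the normalized magnitude of the discrete Fourier transform of the bytes-per-time-bin series. The function names, bin counts, and `max_size` are hypothetical, and the segment is assumed to be a non-empty, time-sorted list of (timestamp, size) pairs.

```python
import cmath


def time_bin(ts, t0, duration, bins):
    """Index of the time bin that timestamp ts falls into."""
    return min(int((ts - t0) / duration * bins), bins - 1)


def traffic_matrix(packets, time_bins=4, size_bins=3, max_size=1500):
    """Assumed form: packet counts per (time bin, size bin), scaled to [0, 1]."""
    t0 = packets[0][0]
    duration = (packets[-1][0] - t0) or 1.0  # avoid division by zero
    matrix = [[0.0] * size_bins for _ in range(time_bins)]
    for ts, size in packets:
        row = time_bin(ts, t0, duration, time_bins)
        col = min(int(size / max_size * size_bins), size_bins - 1)
        matrix[row][col] += 1.0
    peak = max(max(row) for row in matrix)
    return [[count / peak for count in row] for row in matrix]


def spectrum_vector(packets, time_bins=8):
    """Assumed form: normalized DFT magnitudes of the bytes-per-bin series."""
    t0 = packets[0][0]
    duration = (packets[-1][0] - t0) or 1.0
    series = [0.0] * time_bins
    for ts, size in packets:
        series[time_bin(ts, t0, duration, time_bins)] += size
    n = len(series)
    magnitudes = [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series)))
        for k in range(n // 2 + 1)  # non-negative frequencies only
    ]
    peak = max(magnitudes) or 1.0
    return [m / peak for m in magnitudes]


segment = [(0.0, 100), (1.0, 100), (2.0, 1400), (3.0, 1400)]
print(traffic_matrix(segment, time_bins=2, size_bins=2))
print(spectrum_vector(segment, time_bins=4))
```

Normalizing both outputs makes segments of different lengths and volumes comparable, and both have a fixed size regardless of how many packets the segment contains, which is what a fixed-input DNN needs.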

Data collection and processing

To evaluate the performance of our framework, we conducted experiments on both a real-world smartphone traffic dataset and a synthetic dataset. The dataset collection and processing are described as follows.

Performance evaluation

To validate the effectiveness of our proposed framework, we conducted extensive experiments on the collected real-world traffic data as well as the synthetic traffic data. All experiments were performed on a single machine with an Intel Core i5-6600 processor (4 cores / 4 threads), 8 GB of memory, and an NVIDIA GeForce GTX 1070 GPU. To implement our proposed DNN model, we used the PyTorch library [53], [54].

Conclusion

In this paper, we propose ActiveTracker, a framework to recognize app trajectory over encrypted Internet traffic streams. First, the incoming Internet traffic of mobile apps is segmented into several single-activity subsequences by a sliding window based approach. Then, each traffic segment is represented by a temporal-spacial traffic matrix and a traffic spectrum vector. Using the normalized data as input, we propose a deep neural network (DNN) model to combine the features from different

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was partially supported by the National Key R&D Program of China (Grant No. 2017YFB1001801), the National Natural Science Foundation of China (Grant Nos. 61972196, 61672278, 61832008, 61832005), the open Project from the State Key Laboratory of Smart Grid Protection and Operation Control “Research on Smart Integration of Terminal-Edge-Cloud Techniques for Pervasive Internet of Things”, the Collaborative Innovation Center of Novel Software Technology and Industrialization, and the

Ding Li received his B.E. degree in computer science and technology from Beijing University of Posts and Telecommunications and his M.S. degree in computer science from University of California, Santa Cruz. He is now a Ph.D. candidate in the Department of Computer Science and Technology, Nanjing University. His research interests include wireless network measurement, mobile computing, and deep learning.

References (57)

  • G. Yang et al., Study and mitigation of origin stripping vulnerabilities in hybrid-postmessage enabled mobile applications, IEEE Symposium on Security and Privacy (S & P), 2018.
  • A. Possemato et al., Clickshield: are you hiding something? Towards eradicating clickjacking on android, ACM SIGSAC Conference on Computer and Communications Security (CCS), 2018.
  • W. Zhou et al., Discovering and understanding the security hazards in the interactions between IoT devices, mobile apps, and clouds on smart home platforms, USENIX Security Symposium, 2019.
  • T. Karagiannis et al., Transport layer identification of p2p traffic, ACM SIGCOMM Internet Measurement Conference (IMC), 2004.
  • S. Sen et al., Accurate, scalable in-network identification of p2p traffic using application signatures, International Conference on World Wide Web (WWW), 2004.
  • A. Pham et al., Hidemyapp: hiding the presence of sensitive apps on android, USENIX Security Symposium, 2019.
  • M. Shen et al., Classification of encrypted traffic with second-order Markov chains and application attribute bigrams, IEEE Trans. Inf. Forensics Secur., 2017.
  • S. Sengupta et al., Exploiting diversity in android TLS implementations for mobile app traffic classification, The World Wide Web Conference (WWW), 2009.
  • Y. Fu et al., Service usage classification with encrypted internet traffic in mobile messaging apps, IEEE Trans. Mob. Comput. (TMC), 2016.
  • J. Liu et al., Effective and real-time in-app activity analysis in encrypted internet traffic streams, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2017.
  • T. Bao et al., An unsupervised approach to modeling personalized contexts of mobile users, IEEE International Conference on Data Mining (ICDM), 2010.
  • S. Dai et al., Networkprofiler: towards automatic fingerprinting of android apps, IEEE International Conference on Computer Communications (INFOCOM), 2013.
  • H. Yao et al., Samples: self adaptive mining of persistent lexical snippets for classifying mobile application traffic, ACM International Conference on Mobile Computing and Networking (MobiCom), 2015.
  • S. Xu et al., Appdna: app behavior profiling via graph-based deep learning, IEEE Conference on Computer Communications (INFOCOM), 2018.
  • Y. Fu et al., A multi-label multi-view learning framework for in-app service usage analysis, ACM Trans. Intell. Syst. Technol., 2018.
  • Y. Ye et al., The visual quality of streets: a human-centred continuous measurement based on machine learning algorithms and street view images, Environ. Plann. B, 2019.
  • D. Anguelov et al., Google street view: capturing the world at street level, IEEE Comput., 2010.
  • A.B. Adege et al., Mobility predictions for IoT devices using gated recurrent unit network, IEEE Internet Things J., 2020.
Wenzhong Li received his B.S. and Ph.D. degrees from Nanjing University, China, both in computer science. He was an Alexander von Humboldt Scholar Fellow at the University of Goettingen, Germany. He is now a full professor in the Department of Computer Science, Nanjing University. Dr. Li’s research interests include distributed computing, data mining, mobile cloud computing, wireless networks, pervasive computing, and social networks. He has published over 100 peer-reviewed papers at international conferences and in journals, which include INFOCOM, UBICOMP, IJCAI, ACM Multimedia, ICDCS, IEEE Communications Magazine, IEEE/ACM Transactions on Networking (ToN), IEEE Journal on Selected Areas in Communications (JSAC), IEEE Transactions on Parallel and Distributed Systems (TPDS), IEEE Transactions on Wireless Communications (TWC), etc. He served as Program Co-chair of MobiArch 2013 and Registration Chair of ICNP 2013. He was a TPC member of several international conferences and a reviewer for many journals. He is the principal investigator of three NSFC-funded projects, and the co-principal investigator of a China-Europe international research staff exchange program. Dr. Li is a member of IEEE, ACM, and China Computer Federation (CCF). He was also the winner of the Best Paper Award of ICC 2009 and APNet 2018.

Xiaoliang Wang received the Ph.D. degree from the Graduate School of Information Sciences, Tohoku University, Japan, in 2010. He is currently an Associate Professor with the Department of Computer Science and Technology, Nanjing University. His research interests include network system design and human–computer interaction.

Cam-Tu Nguyen obtained her bachelor’s and master’s degrees from Vietnam National University in 2005 and 2008, respectively. She received her Ph.D. in information science from Tohoku University, Japan, in 2011. From 2012 to 2016, she worked as a postdoctoral researcher at Nanjing University, China. She has published several papers in leading journals and conferences such as IEEE TKDE, ACM TWEB, AAAI, IJCAI, and CIKM. Her research interests include machine learning and natural language understanding.

Sanglu Lu received her B.S., M.S., and Ph.D. degrees from Nanjing University in 1992, 1995, and 1997, respectively, all in computer science. She is currently a professor in the Department of Computer Science and Technology and the deputy director of State Key Laboratory for Novel Software Technology. Her research interests include distributed computing, pervasive computing, and wireless networks. She has published more than 100 papers in refereed journals and conferences in the above areas. She was the principal investigator of many nationally funded projects, including the National Key R&D Program of China, the National Natural Science Foundation of China, the Key R&D Program of Jiangsu Province, China, etc. She is a member of IEEE and ACM.
