App trajectory recognition over encrypted internet traffic based on deep neural network
Introduction
The popularity of mobile applications (apps) has increased dramatically in the past few years. People frequently use mobile apps for social interaction, online shopping, gaming, route navigation, listening to music, watching videos, etc. According to a report [1], mobile apps accounted for 58% of worldwide Internet traffic in 2018, and the share is expected to continue to grow rapidly.
Due to the broadcast nature of wireless communications, mobile devices are susceptible to security and privacy risks. Malicious attacks such as sniffing may reveal users’ sensitive information [2], [3], [4], [5], [6]. For example, traffic classification techniques [7], [8] can infer the application types and the corresponding protocols (e.g., email, news, VoIP) by inspecting the headers (e.g., protocol type, IP address, port) and the payloads of IP packets. To enhance security, encryption techniques have been applied at different levels of the communication process [9]. For example, the Transport Layer Security (TLS) protocol has been widely used by mobile apps to encrypt application data and prevent payload inspection [10], [11]. The Internet Protocol Security (IPsec) protocol can be used to encrypt data flows between a pair of hosts. The Wired Equivalent Privacy (WEP) and Wi-Fi Protected Access (WPA) standards have been widely applied in wireless local area networks (WLANs) to prevent unauthorized access to the network. However, recent research [12], [13], [14] has shown that information about mobile app usage can be inferred by examining the temporal-spatial patterns of encrypted Internet traffic packets.
Work on app fingerprinting [15], [16], [17], [18] has sought to establish unique features that distinguish one app from another. The features were extracted at the traffic level, code level, and system level. For example, NetworkProfiler [15] automatically generated network profiles for identifying Android apps from HTTP headers at the traffic level. AppDNA [19] inspected the function call graph to form an app fingerprint at the code level. POWERFUL [17] fingerprinted mobile apps by analyzing their power consumption patterns at the system level.
Recently, several works [12], [13], [20] have focused on in-app activity classification, which aims to recognize the usage of different services within a particular app such as WhatsApp. Fu et al. proposed a system for classifying service usages of mobile messaging apps by jointly modeling user behavioral patterns, network traffic characteristics, and temporal dependencies [12], and developed an online analyzer to improve feature extraction and achieve in-app activity classification in real time [13]. However, the existing works focus only on identifying the activity within a particular app, and they cannot recognize both the app and the in-app activity at a fine-grained level.
In this paper, we address a more challenging task: uncovering the trajectory of a user’s mobile app usage from a continuous encrypted Internet traffic stream. Specifically, we focus on the app trajectory recognition problem: inferring which apps are used to conduct which activities by analyzing the encrypted Internet traffic stream sniffed from a user. Fig. 1 illustrates an example in which a malicious attacker sniffs the encrypted traffic of a user via a public access point. As shown in the figure, there is a clear pattern (e.g., in the packet size and the packet interval) in the encrypted traffic stream when the user conducts different activities with different apps. By exploring these patterns, a well-designed algorithm can uncover the trajectory of mobile app usage at a fine-grained level. In other words, the described technology can be considered a new form of attack: an adversary can sniff the encrypted traffic and infer a user’s sensitive information such as “sending pictures with Skype and transferring money with PayPal”.
Conventional approaches to app fingerprinting and in-app activity classification cannot solve the app trajectory recognition problem directly. The reasons and technical challenges are as follows. First, the conventional approaches were designed to recognize either the app or the activity, but not both. The combination of apps and activities forms a more complicated classification task, on which the conventional approaches yield low recognition accuracy (as shown in the performance analysis in Section 6). Second, the conventional approaches used hand-crafted features for classification. Feature extraction relies heavily on human experience, and hand-crafted features are not discriminative enough to differentiate similar activities on different apps (e.g., text messaging with Skype and text messaging with WeChat), which leads to poor performance as shown in Section 6. Third, to uncover the trajectory of app usage from a continuous encrypted traffic stream, a method is needed to correctly partition the traffic stream into segments representing different activities, which has not been well studied in the past.
To address these challenges, we propose ActiveTracker, a novel deep learning framework to uncover the trajectory of app usage from the encrypted Internet traffic stream. Inspired by the great success of deep learning techniques in research areas such as street-level imagery [22], [23], [24], [25] and trajectory prediction [26], [27], [28], we exploit the ability of deep learning to extract features automatically from complex high-dimensional input signals and to capture non-linear dependencies through end-to-end learning. The proposed framework first adopts a sliding window based approach to divide the traffic stream into a sequence of segments, where each segment corresponds to an app activity. Each traffic segment is then normalized and represented by a temporal-spatial traffic matrix and a traffic spectrum vector. Using the normalized data as input, a deep neural network (DNN) model is proposed for activity recognition. By combining the recognition results of the sequence of traffic segments, the trajectory of app usage can be uncovered, which may lead to the leakage of sensitive personal information of mobile users.
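As a rough illustration of the sliding window idea, the following Python sketch scores every candidate cut point by how much the mean packet size differs between the window before it and the window after it, and keeps the strongest cut points as segment boundaries. The change criterion, the function names `window_mean` and `segment_stream`, and all parameter values are assumptions made for illustration only; the segmentation method actually used by ActiveTracker is the one described in Section 4 and may differ.

```python
from statistics import mean


def window_mean(packets, lo, hi):
    """Mean packet size over timestamps in [lo, hi), or None if the window is empty."""
    sizes = [size for ts, size in packets if lo <= ts < hi]
    return mean(sizes) if sizes else None


def segment_stream(packets, window=5.0, step=1.0, threshold=0.5):
    """Split a packet stream where the traffic pattern changes sharply.

    packets: list of (timestamp_seconds, size_bytes), sorted by timestamp.
    Returns a list of (start_time, end_time) segments.
    """
    if not packets:
        return []
    start, end = packets[0][0], packets[-1][0]

    # Score each candidate cut point by the relative change in mean packet size
    # between the window before it and the window after it.
    scores = []
    t = start + window
    while t + window <= end:
        left = window_mean(packets, t - window, t)
        right = window_mean(packets, t, t + window)
        if left is not None and right is not None:
            denom = max(left, right) or 1.0
            scores.append((t, abs(right - left) / denom))
        t += step

    # Keep a cut point only if it exceeds the threshold, is the strongest
    # candidate within one window length, and is not too close to the last cut.
    boundaries = []
    for t, score in scores:
        if score <= threshold:
            continue
        strongest_nearby = max(s for u, s in scores if abs(u - t) < window)
        far_enough = not boundaries or t - boundaries[-1] >= window
        if score >= strongest_nearby and far_enough:
            boundaries.append(t)

    edges = [start] + boundaries + [end]
    return list(zip(edges[:-1], edges[1:]))


# Toy stream: 20 s of small packets followed by 20 s of large packets.
toy = [(float(i), 100) for i in range(20)] + [(float(i), 1400) for i in range(20, 40)]
print(segment_stream(toy))  # [(0.0, 20.0), (20.0, 39.0)]
```

A learned or multi-feature change score could replace the mean-size comparison without changing the surrounding loop.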
Table 1 highlights the differences between ActiveTracker and the existing works on app fingerprinting and in-app activity classification. The main results and contributions of this paper are summarized as follows:
- We design a novel sliding window based approach for encrypted traffic stream segmentation, which is able to accurately partition an encrypted Internet traffic stream into multiple single-activity sub-streams.
- We propose a DNN-based classification model for activity recognition from traffic segments. The proposed model uses a convolutional neural network to extract features from the traffic segments automatically, and achieves high accuracy in activity recognition. To the best of our knowledge, we are the first to solve the problem of app trajectory recognition over encrypted Internet traffic streams.
- We conduct extensive experiments based on real-world Internet traffic collected from volunteers. The results show that the proposed approach achieves up to 79.65% accuracy in uncovering app trajectory from a long traffic stream. Our work will draw people’s attention to privacy protection of mobile app communications.
The rest of the paper is organized as follows. Section 2 summarizes the related works. Section 3 formulates the research problem of app trajectory recognition over encrypted traffic streams. Section 4 introduces our proposed solution in detail. Section 5 introduces our data collection method and the statistics of the collected traffic. Section 6 evaluates our solution framework with extensive experiments. Finally, we conclude our paper in Section 7.
Section snippets
Related work
We summarize the related work into three categories: Internet traffic classification, app fingerprinting, and in-app activity classification.
Problem formulation
In this section, we introduce the adversary model for recognizing app trajectory from encrypted traffic and formulate the problem of mobile app traffic segmentation and classification.
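One way to picture the output of app trajectory recognition is as a time-ordered list of labeled intervals. The sketch below is only an illustration: the tuple layout `(start, end, app, activity)` and the helper name `build_trajectory` are hypothetical and are not taken from the paper's formal definition. It merges consecutive classified segments that carry the same label into a single trajectory entry.

```python
from itertools import groupby


def build_trajectory(segments):
    """Merge consecutive segments that share the same (app, activity) label.

    segments: list of (start, end, app, activity), ordered by start time.
    Returns a list of (start, end, app, activity) trajectory entries.
    """
    trajectory = []
    for (app, activity), run in groupby(segments, key=lambda s: (s[2], s[3])):
        run = list(run)
        # The merged entry spans from the first segment's start to the last one's end.
        trajectory.append((run[0][0], run[-1][1], app, activity))
    return trajectory


classified = [
    (0, 10, "Skype", "send picture"),
    (10, 20, "Skype", "send picture"),
    (20, 35, "PayPal", "transfer money"),
]
print(build_trajectory(classified))
# [(0, 20, 'Skype', 'send picture'), (20, 35, 'PayPal', 'transfer money')]
```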
App trajectory recognition based on DNN
In this section, we first give an overview of the framework of app trajectory recognition based on DNN. We then introduce in detail the three major components of our framework: a novel sliding window based approach for encrypted traffic segmentation, a method for data representation, and a DNN classification model for app activity recognition.
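To make the data representation step more concrete, the following Python sketch builds two fixed-size inputs from one traffic segment. It assumes, purely for illustration, that the traffic matrix counts packets per (time bin, packet size bin) cell and that the spectrum vector is the normalized magnitude of a discrete Fourier transform of the bytes sent per time bin. The function names, bin counts, and the 1500-byte size cap are hypothetical choices; the exact definitions used by the paper are not given in this excerpt.

```python
import cmath


def _time_bin(ts, t0, duration, t_bins):
    """Map a timestamp to a time-bin index in [0, t_bins - 1]."""
    return min(int((ts - t0) / duration * t_bins), t_bins - 1)


def traffic_matrix(packets, t_bins=8, s_bins=4, max_size=1500):
    """Count packets per (time bin, size bin) and scale the counts to [0, 1]."""
    matrix = [[0.0] * s_bins for _ in range(t_bins)]
    if not packets:
        return matrix
    t0 = packets[0][0]
    duration = (packets[-1][0] - t0) or 1.0
    for ts, size in packets:
        row = _time_bin(ts, t0, duration, t_bins)
        col = min(int(size / max_size * s_bins), s_bins - 1)
        matrix[row][col] += 1.0
    peak = max(max(row) for row in matrix)
    return [[value / peak for value in row] for row in matrix]


def spectrum_vector(packets, t_bins=8):
    """Normalized DFT magnitudes of the bytes-per-time-bin series.

    Only the first t_bins // 2 + 1 coefficients are kept because the input is real.
    """
    if not packets:
        return [0.0] * (t_bins // 2 + 1)
    t0 = packets[0][0]
    duration = (packets[-1][0] - t0) or 1.0
    volume = [0.0] * t_bins
    for ts, size in packets:
        volume[_time_bin(ts, t0, duration, t_bins)] += size
    spectrum = []
    for k in range(t_bins // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * cmath.pi * k * i / t_bins)
            for i, v in enumerate(volume)
        )
        spectrum.append(abs(coeff))
    total = sum(spectrum) or 1.0
    return [s / total for s in spectrum]


# Toy segment: (timestamp_seconds, size_bytes), alternating small and large packets.
segment = [(0.0, 100), (1.0, 1400), (2.0, 100), (3.0, 1400)]
print(traffic_matrix(segment, t_bins=4, s_bins=2))
print(spectrum_vector(segment, t_bins=4))
```

Both outputs have a fixed shape regardless of segment length, which is what allows them to be fed to a classifier with a fixed input size.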
Data collection and processing
To evaluate the performance of our framework, we conducted experiments on both a real-world smartphone traffic dataset and a synthetic dataset. The dataset collection and processing are described as follows.
Performance evaluation
To validate the effectiveness of our proposed framework, we conducted extensive experiments on the collected real-world traffic data as well as the synthetic traffic data. All experiments were performed on a single machine with Intel Core i5-6600 processors (4 cores / 4 threads), 8 GB of memory, and NVIDIA GeForce GTX 1070 GPU. To implement our proposed DNN model, we used the PyTorch library [53], [54].
Conclusion
In this paper, we propose ActiveTracker, a framework to recognize app trajectory over encrypted Internet traffic streams. First, the incoming Internet traffic of mobile apps is segmented into several single-activity subsequences by a sliding window based approach. Then, each traffic segment is represented by a temporal-spatial traffic matrix and a traffic spectrum vector. Using the normalized data as input, we propose a deep neural network (DNN) model to combine the features from different
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was partially supported by the National Key R&D Program of China (Grant No. 2017YFB1001801), the National Natural Science Foundation of China (Grant Nos. 61972196, 61672278, 61832008, 61832005), the open Project from the State Key Laboratory of Smart Grid Protection and Operation Control “Research on Smart Integration of Terminal-Edge-Cloud Techniques for Pervasive Internet of Things”, the Collaborative Innovation Center of Novel Software Technology and Industrialization, and the
Ding Li received his B.E. degree in computer science and technology from Beijing University of Posts and Telecommunications and his M.S. degree in computer science from University of California, Santa Cruz. He is now a Ph.D. candidate in the Department of Computer Science and Technology, Nanjing University. His research interests include wireless network measurement, mobile computing, and deep learning.
References (57)
- et al. Powerful: mobile app fingerprinting via power analysis. IEEE Conference on Computer Communications (INFOCOM), 2017.
- et al. Mampf: encrypted traffic classification based on multi-attribute Markov probability fingerprints. IEEE/ACM International Symposium on Quality of Service (IWQoS), 2018.
- et al. Activetracker: uncovering the trajectory of app activities over encrypted internet traffic streams. IEEE International Conference on Sensing, Communication, and Networking (SECON), 2019.
- et al. Social sensing from street-level imagery: a case study in learning spatiotemporal urban mobility patterns. ISPRS J. Photogramm. Remote Sens., 2019.
- et al. Representing place locales using scene elements. Comput. Environ. Urban Syst., 2018.
- et al. Novel feature selection and classification of internet video traffic based on a hierarchical scheme. Comput. Netw., 2017.
- et al. An innovative approach for real-time network traffic classification. Comput. Netw., 2019.
- Mobile vs. desktop usage in...
- et al. Peek-a-boo, I still see you: why efficient traffic analysis countermeasures fail. IEEE Symposium on Security and Privacy (S&P), 2012.
- et al. Mobile application web API reconnaissance: web-to-mobile inconsistencies & vulnerabilities. IEEE Symposium on Security and Privacy (S&P), 2018.
- Study and mitigation of origin stripping vulnerabilities in hybrid-postmessage enabled mobile applications. IEEE Symposium on Security and Privacy (S&P).
- Clickshield: are you hiding something? Towards eradicating clickjacking on Android. ACM SIGSAC Conference on Computer and Communications Security (CCS).
- Discovering and understanding the security hazards in the interactions between IoT devices, mobile apps, and clouds on smart home platforms. USENIX Security Symposium.
- Transport layer identification of P2P traffic. ACM SIGCOMM Internet Measurement Conference (IMC).
- Accurate, scalable in-network identification of P2P traffic using application signatures. International Conference on World Wide Web (WWW).
- Hidemyapp: hiding the presence of sensitive apps on Android. USENIX Security Symposium.
- Classification of encrypted traffic with second-order Markov chains and application attribute bigrams. IEEE Trans. Inf. Forensics Secur.
- Exploiting diversity in Android TLS implementations for mobile app traffic classification. The World Wide Web Conference (WWW).
- Service usage classification with encrypted internet traffic in mobile messaging apps. IEEE Trans. Mob. Comput. (TMC).
- Effective and real-time in-app activity analysis in encrypted internet traffic streams. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).
- An unsupervised approach to modeling personalized contexts of mobile users. IEEE International Conference on Data Mining (ICDM).
- Networkprofiler: towards automatic fingerprinting of Android apps. IEEE International Conference on Computer Communications (INFOCOM).
- Samples: self adaptive mining of persistent lexical snippets for classifying mobile application traffic. ACM International Conference on Mobile Computing and Networking (MobiCom).
- Appdna: app behavior profiling via graph-based deep learning. IEEE Conference on Computer Communications (INFOCOM).
- A multi-label multi-view learning framework for in-app service usage analysis. ACM Trans. Intell. Syst. Technol.
- The visual quality of streets: a human-centred continuous measurement based on machine learning algorithms and street view images. Environ. Plann. B.
- Google street view: capturing the world at street level. IEEE Comput.
- Mobility predictions for IoT devices using gated recurrent unit network. IEEE Internet Things J.
Wenzhong Li received his B.S. and Ph.D. degrees from Nanjing University, China, both in computer science. He was an Alexander von Humboldt Scholar Fellow at the University of Goettingen, Germany. He is now a full professor in the Department of Computer Science, Nanjing University. Dr. Li’s research interests include distributed computing, data mining, mobile cloud computing, wireless networks, pervasive computing, and social networks. He has published over 100 peer-reviewed papers at international conferences and in journals, including INFOCOM, UBICOMP, IJCAI, ACM Multimedia, ICDCS, IEEE Communications Magazine, IEEE/ACM Transactions on Networking (ToN), IEEE Journal on Selected Areas in Communications (JSAC), IEEE Transactions on Parallel and Distributed Systems (TPDS), IEEE Transactions on Wireless Communications (TWC), etc. He served as Program Co-chair of MobiArch 2013 and Registration Chair of ICNP 2013. He has been a TPC member of several international conferences and a reviewer for many journals. He is the principal investigator of three NSFC-funded projects, and the co-principal investigator of a China-Europe international research staff exchange program. Dr. Li is a member of IEEE, ACM, and China Computer Federation (CCF). He was also the winner of the Best Paper Award of ICC 2009 and APNet 2018.
Xiaoliang Wang received the Ph.D. degree from the Graduate School of Information Sciences, Tohoku University, Japan, in 2010. He is currently an Associate Professor with the Department of Computer Science and Technology, Nanjing University. His research interests include network system design and human–computer interaction.
Cam-Tu Nguyen obtained her bachelor’s and master’s degrees from Vietnam National University in 2005 and 2008, respectively. She received her Ph.D. in information science from Tohoku University, Japan, in 2011. From 2012 to 2016, she worked as a postdoctoral researcher at Nanjing University, China. She has published several papers in leading journals and conferences such as IEEE TKDE, ACM TWEB, AAAI, IJCAI, and CIKM. Her research interests include machine learning and natural language understanding.
Sanglu Lu received her B.S., M.S., and Ph.D. degrees from Nanjing University in 1992, 1995, and 1997, respectively, all in computer science. She is currently a professor in the Department of Computer Science and Technology and the deputy director of State Key Laboratory for Novel Software Technology. Her research interests include distributed computing, pervasive computing, and wireless networks. She has published more than 100 papers in refereed journals and conferences in the above areas. She has been the principal investigator of many nationally funded projects, including projects under the National Key R&D Program of China, the National Natural Science Foundation of China, the Key R&D Program of Jiangsu Province, China, etc. She is a member of IEEE and ACM.