MyDigitalFootprint: An extensive context dataset for pervasive computing applications at the edge

https://doi.org/10.1016/j.pmcj.2020.101309

Abstract

The widespread diffusion of connected smart devices has greatly contributed to the rapid expansion and evolution of the Internet at its edge, where personal mobile devices follow the behavior of their human users and interact with other smart objects located in the surroundings. In such a scenario, the user context is represented by a large variety of information that can rapidly change, and the ability of personal mobile devices to locally process this data is fundamental to make the system able to quickly adapt its behavior to the current situation. This ability, in practice, can be represented by a single elaboration process integrated in the final user application, or by a middleware platform aimed at implementing different context processing and reasoning to support third-party applications. However, the lack of public datasets that take into account the complexity of the user context in the mobile environment strongly limits the advance of the research in this field.

In this paper, we present MyDigitalFootprint, a novel large-scale dataset composed of smartphone embedded sensors data, physical proximity information, and Online Social Networks interactions aimed at supporting multimodal context-recognition and social relationships modeling. The dataset includes two months of measurements and information collected from the personal mobile devices of 31 volunteer users, in their natural environment, without limiting their usual behavior. Existing public datasets generally consist of a limited set of context data, aimed at optimizing specific application domains (human activity recognition is the most common example). On the contrary, our dataset contains a comprehensive set of information describing the user context in the mobile environment. In order to demonstrate the efficacy of the proposed dataset, we present three context-aware applications based on different machine learning tasks: (i) a social link prediction algorithm based on physical proximity data, (ii) the recognition of daily-life activities based on smartphone-embedded sensors data, and (iii) a pervasive context-aware recommender system. To the best of our knowledge, this is the first large-scale dataset containing such heterogeneity of information, representing an invaluable source of data to validate new research in mobile and edge computing.

Introduction

Nowadays we live surrounded by a plethora of electronic devices equipped with continuously increasing computational and networking capabilities. In particular, the computational resources of personal mobile devices (e.g., smartphones, tablets, and wearables) are comparable to, and sometimes exceed, those of desktop computers from past years. In addition, their multiple communication interfaces guarantee both continuous Internet access and proximity-based communication opportunities. These conditions, along with the widespread penetration of Internet of Things (IoT) devices embedded in physical objects, contribute to a rapid expansion and evolution of the Internet at its edge, and lead to the rise of new paradigms for the future Internet [1]. For example, based on the observation that the edge of the Internet is mainly composed of personal mobile devices that follow the behavior of their human users, the Internet of People (IoP) paradigm calls for a radical change of the Internet, where personal mobile devices are no longer considered simple clients [2]. Indeed, in this envisioned scenario, they represent active elements of the new Internet that are able to forward and disseminate data within the network by exploiting their wireless equipment and self-organizing networks [3], [4], extending the users’ connectivity opportunities, including direct communications with other users and devices in proximity.

These aspects are paving the way towards new types of computing models, shifting most of the tasks from centralized architectures (e.g., remote servers and cloud-based computing) to distributed solutions, where the data available at the edge is directly processed by mobile devices [5]. This paradigm shift creates new opportunities for the creation of novel pervasive mobile applications (e.g., data dissemination algorithms [6], forwarding protocols [7], and personalized services [8], [9]), which benefit from low-latency direct communications and the sensing capabilities of modern mobile devices. Specifically, the great variety of sensors embedded in the personal mobile devices of the users provides essential information to recognize the context and the situation in which the user is involved, making context-awareness a real feature of new pervasive computing applications. A basic example can be the automatic configuration of the device based on the user’s current activity, while more complex context information must be analyzed to optimize networking and forwarding protocols to disseminate content according to users’ interests and social relationships.

Processing user data directly on the local device provides two main advantages. Firstly, it allows both the device and the applications to quickly adapt their behavior according to the changes in the user context. Even though the context recognition task can be performed on remote servers, data transmission delay may make the computation useless with respect to the service optimization, since the user could have changed her context in the meantime. Secondly, user privacy can considerably benefit from the use of such a decentralized approach [10]. Indeed, traditional client–server solutions may discourage privacy-aware users from using context-aware services, since a third-party entity will be in charge of storing and processing users’ data, and additional mechanisms must be employed to safeguard the user’s privacy [11]. On the contrary, shifting the computation from remote servers to the source of the context data (i.e., the user’s mobile device) allows the system to preserve the user’s privacy, avoiding the need to trust external entities. In the last few years, such advantages have been found in different application domains, including IoT systems [12], healthcare [13], smart cities [14], and, more generally, context-aware and mobile applications [15], highlighting the need for a paradigm change.

However, to validate and evaluate the effectiveness of context-awareness in pervasive computing applications, the availability of context data collected in the wild becomes an essential requirement. Most of the public datasets available in the literature focus on data from only a few sensors for inferring the user’s context. For instance, taking into account only the accelerometer and gyroscope data is a common practice for recognizing simple human activities and transportation modalities [16], [17], while the GPS coordinates and the list of Wi-Fi access points are commonly used to infer the current location of the user and her social interactions [18], [19]. On the other hand, the user’s context in a mobile setting represents a more abstract concept, which requires the combination of heterogeneous sources of data that characterize not just the user’s activities, but also her behavior, her social interactions with other people, her daily-life situations and the surrounding environment. For this reason, in recent years researchers started to collect a wider range of smartphone and wearable sensor data. Several research studies in the area of context-recognition and human behavior modeling base their results on experiments performed in controlled environments (e.g., a research laboratory), with researchers instructing subjects to perform scripted tasks, generally using the same device [20], [21], [22]. However, lab results typically diverge from those obtained in real-world experiments, in which users may have different ways of performing the same activity, and devices are equipped with different types of sensors [23].

In addition, according to [24], to better represent the complexity of the real world, context data should be collected in natural and realistic settings, satisfying the following in-the-wild conditions: (i) to represent the variety of available devices, the subjects should not be forced to use a foreign device, but they should use their own smartphones; (ii) to address the variability in device wearing and placement, no restrictions on device usage should be defined; and (iii) the recorded data should represent the users’ natural behavior, thus they should not be instructed on how to perform the activities.

In this work, we present MyDigitalFootprint (MDF), a novel large-scale dataset that we collected from the personal smartphones of 31 volunteer users within a period of 2 months. Following the in-the-wild data collection protocol, we installed on the volunteers’ devices an Android sensing application that monitored a wide range of heterogeneous smartphone sensors in the background, without interfering with the user’s natural behavior. More specifically, the application continuously collected data from both physical and virtual sensors that can be used to characterize the different aspects of the user context in a mobile setting, including daily-life situations and social interactions with other people. Physical sensors refer to phone-embedded hardware (e.g., accelerometer and gyroscope) aimed at describing simple human activities (e.g., the user’s gait), while virtual sensors represent data sources that characterize the device status, the surrounding environment, and the interactions between the user and her device. In order to collect data that actually represents users’ daily-life situations, we defined no constraints related to the user behavior or the interactions with her device during the experiment. On the contrary, we encouraged the volunteers to use their smartphones as usual, without worrying about the positions of the device (e.g., trousers pockets, or hand) or the activities they usually perform during the day.
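
As an illustration of how data from a physical sensor could be summarized before any activity recognition, the following minimal sketch computes sliding-window statistics over accelerometer samples. It is not the processing pipeline used for MDF: the `(x, y, z)` tuple layout, the function names, and the window parameters are assumptions made only for this example.

```python
import math
from statistics import mean, pstdev


def magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer sample."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def window_features(samples, window_size, step):
    """Return one (mean, standard deviation) pair of magnitudes per window.

    samples: list of (x, y, z) tuples (hypothetical layout, not the MDF format).
    window_size, step: window length and stride, both in number of samples.
    """
    mags = [magnitude(s) for s in samples]
    features = []
    for start in range(0, len(mags) - window_size + 1, step):
        window = mags[start:start + window_size]
        features.append((mean(window), pstdev(window)))
    return features


# Toy input: a device lying still, followed by an alternating movement pattern.
still = [(0.0, 0.0, 9.8)] * 4
moving = [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)] * 2
features = window_features(still + moving, window_size=4, step=4)
```

In this toy input the first window has almost no variation while the second one does, which is the kind of contrast a classifier of simple activities could exploit.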

Nonetheless, daily activities represent only part of the user’s context. In this paper, we argue that taking into account information related to the user’s social relationships is fundamental to model the overall context in mobile environments. In fact, inferring the diverse social ties in the interpersonal network of a subject can lead to better recognition of her context, helping to precisely discriminate among different daily-life situations [25]. For this reason, during the sensing experiment, we also collected essential information to model the users’ social contexts: proximity data, and activities performed by the volunteers on Online Social Network (OSN) platforms. Specifically, we collected proximity information through the use of wireless communication interfaces (i.e., WiFi-Direct and Bluetooth scans), which can be used to infer face-to-face interactions among people. As for OSNs, they represent a rich source of data to characterize both the user’s preferences and her social relationships with other people in the virtual world, such as contents shared by the users, their reactions (e.g., likes), comments, and the list of followed public profiles. To the best of our knowledge, MDF is the first dataset presented in the literature that contains such heterogeneous data, aimed at modeling all the aspects of the user’s context, combining information generated in both the physical and the cyber worlds.
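
To make the idea of inferring face-to-face interactions from wireless scans more concrete, the sketch below accumulates per-device contact time from timestamped sightings. It is only one plausible heuristic, not the method used in this paper: the `(timestamp, device_id)` record layout, the `contact_durations` function, and the `max_gap` threshold are all assumptions introduced for illustration.

```python
from collections import defaultdict


def contact_durations(scans, max_gap):
    """Sum the contact time per nearby device from timestamped sightings.

    scans: iterable of (timestamp_seconds, device_id) pairs
           (hypothetical layout, not the MDF file format).
    max_gap: consecutive sightings of the same device that are at most this
             many seconds apart are treated as one continuous contact.
    """
    sightings = defaultdict(list)
    for timestamp, device in scans:
        sightings[device].append(timestamp)

    durations = {}
    for device, times in sightings.items():
        times.sort()
        total = 0
        for previous, current in zip(times, times[1:]):
            # Only count the interval if the device was seen again soon enough.
            if current - previous <= max_gap:
                total += current - previous
        durations[device] = total
    return durations


# Toy scan log: device "A" is seen three times in a row, then once much later.
scans = [(0, "A"), (60, "A"), (120, "A"), (1000, "A"), (30, "B")]
durations = contact_durations(scans, max_gap=90)
```

Devices with long accumulated contact time would then be candidates for stronger social ties, while isolated sightings such as device "B" contribute nothing.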

In order to demonstrate the utility and the efficacy of MDF, we propose three different pervasive mobile applications based on machine learning methods. Firstly, we propose a social interaction prediction algorithm based on physical proximity data collected by mobile devices. This represents a key feature to design effective data dissemination and forwarding algorithms for the edge of the Internet. Then, we define a context-recognition system that identifies the situation in which the user is currently involved, based on multimodal and high-dimensional sensor features. Finally, a pervasive context-aware recommender system for mobile devices is proposed. Implementing such a system directly on mobile devices makes it possible to evaluate and automatically filter the contents discovered at the edge, providing personalized recommendations to the user, based on her current context and needs.

The entire dataset (i.e., 2.64 GB of data) is publicly available online, and researchers are encouraged to use it to develop and compare methods and algorithms in different research fields. Along with the anonymized dataset, researchers will also find the processed data to replicate the proposed pervasive applications.

In summary, this work provides several contributions to the research areas of human behavior modeling and pervasive mobile computing, as follows:

  • a large-scale public dataset collected from commercial smartphones in the users’ natural environment, without setting any constraint to the users’ behavior;

  • a unique combination of heterogeneous smartphone sensor data with information derived from Online Social Network platforms;

  • the use of machine learning algorithms to implement three different pervasive computing applications evaluated by using the MDF dataset: (i) prediction of physical interactions among the users, (ii) automatic recognition of the user’s context, and (iii) context-aware recommendations in mobile environments;

  • three different datasets extracted from MDF that have already been preprocessed to reproduce the proposed proof-of-concept applications.

The remainder of the paper is organized as follows. In Section 2 we provide an extensive review of the existing datasets collected from mobile devices in the wild. Section 3 describes the details of the data collection campaign and the experimental protocol used to collect the MDF dataset. In Section 4 we perform both a quantitative and qualitative analysis of the main data collected from the volunteers’ smartphones. Then, we rely on the presented dataset to design and evaluate three potential edge computing applications in Section 5. Finally, in Section 6, we draw our conclusions and present some directions for future work.

Related work

Most of the public datasets related to human behavior modeling have been collected in controlled environments and focus on a limited number of smartphone-embedded sensors aimed at recognizing a predefined set of human activities and events (e.g., user’s gait or fall detection) [16], [26]. Instead, the number of available datasets for the identification of a wider range of user contexts, especially in outdoor environments, is limited. Specifically, after an extensive search in the literature, we

The MyDigitalFootprint dataset

To obtain a dataset that highly represents the mobile environment, we designed a data collection campaign following the in-the-wild protocol. To this aim, we enrolled a total of 31 volunteer users (6 females and 25 males), where the majority of them (i.e., 24 people) were students between age 14 and age 17 involved in a training period at research institutes, coming from high-schools of three different cities located in the Tuscany region of Italy (i.e., Pisa, Pontedera, and Livorno). The other

Data analysis

The presented dataset has been collected through the use of an Android application installed on the volunteers’ devices to monitor a wide range of physical and virtual sensors. Even though our application ran unobtrusively (i.e., in the background) to collect the data without interfering with the user behavior, users were able to deactivate the data collection for privacy reasons. Fig. 4 shows the distribution of the total number of days in which our application has been used by each

Potential applications

The MDF dataset contains a great variety of information related to people’s daily activities and their social relationships in a mobile setting. This characteristic makes MDF an invaluable source of data to create intelligent systems specially designed for the edge of the Internet. In this scenario, the user context might rapidly change due to mobility, daily-life activities, environmental change or discovery of new services or devices in proximity. Consequently, the ability of mobile devices

Conclusion

In this paper we presented MyDigitalFootprint (MDF), a large-scale dataset collected from commercial smartphones that represents a rich source of data to characterize the user’s physical and social context at the edge of the Internet. Unlike datasets derived from experiments conducted in controlled environments, we performed a data collection campaign in the wild, guaranteeing a detailed representation of the real world as seen by personal mobile devices. In fact, the dataset contains a

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been partially funded by the European Commission under the H2020-INFRAIA-2019-1 SoBigData-PlusPlus project, grant number 871042.

References (67)

  • A.S. Vincentelli, Let’s get physical: Adding physical dimensions to cyber systems, in: 2015 IEEE/ACM International...
  • Al-Fuqaha A. et al., Internet of things: A survey on enabling technologies, protocols, and applications, IEEE Commun. Surv. Tutor. (2015)
  • Basagni S. et al., Mobile ad hoc networking: Cutting edge directions, vol. 35 (2013)
  • Satyanarayanan M., The emergence of edge computing, Computer (2017)
  • Zhao Y. et al., Survey on social-aware data dissemination over mobile wireless networks, IEEE Access (2017)
  • Hui P. et al., BUBBLE rap: Social-based forwarding in delay-tolerant networks, IEEE Trans. Mob. Comput. (2011)
  • Eichinger T. et al., On gossip-based information dissemination in pervasive recommender systems
  • Shi W. et al., The promise of edge computing, Computer (2016)
  • Chen D. et al., Data security and privacy protection issues in cloud computing
  • Chiang M. et al., Fog and IoT: An overview of research opportunities, IEEE Internet Things J. (2016)
  • Liu Y. et al., Intelligent edge computing for IoT-based energy management in smart cities, IEEE Netw. (2019)
  • Mao Y. et al., A survey on mobile edge computing: The communication perspective, IEEE Commun. Surv. Tutor. (2017)
  • Su X. et al., Activity recognition with smartphone sensors, Tsinghua Sci. Technol. (2014)
  • Yu M.-C. et al., Big data small footprint: The design of a low-power classifier for detecting transportation modes, Proc. VLDB Endow. (2014)
  • Chon Y. et al., Automatically characterizing places with opportunistic crowdsensing using smartphones
  • Micucci D. et al., Unimib SHAR: A dataset for human activity recognition using acceleration data from smartphones, Appl. Sci. (2017)
  • Anguita D. et al., A public domain dataset for human activity recognition using smartphones
  • M. Shoaib, S. Bosch, H. Scholten, P.J.M. Havinga, O.D. Incel, Towards detection of bad habits by fusing smartphone and...
  • Kerr J. et al., Objective assessment of physical activity: Classifiers for public health, Med. Sci. Sports Exerc. (2016)
  • Vaizman Y. et al., Recognizing detailed human context in the wild from smartphones and smartwatches, IEEE Pervasive Comput. (2017)
  • Shoaib M. et al., A survey of online activity recognition using mobile phones, Sensors (2015)
  • Cruciani F. et al., A public domain dataset for human activity recognition in free-living conditions
  • Mirsky Y. et al., Sherlock vs moriarty: A smartphone dataset for cybersecurity research
