Elsevier

Computer Networks

Volume 183, 24 December 2020, 107618
A regression framework for predicting user’s next location using Call Detail Records

https://doi.org/10.1016/j.comnet.2020.107618

Abstract

With the growing use of cell phones and the increasing diversity of smart mobile devices, a massive volume of data is generated continuously as these devices are used. Among these data, Call Detail Records (CDRs) are particularly notable. Since CDRs contain both temporal and spatial labels, mobility analysis of CDRs is a popular subject of study among researchers. Predicting a user’s next location is one of the main problems in the field of human mobility analysis. In this paper, we propose a regression framework to predict the next locations of users of cellular operators. We propose domain-specific data processing strategies and design a deep neural network model which is based on recurrent neurons and performs regression tasks. Using this framework on real-world data, we show that the prediction error decreases by up to 74% in comparison to traditional location prediction models. The results of this paper can be helpful in many applications, from urban planning and digital marketing to predicting the spread of pandemics.

Introduction

Location prediction is a problem in the area of human mobility analysis that has drawn the attention of data scientists and machine learning researchers during the past decade. Although the problem is simple to state, its solution is expected to be complicated because it depends on many factors, such as the environment, the variability of users’ habits and locations, and the different formats of the data.

The location prediction problem can be described in terms of the available data on users’ locations and the environment in which the next locations of these users must be predicted. Such data usually contain historical records that include information about time, location, user characteristics, and the infrastructure that collects the location information.

The prediction of human mobility has a wide variety of applications and many real-life use cases. Knowledge about the places each user may traverse supplies valuable information for urban planning, country infrastructure management, and public transportation organization. Smart advertising and businesses would also benefit greatly from such predictions. Many other domains, such as social and cultural research, disease epidemiology studies, criminology investigations, and cellular network infrastructure administration, could benefit from the information provided by human mobility predictions [1].

With the growth of smart mobile devices that are geographically spread throughout the environment, gathering human mobility data has become much easier than with traditional methods such as surveys or censuses, which suffer from static or low-resolution spatial and temporal information. Researchers in the field of location prediction use various sources of data. For example, Lenormand et al. [2] performed a cross-check analysis of results obtained from different sources of data (cell phones, Twitter and census), comparing the levels of correlation between these sources in three aspects: the spatial distribution of the population, the temporal evolution of people density, and the mobility patterns of individuals.

The most common data source in human mobility research is mobile devices. These data may be collected from GPS-based navigation applications, social networks that work over the internet, or cellular network infrastructure.

Cellular networks possess valuable data like call detail records (CDRs) of the users, accounting information, and infrastructure information which can play an effective role in human mobility analysis. CDR is a type of metadata which describes users’ activities in a cellular network. CDR data are commonly used for the purpose of billing users, value-added services, and network maintenance and optimizations by cellular network operators and infrastructure maintainers. However, having both spatial and temporal information about users has also made CDR a good resource for analyzing human mobility.

In this paper, we propose a framework which includes a recurrent neural network regression model for predicting users’ next locations based on the spatio-temporal information in CDRs. What distinguishes this solution from traditional models proposed for location prediction can be stated as follows:

  • We propose a practical processing framework that defines the modules and the model needed to take the data from raw CDR to accurate predictions.

  • The proposed framework uses domain-specific data preparation methods for real-world data to form meaningful user trajectories, which increases the performance drastically.

  • The prediction module of this framework uses geographic coordinates instead of domain-independent semantic labels for locations in the process of learning and prediction. This approach resolves the sparsity issue, and copes with possible errors in the CDR data.
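The difference between predicting labels and predicting coordinates can be illustrated with a minimal sketch. It is not the paper's model or its evaluation metric: the tower labels, their coordinates, and the use of the haversine distance as the error measure are all assumptions made for illustration. The point is only that a label-based error treats every wrong cell as equally wrong, while a coordinate-based error distinguishes a near miss from a far one.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical towers (label -> (lat, lon)); the values are illustrative only.
towers = {
    "A": (35.700, 51.400),
    "B": (35.705, 51.405),  # a close neighbor of A
    "C": (35.900, 51.900),  # far away from A
}


def classification_error(true_label, predicted_label):
    """0/1 loss on labels: every wrong label counts the same."""
    return 0 if true_label == predicted_label else 1


def regression_error_km(true_label, predicted_coords):
    """Distance in kilometers between predicted coordinates and the true tower."""
    lat, lon = towers[true_label]
    return haversine_km(lat, lon, *predicted_coords)


# The true location is tower A; one prediction lands on B, the other on C.
near_miss = regression_error_km("A", towers["B"])
far_miss = regression_error_km("A", towers["C"])
```

Under the label-based error both predictions score 1, whereas the distance-based error separates them: the prediction at the neighboring tower is off by less than a kilometer, the other by tens of kilometers.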

In order to compare the performance of our proposed framework with existing models, we implemented two baseline models and a common recurrent neural network classification model that have been widely used in the field of location prediction. We test these four models/frameworks on a real-world dataset, which includes CDR data collected from 12 users over a period of two years by one of the largest mobile phone operators in Iran, and compare the results with respect to different metrics of performance.

The remainder of this paper is organized as follows. We first provide an overview of the related work in the field of human mobility prediction in Section 2. Section 3 introduces the dataset and Section 4 formulates the problem. In Section 5, we discuss traditional models, and in Section 6 our prediction framework is introduced. We explain the experiment results in Section 7, and finally Section 8 concludes the paper.

Section snippets

Related work

The seminal paper of Song et al. [3] explored the limits of predictability in human mobility. They measured the entropy of users’ visited locations and found a 93% potential predictability across their dataset, which includes the mobility data of 50,000 individuals over 3 months, collected by mobile carriers for billing purposes. It is also shown that this limit of predictability does not vary much among users with different distance coverage. Also in [4], the upper bound of human mobility

Dataset description

Call Detail Records (CDR) is a collection of records consisting of information about users’ activity in cellular networks. CDRs are usually used for the purpose of billing, cellular infrastructure maintenance, and resource management by service providers.

A CDR often includes source and destination phone number, date and time, base transceiver station (BTS) IDs, device information of the parties, type of the communication (call, message, data packets, etc.), duration, and network operator IDs of

Problem formulation

The location prediction problem is about taking information about users and the history of their trajectories and proposing a location, or a sequence of locations, as the prediction of the next places that users may visit. This information may contain temporal and spatial labels of the visited cells, points of interest, frequencies of the visited locations, the time gap between adjacent recorded locations, etc.
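One common way to turn a trajectory history into a supervised learning task is a sliding window over the time-ordered records. The sketch below illustrates that idea only. The `(timestamp, lat, lon)` tuple layout, the function name `make_samples`, and the window length `history` are hypothetical choices, not taken from the paper.

```python
from typing import List, Tuple

# (timestamp, lat, lon); this record layout is an assumption of the sketch.
Point = Tuple[float, float, float]


def make_samples(trajectory: List[Point], history: int = 3):
    """Build (past points, next location) pairs from one user's trajectory.

    Each sample pairs `history` consecutive points with the coordinates of
    the point that immediately follows them in time.
    """
    ordered = sorted(trajectory, key=lambda p: p[0])  # enforce time order
    samples = []
    for i in range(history, len(ordered)):
        past = ordered[i - history:i]
        _, next_lat, next_lon = ordered[i]
        samples.append((past, (next_lat, next_lon)))
    return samples


# Hypothetical trajectory of five records for a single user.
trajectory = [
    (0.0, 35.70, 51.40),
    (1.0, 35.71, 51.41),
    (2.0, 35.72, 51.42),
    (3.0, 35.73, 51.43),
    (4.0, 35.74, 51.44),
]
samples = make_samples(trajectory, history=3)
```

With five records and a window of three, this yields two samples; a trajectory with no more than `history` records yields none.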

Definition 1

Location Sequence: Each user’s historical location sequence can be

Traditional models

In order to compare the results of our proposed framework with the existing and traditional models that have been used to predict the next location of a user, in this section we introduce these models thoroughly.

The framework

Our framework consists of 5 units. The whole framework receives new updates in the format of CDR data. In the first step, raw CDR data are processed by the Cleaning Unit. This unit performs primary and generic cleaning of the data in order to fix common flaws that may exist in the available raw CDR data. The Cleaning Unit then passes the cleaned data to the Profiling Unit. This unit separates the cleaned CDR records by user, adds extension labels such as l, lat and lon, and then

The experiment

In the experiment, we compare the three methods that we discussed earlier with the framework proposed in this paper. The dataset is an anonymized CDR collection of 12 users over a period of 1.5–3 years. The summary of activities for each of the users is shown in Table 1.

We perform the experiment by splitting each user’s data into training and test parts in a 50%–50% proportion. The first half of the data is used as training data and the second half is used as testing data. For

Conclusion and future work

In this paper, we proposed a framework to predict the next location of users in a cellular network. This framework compensates for the drawbacks of sparsity in CDR data through a novel preparation method, resolves the limitations of traditional classification models for this problem, and proposes a unified process from raw CDR data to next location prediction. The proposed data preparation method is based on how cellular networks register their users’ trajectories. Another prominent contribution in

CRediT authorship contribution statement

Mohammad Saleh Mahdizadeh: Read and approved the final manuscript. Behnam Bahrak: Read and approved the final manuscript.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Mohammad Saleh Mahdizadeh received both the Bachelor’s Degree (2016) and the Master’s Degree (2019) in Computer Engineering from the University of Tehran. He is now a Ph.D. student at the University of Tehran. His main areas of research interests are Applied AI&ML, Data Mining and Social Network Studies.

References (35)

  • Ilarri S. et al. An approach to process continuous location-dependent queries on moving objects with support for location granules. J. Syst. Softw. (2011)

  • Blondel V.D. et al. A survey of results on mobile phone datasets analysis. EPJ Data Sci. (2015)

  • Lenormand M. et al. Cross-checking different sources of mobility information. PLoS One (2014)

  • Song C. et al. Limits of predictability in human mobility. Science (2010)

  • Kulkarni V. et al. Examining the limits of predictability of human mobility. Entropy (2019)

  • Parent C. et al. Semantic trajectories modeling and analysis. ACM Comput. Surv. (2013)

  • Leng Y. Urban Computing Using Call Detail Records: Mobility Pattern Mining, Next-Location Prediction and Location Recommendation (2016)

  • Chen N.C. et al. Comprehensive predictions of tourists’ next visit location based on call detail records using machine learning and deep learning methods

  • Hadachi A. et al. Cell phone subscribers mobility prediction using enhanced Markov chain algorithm

  • Keles I. et al. Location prediction of mobile phone users using apriori-based sequence mining with multiple support thresholds

  • Dash M. et al. Next place prediction by understanding mobility patterns

  • Gomes J.B. et al. Where will you go? mobile data mining for next place prediction

  • Karatzoglou A. et al. A seq2seq learning approach for modeling semantic trajectories and predicting the next location

  • J. Feng, Y. Li, C. Zhang, F. Sun, F. Meng, A. Guo, D. Jin, Deepmove: Predicting human mobility with attentional...

  • Cuttone A. et al. Understanding predictability and exploration in human mobility. EPJ Data Sci. (2018)

  • Gambs S. et al. Next place prediction using mobility markov chains

  • Mathew W. et al. Predicting future locations with hidden Markov models

Behnam Bahrak received his Bachelor and Master degrees, both in Electrical Engineering, from Sharif University of Technology, Tehran, Iran, in 2006 and 2008, respectively. He received the Ph.D. degree from the Bradley Department of Electrical and Computer Engineering at Virginia Tech in 2013. He is currently an Assistant Professor of Electrical and Computer Engineering at University of Tehran.

    1

    All authors have contributed equally to the paper.