1 Introduction

Semantic enrichment of location data is the process of transforming raw data collected from mobility tracking devices into behaviours [1]. These behaviours may express human activities, or they may be descriptions of non-human actions (such as animal behaviours, ship traffic or air navigation) [2]. The former can be derived from sources such as the Global Positioning System (GPS) [3] and Call Detail Records (CDR) [4], and have a multitude of applications [5], including extensive use for health and well-being [6].

This paper focuses on semantic enrichment of GPS data collected using smartphones since the majority of the population near-continuously carries a smartphone featuring a GPS sensor [7]. The enrichment process involves several sub-processes whose implementations are domain-specific [8]. For instance, segmentation is a sub-process that aims to divide GPS streams (a.k.a. trajectories) into episodes that serve specific application purposes. Some applications may split episodes based on their duration, while others may specify them based on the distance to previously determined points of interest. Consequently, different domains use different requirements to produce application-specific meanings of trajectories [9].

Smartphone trajectories reflect a naturalistic representation of human mobility and introduce unique semantic enrichment challenges. Smartphone-based GPS tracking is particularly problematic since individuals’ mobility is not necessarily constrained to roads and can produce more variable trajectories [10]. Additionally, data collection is negatively impacted by factors that are unique to smartphones. For instance, people can explicitly turn off sensors to reduce battery consumption [11, 12]. Moreover, the collected data is not always an actual representation of mobility behaviour, because people may leave or forget their phones in different places such as at home or in the car [13]. Lastly, implicit factors, such as power management and software modules, degrade the resilience of the enrichment process and require a more profound analysis of how each factor could hinder the semantic understanding of the raw data [14].

To enrich smartphone trajectories, we need to consider the above challenges in conjunction with requirements scattered across the semantic enrichment literature. In this paper, we approach this goal through a structural framework based on a meta-analysis of our systematic review of the literature. We go beyond merely introducing and surveying the general knowledge related to the semantic enrichment operation and synthesise the findings into a structural model. Our analysis of the literature is human-centric and provides the following contributions:

  • We introduce a structural framework for enriching smartphone location data based on a systematic review of the literature. This framework presents a holistic and integrative view to help researchers plan the semantic enrichment requirements and address the smartphone-specific challenges. We synthesise findings scattered across the existing studies into workflows corresponding to the enrichment tasks. These workflows streamline the implementation of the enrichment process and facilitate the tracing of errors throughout the entire process.

  • We provide a systematic literature review of enriching smartphone location data. To the best of our knowledge, this is the first review that targets smartphone trajectories and organises the findings according to the semantic enrichment task. The reported results provide researchers with a comprehensive analysis of the state of the art of each task and help them identify the characteristics and limitations of the existing methods.

  • We provide a planning strategy derived from the conducted review and the created model. We identify the Strengths, Weaknesses, Opportunities, and Threats of the existing studies according to the SWOT analysis framework. As a well-known planning framework, a SWOT analysis based on the review findings can help researchers better envisage the potential of future contributions.

2 Background and related work

Traditionally, Geographical Information Systems (GIS) provide tools that analyse and understand spatial data [15]. These applications map longitude and latitude to place labels, and provide several functions that facilitate the users’ interactions with maps, such as location queries and map editing. GIS use different methods to capture and store the large amount of location meta-data they need to support their functionalities. Recently, due to the proliferation of mobile devices (e.g. smartphones), people are becoming primary data collectors for GIS data as they check in at their visited locations [15].

Mobile devices, however, foster a new paradigm of spatial analysis centred around individuals’ behaviour [16]. In this paradigm, the spatial analysis of raw data is tightly coupled to high-level behaviour conducted by humans [17]. If a person moves from one place to another, the captured raw data is enriched to answer human-centred questions such as: how long does the person stay, is the stay duration significant enough to be considered, what defines significance, and how can that be decoded from data. These types of analysis go beyond the mere labelling of GPS data to build a semantic enrichment process that is human-centric.

This new paradigm is commonly discussed using the concepts of trajectories and episodes. A trajectory is a continuous temporal stream of geographical coordinates collected from GPS sensors (such as smartphone-embedded GPS). The temporal boundaries of a trajectory are application-specific. Some applications are interested in daily behaviour, and accordingly, each trajectory records the mobility behaviour of one day. Other implementations may consider weekly or monthly behaviour and define a trajectory accordingly. An episode is another commonly used concept, which denotes a segment of the trajectory (i.e. a sub-trajectory) that represents a specific event. For instance, a daily trajectory may consist of home, work and walking episodes. A stay-point is a particular type of episode used to divide trajectories based on time and distance thresholds. For instance, if the distance between adjacent points in a trajectory is less than 10 metres and the duration between the start and end of the adjacent points that meet the distance constraint is greater than 5 min, then the underlying segment is considered a stay-point. However, as we shall explore in this paper, decisions about threshold values are application-dependent and impacted by the selected algorithm and the collection media.
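The stay-point rule described above can be sketched as follows. This is a minimal illustration rather than the method of any reviewed paper: the function names, the tuple layout `(timestamp_s, lat, lon)` and the use of the haversine formula for distance are our assumptions, and the 10 m and 5 min defaults simply mirror the example thresholds in the text.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(min(1.0, sqrt(a)))


def detect_stay_points(track, max_dist_m=10.0, min_duration_s=300.0):
    """Return (start_ts, end_ts) for every run of adjacent points that are
    closer than max_dist_m to each other and that lasts longer than
    min_duration_s. `track` is a time-ordered list of (timestamp_s, lat, lon)."""
    stays = []
    start = 0  # index of the first point of the current run
    for i in range(1, len(track) + 1):
        # A run ends at the end of the track or when two adjacent points
        # are too far apart.
        run_ends = i == len(track) or haversine_m(track[i - 1][1:], track[i][1:]) >= max_dist_m
        if run_ends:
            t0, t1 = track[start][0], track[i - 1][0]
            if t1 - t0 > min_duration_s:
                stays.append((t0, t1))
            start = i
    return stays


# Seven readings at one place, one minute apart, followed by two distant readings.
track = [(t * 60, 51.5, -0.12) for t in range(7)] + [(420, 51.6, -0.12), (480, 51.7, -0.12)]
print(detect_stay_points(track))
```

Both thresholds are exposed as parameters because, as noted above, their values depend on the application, the algorithm and the collection media.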

Besides the basic concepts, the process of enriching raw data involves one or more of the following tasks to facilitate knowledge extraction: segmentation, annotation, and behaviour recognition (Fig. 1). The segmentation and annotation sub-processes are driven by the target behaviour and thus facilitate the mining of behavioural knowledge. For instance, if the target behaviour is walking, then the segmentation step divides location data into walking and non-walking episodes. Next, contextual data sources are consulted to associate episodes with place details (i.e. annotation). Most applications employ an external knowledge source to add context-specific data to raw coordinates [3]. We refer to these additional sources as context data sources (CDS). Foursquare, a geographical information repository, is an example of a CDS that maps a pair of longitude and latitude values to a place’s details such as name and category. Consequently, knowledge (such as the person’s preferences for walking, e.g. park or lake) is extracted from the annotated trajectories.

Fig. 1 Semantic enrichment of smartphone trajectories

Segmentation, annotation and behaviour recognition are not the only way of classifying studies related to semantic enrichment. Other studies related to semantic trajectories are classified into modelling, computation, and applications [18, 19]. The modelling class groups studies that focus on how GPS data is modelled and used in the database. Studies that focus on the segmentation and annotation of trajectories are assigned to the computation class. Lastly, studies that predict or visualise behaviours derived from GPS data are classified under applications.

To this end, we recognise several studies that contribute to the goal of better understanding the challenges of enriching location data. Some of them partially address the enrichment processes [20], while others consider trajectories in a broader domain that includes human and non-human trajectories [9]. Nevertheless, to the best of our knowledge, this is the first effort that systematically targets smartphone-based trajectories.

In the next section, we introduce the general framework proposed by this paper. We articulate the main layers and components of the model. Then, since our framework is motivated by a systematic literature review, we explain the methods and analyse the results (Sects. 4 and 5) before we dive into the details of each component in our model (Sect. 6).

3 Design

We propose a layered and structural design to detail the semantic enrichment processes. Our work is built on a systematic literature review of enriching smartphone location data collected in the wild. We expand the processes in Fig. 1 to lay out the internal structure of each process as well as the interactions across processes. We map each of those processes to a layer in the proposed framework and derive the details from the conducted review.

As the first task of the enrichment process, segmentation is the base layer of our model. Within this layer, we have three main components (Fig. 2). The first is an input module that interfaces with the collection device and stores the movement logs according to the collection requirements. Off-device-based enrichment may have constraints for collecting and offloading GPS data that differ from those of online-based enrichment [21]. The collected raw data are passed to the segmentation core, which manages the activities responsible for dividing the spatiotemporal stream. These activities include tasks such as data cleansing, compression and episode identification. Once the core unit produces the application-specific episodes, the validation step assesses the correctness of the extracted episodes by comparing them against the available ground truth. When no ground truth data is available, episodes extracted from other sources such as CDR or accelerometers can be compared against the ones extracted from the GPS sensor. Accordingly, the number of matches can determine the correctness of the extracted episodes. In the absence of data from these sensors, it may not be possible to validate the exact time of the extracted behavioural events (i.e. stay-points). However, it is still possible to know whether a person has visited a particular place, even though the exact times of the visit cannot be confirmed.

Fig. 2 Structural framework for enriching GPS trajectories
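To make the match-counting idea of the validation step concrete, the sketch below compares extracted episodes with reference episodes (ground truth, or episodes derived from another sensor) by temporal overlap. The overlap criterion, the 60 s minimum and the function names are illustrative assumptions; the reviewed papers may define a match differently.

```python
def overlap_s(a, b):
    """Seconds of temporal overlap between two (start_s, end_s) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def match_episodes(extracted, reference, min_overlap_s=60.0):
    """Return (precision, recall) of extracted episodes against reference ones.
    An episode counts as matched when it overlaps at least one episode of the
    other set for min_overlap_s seconds or more."""
    matched_extracted = sum(
        any(overlap_s(e, r) >= min_overlap_s for r in reference) for e in extracted
    )
    matched_reference = sum(
        any(overlap_s(e, r) >= min_overlap_s for e in extracted) for r in reference
    )
    precision = matched_extracted / len(extracted) if extracted else 0.0
    recall = matched_reference / len(reference) if reference else 0.0
    return precision, recall


extracted = [(0, 600), (1000, 1500), (3000, 3300)]  # e.g. GPS-derived stay-points
reference = [(100, 700), (1100, 1400)]              # e.g. participant-confirmed stays
print(match_episodes(extracted, reference))
```

Here the third extracted episode has no counterpart in the reference set, which lowers precision while recall stays at 1.0.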

A collection of episodes is generated by the segmentation layer and used as input to the annotation core component. Episodes in the collection may be represented by one or more GPS points. A stay-point is referenced by a longitude and latitude pair that represents the mean value of the multiple GPS readings within a boundary of d metres. In contrast, move-points contain multiple GPS references that form the route taken by an individual to travel from one stay-point to another. The core unit defines the annotation rules to filter out episodes that do not require annotation. For instance, if the application is interested in stay-points only, then move-points will be ignored during the annotation process. Accordingly, episodes are annotated either externally or internally using the appropriate CDS. Decisions about selecting the best candidates and the reliability of the semantic labels are made within the annotation core unit.
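As an illustration of the two ideas above, the following sketch computes the mean coordinate that represents a stay-point and applies a simple annotation rule that keeps only the episode kinds of interest. The dictionary layout of an episode and both function names are hypothetical. The arithmetic mean of latitude and longitude is only a reasonable approximation for readings that lie within a few metres of each other and away from the antimeridian.

```python
def stay_point_centroid(readings):
    """Mean (lat, lon) of the GPS readings that form one stay-point."""
    lats = [lat for lat, _ in readings]
    lons = [lon for _, lon in readings]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


def select_for_annotation(episodes, wanted_kinds=("stay",)):
    """Annotation rule: keep only episodes whose kind the application annotates."""
    return [ep for ep in episodes if ep["kind"] in wanted_kinds]


episodes = [
    {"kind": "stay", "points": [(10.0, 20.0), (10.2, 20.4)]},
    {"kind": "move", "points": [(10.2, 20.4), (11.0, 21.0), (12.0, 22.0)]},
]
for ep in select_for_annotation(episodes):
    print(ep["kind"], stay_point_centroid(ep["points"]))
```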

The validation step assesses the accuracy of the annotation (according to the CDS selected by the core unit) and evaluates the impact of segmentation errors on the overall results. Measuring the accuracy can be done differently according to the experiment design and goals. For instance, in our previous work (Footnote 1), participants are asked to confirm the correctness of the detected and annotated stay-points. Accordingly, the number of corrected places is used to estimate the accuracy of the external CDS (Foursquare in this case). Also, participants can see the start and end times of the recognised events (i.e. stay-points) and report potential segmentation errors concerning the start and end of those events. This integrative evaluation enables a more comprehensive analysis of the results and enhances the ability to separate segmentation errors from the ones caused by the annotation process.

The annotated episodes are used as inputs to the core unit of the behaviour recognition layer (Footnote 2). The implementation of this unit is tightly coupled with the application goal. Recognition of social anxiety [22] differs from the identification of user routine, and therefore they yield different implementations of the core component. The behaviour recognition layer also has a validation unit to measure the accuracy of recognising behaviour. Similar to the annotation layer, errors are either produced by the process of behaviour recognition or propagated from lower layers.

In this paper, we propose workflows for each one of the core units described above. These workflows are built on the insights extracted from the systematic review of the literature. Next, we explain how this review is conducted before we dive into the details of the workflows later.

4 Method

We conduct a broad review of the literature and adopt the PRISMA statement for reporting the systematic review of enriching GPS data collected via smartphone devices. To comply with the objective of understanding smartphone-specific requirements and challenges, we include studies if:

  • They use smartphone devices as the source of raw GPS data.

  • They analyse multi-day continuous real-world data. Short studies do not reflect a continuous and longitudinal data collection that can help understand daily behaviours of individuals.

  • They collect data continuously and unobtrusively (i.e. in a passive manner). Studies that require smartphones to be in a specific posture or attached to the participants’ bodies are excluded.

  • The movement data are collected using GPS sensors only. Studies of location data gathered by other means (such as location diaries delivered through smartphones or check-in tweets) are excluded.

  • They analyse smartphone trajectories and are not restricted by specific conditions such as vehicle-only trajectories.

  • They are full papers written in English published before March 2020.

Guided by the above inclusion criteria, two researchers have reviewed the papers separately and selected the related papers. A second cycle of the review was conducted to resolve disagreements about the selected papers.

5 Results

We report the results of a cross-domain search using Google Scholar and two domain-specific searches in ACM and ScienceDirect. The search query and retrieved results are detailed in Table 1.

Table 1 Search query and returned results

We selected 21 papers that meet the inclusion criteria specified in Sect. 4. The process through which these 21 papers were selected is illustrated in Fig. 3.

Fig. 3 Summary of the literature systematic review

We classify the selected papers according to the semantic enrichment task. If a paper, for instance, focuses mainly on dividing movement records into episodes, then it is categorised as segmentation only. A paper may cover more than one process. In that case, the paper category is based on the process order in the chain (e.g. a paper covering segmentation and annotation is classified as annotation). Figure 4 shows the distribution of selected studies across processes. Annotation papers form the largest group, whereas behaviour recognition and segmentation-specific papers are equally represented. However, 84% of the included papers refer to the segmentation process within the context of the papers’ main contributions.

Fig. 4 Papers distribution across the three sub-processes of the semantic enrichment

Among the selected works, the most recent publication of a segmentation-only paper was in 2018 [23]. Between 2016 and 2018, 83% of the behaviour recognition papers were published. The first paper about annotation was published in 2013, and since then, every year except 2016 has had at least one annotation-related article (Fig. 5).

5.1 Duration and sample size

Papers vary in their study duration, with a minimum of 5 days and a maximum of 18 months. The mean and median study durations are 200 and 75 days, respectively. These statistics differ significantly according to the dataset property. Analyses based on public datasets such as the Lausanne campaign [24] and Reality Mining [25] have a mean and a median of 405 days, while the mean and median drop to 58 and 30 days, respectively, when those in charge of the experiment collect the data. However, process-based analysis of duration does not reveal any significant differences compared to the overall duration results.

Similar to the duration statistics, the analysis of the sample size of the overall process is consistent across sub-processes. It ranges from 1 to 228 participants, with a median of 9 participants and a mean of 37. Two studies have not specified the sample size, and no study justified the selected size through statistical analysis such as power analysis.

Fig. 5 Publications per year

5.2 Validation

In 71% of the papers, results are evaluated based on ground truths collected directly from the participants. In the absence of participants’ inputs, researchers substitute the ground truth with synthetic data (e.g. they ask external assessors to predict the trajectory details and compare the generated results with the human predictions). Two out of six papers on segmentation [23, 26] collect ground truth, while most of the annotation papers (78%) build a ground truth to evaluate their inferences. All papers about behaviour recognition analyse their results based on ground truths collected about the examined behaviour; however, they do not gather data about the other sub-processes to investigate the possibility of error propagation and how that may impact the accuracy of the behaviour recognition process.

The reported results can be divided into three categories (Table 2). The first one is descriptive results that explain and clarify the outputs based on the collected ground truth, mainly in terms of precision/recall or general statistics. The main theme of this category is the absence of comparison: the outputs are not compared with papers of a similar process or with any other baseline. In contrast, the second type of papers depends on a comparison that distinguishes the proposed method from a comparable process in the literature. The third category falls between the two: ground truths and extracted features are modelled as a supervised learning task. The contribution under this approach is measured by how feature engineering based on semantic enrichment techniques improves the classification task. Consequently, the results of various machine learning algorithms are compared based on baseline features and enrichment-based features, but not against papers of similar interests.

Table 2 Papers distribution based on the category of the reported results

5.3 Study data

Lastly, 71% of studies conduct real-time experiments to collect location data. The remaining studies use public datasets collected longitudinally under natural settings. Also, 33% of the papers that reported the use of a public dataset did not provide details about their utilisation of the data (e.g. whether they use the entire dataset for evaluation or how they split evaluation/test folds when training models).

80% of the reported studies conduct an off-device analysis of the collected data for one or more of the semantic enrichment sub-processes. Annotation holds the most significant portion of the off-device analysis, with 90% of the papers consulting external APIs to annotate episodes. Google, Foursquare and OSM APIs are the primary annotation providers reported by these articles. On-device analysis has started to emerge recently (the first study was published in 2017) to improve data privacy, and it mainly tackles the segmentation process.

5.4 Summary of selected papers

In this part of the results, we summarise the selected papers in Table 3 to set the stage for the in-depth discussion reported in the next section.

Table 3 Summary of the reviewed papers (ordered first by the sequence of the workflow operations and then by the year)

6 Discussion

Motivated by the reported results, we extracted insights from each sub-process and accordingly present the details of the core units in the proposed framework. By doing so, we facilitate the planning of the enrichment process as well as the tracking of potential errors. In the subsequent sections, we describe the core units of the framework (Sect. 3) as task-based workflows. These workflows integrate the extracted insights per sub-process into a consistent set of steps to facilitate a proper semantic enrichment of smartphone trajectories. Later in this section, we provide a SWOT analysis of the extracted findings to help researchers identify and plan future directions.

6.1 Segmentation

In most cases, segmentation is the first process toward enriching GPS trajectories. It divides movement records into episodes that reflect behavioural units in the real world. Behavioural units are cognitive-driven segments that compose behavioural sequences. Trajectory segments represent behavioural units within the context of GPS data. Based on our analysis of the selected papers, we identify three main perspectives on segmentation, namely, segmentation base, segmentation algorithm and collection strategy.

6.1.1 Segmentation base

The segmentation base is the reference point that guides the segmentation process. It could be a behavioural reference, such as walking, or a statistical reference point inferred from calculations on movement records. Trajectories represent continuous behavioural units in real life captured through GPS devices [14]. Behaviour-based referencing implements top-down approaches to trajectory segmentation that divide GPS sequences based on the goal of a particular behaviour. If the motive is to find the places where a user prefers to stay, then stillness and movement are potential segmentation references that divide movement records into stay and move points. The choice of stillness and movement (i.e. behavioural references) and the variables (a.k.a. hyperparameters) that identify those references (e.g. time and distance thresholds) are decided according to heuristics and prior behavioural knowledge [8].

Table 4 Segmentation bases and behavioural references adopted by the selected papers

On the other hand, bottom-up approaches adopt a statistical mechanism to merge atomic segments and form a larger one consisting of a statistically homogeneous state. An atomic segment is a small unit of the captured trajectory that is used as the building block of an episode. For each atomic segment, a feature vector is calculated, and a sliding window is used to compare segments by their underlying features such as duration or covered distance. If, for instance, a mobile device collects movement logs every 2 min, an atomic segment of 6 min would contain three data points. If we define the duration of the sliding window to be 30 min, then each sliding window would cover five atomic segments. Atomic segments within the same sliding window are sequentially compared based on a similarity measure of their feature vectors. Based on the similarity score, consecutive segments are merged if they are identified as ‘similar’.
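The bottom-up idea can be sketched as follows. The constants reproduce the arithmetic of the example above. The merge step is a deliberate simplification under our own assumptions: similarity is taken to be a Euclidean distance between feature vectors below a threshold, and every pair of consecutive atomic segments is compared, which corresponds to a window that slides one atomic segment at a time. The feature layout and threshold value are hypothetical.

```python
from math import dist

SAMPLING_MIN = 2   # one GPS reading every 2 min
ATOMIC_MIN = 6     # duration of one atomic segment
WINDOW_MIN = 30    # duration of the sliding window

points_per_atomic = ATOMIC_MIN // SAMPLING_MIN   # 3 data points per atomic segment
atomics_per_window = WINDOW_MIN // ATOMIC_MIN    # 5 atomic segments per window


def merge_atomic_segments(features, max_feature_gap):
    """Merge consecutive atomic segments whose feature vectors are similar.
    `features` holds one feature vector per atomic segment in temporal order.
    Returns episodes as (first_index, last_index) pairs, both inclusive."""
    if not features:
        return []
    episodes = []
    start = 0
    for i in range(1, len(features)):
        # Assumed similarity measure: Euclidean distance between feature vectors.
        if dist(features[i - 1], features[i]) > max_feature_gap:
            episodes.append((start, i - 1))
            start = i
    episodes.append((start, len(features) - 1))
    return episodes


# Hypothetical features per atomic segment: (duration in min, covered distance in m).
features = [(6.0, 5.0), (6.0, 8.0), (6.0, 400.0), (6.0, 420.0), (6.0, 3.0)]
print(points_per_atomic, atomics_per_window)
print(merge_atomic_segments(features, max_feature_gap=50.0))
```

With these values the five atomic segments of one window collapse into three episodes: two low-distance segments, two high-distance segments and a final low-distance segment.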

Table 4 summarises the papers in our review based on the segmentation reference. As a general observation, stillness is the most common reference among studies that adopt top-down segmentation approaches. 98% of papers following this approach employ the covered distance and duration as the episode determinants, with one paper relying on the number of GPS readings inside the cluster instead of the duration to define the episode boundaries.

In contrast, bottom-up approaches focus more on movement patterns and on classifying episodes based on their movement status. Bottom-up approaches are built under the hypothesis of sampling rate regularity. They address sampling irregularities through data imputation, a process that aims to fill the frequency gaps in data collection. However, this leads to different issues related to the reliability of the imputation process and how errors may propagate through the entire process.
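One simple form of imputation is linear interpolation between the two readings that enclose a gap. The sketch below is an assumption on our part, since the reviewed papers may use other imputation schemes. Each output row carries a flag that marks imputed points, so that downstream steps can trace errors back to the imputation.

```python
def impute_linear(track, step_s):
    """Insert linearly interpolated points so that no gap exceeds step_s.
    `track` is a time-ordered list of (timestamp_s, lat, lon) with distinct
    timestamps. Returns rows of (timestamp_s, lat, lon, imputed_flag)."""
    if not track:
        return []
    out = [(*track[0], False)]
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        t = t0 + step_s
        while t < t1:
            w = (t - t0) / (t1 - t0)  # relative position inside the gap
            out.append((t, lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0), True))
            t += step_s
        out.append((t1, lat1, lon1, False))
    return out


# A 2 min sampling rate with one 6 min gap between the second and third reading.
track = [(0, 0.0, 0.0), (120, 0.0, 0.0), (480, 0.0, 3.0)]
for row in impute_linear(track, step_s=120):
    print(row)
```

Interpolated points assume straight-line movement at constant speed during the gap, which is exactly the kind of reliability issue raised above: a person who stayed still and then moved quickly would be misrepresented.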

6.1.2 Segmentation algorithms

Statistical and behavioural referencing just set the guidelines for the subsequent processes. Each segmentation reference has several implementation options, and the selection among them is dependent upon other factors such as the application domain and the collection media. In this section, we discuss the various implementations from an algorithmic perspective.

We classify the segmentation algorithms into two classes: density-based and sequence-based. Density-based algorithms (e.g. DBSCAN and K-means) employ clustering techniques to group similar location entries. As these are parametric algorithms that rely on hyperparameters to accomplish their tasks and compute item similarities, the type of segmentation reference determines the values of those hyperparameters. Topic modelling is another type of clustering that stems from the literature of natural language processing [36]. In this approach, point similarities correspond to latent topics, and episodes are formed (i.e. clustered) based on their closeness to each topic.
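For readers unfamiliar with density-based clustering, a compact and unoptimised DBSCAN over GPS points is sketched below. It is an illustration only: the haversine distance, the quadratic neighbour search, and the hyperparameter values `eps_m` and `min_pts` are our assumptions, and production work would normally rely on an established library implementation. Note that the algorithm looks only at where points are, not at when they were recorded.

```python
from math import radians, sin, cos, asin, sqrt

NOISE = -1


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000.0 * asin(min(1.0, sqrt(a)))


def dbscan(points, eps_m, min_pts):
    """Return one cluster label per point; NOISE marks points in no cluster."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force search; includes the point itself.
        return [j for j in range(len(points)) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: expand the cluster
                queue.extend(reachable)
    return labels


home = [(51.5 + k * 1e-5, -0.12) for k in range(4)]   # readings about 1 m apart
work = [(51.51 + k * 1e-5, -0.12) for k in range(4)]  # roughly 1 km away
points = home + work + [(51.6, -0.2)]                 # plus one isolated reading
print(dbscan(points, eps_m=20.0, min_pts=3))
```

The two hyperparameters play the role described above: a stillness reference would suggest a small `eps_m` and a `min_pts` that reflects the expected number of readings during a meaningful stay.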

Sequential algorithms preserve the temporal order of trajectories’ points during the process of generating segments. These algorithms study the relations between consecutive entries of movement records and rely on behavioural rules applied on spatiotemporal features embedded into trajectories. An example is a rule to define stillness behaviour applied to the distance between temporally adjacent points. If the geodetic distance of two points is close to 0, then a stillness behaviour is detected; otherwise, the person is moving, and a stay-point is defined accordingly.

All bottom-up approaches (Table 4) are sequential in nature as they adopt a sliding window to process the movement data sequentially. On the other hand, top-down approaches utilise both algorithmic types to divide trajectories.

Both algorithmic classes, however, are mostly built on the assumption that movement records are sampled at regular time intervals. Although this assumption may go well with some controlled implementations, it does not reflect a real-life smartphone-based collection of location data as we shall explain next.

6.1.3 Collection strategy

The third perspective, collection strategy, emphasises the crucial role of the collection mechanism in the semantic enrichment operation. Different devices pose different challenges to the process of collecting and processing trajectories. Smartphones, the focus of this paper, introduce power optimisation techniques to increase battery life, which in turn influence the sampling rates of GPS sensors. This shows why segmentation methods should consider how movement tracks are captured and sampled to improve their performance.

Generally, we discuss two types of sampling strategies for collecting GPS data. The first one is a time-based strategy that assumes a fixed and guaranteed sampling rate for collecting location data. Algorithms written under this assumption do not have to deal with irregularities of sampling intervals, as the device is configured to enforce the sampling constraints. An event-based strategy is a different approach in which recording GPS data is only triggered if predefined conditions are met. For instance, an app may be set to collect GPS data only if the participant is connected to a WiFi network. An event-based strategy imposes an additional data preparation task to deal with sampling irregularities and potential data loss. However, unobtrusive observation may involve both strategies, since built-in power optimisation techniques as well as interaction preferences impact the streaming of data.
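A quick way to check whether a collected track satisfies the time-based assumption is to list the intervals that exceed the expected sampling period. The function name and the tolerance factor below are hypothetical choices made for illustration.

```python
def sampling_gaps(timestamps, expected_s, tolerance=1.5):
    """Return (previous_ts, next_ts) pairs whose interval exceeds
    tolerance * expected_s. `timestamps` must be sorted in ascending order."""
    return [
        (a, b)
        for a, b in zip(timestamps, timestamps[1:])
        if b - a > tolerance * expected_s
    ]


# Readings expected every 120 s, with one 11 min interruption.
timestamps = [0, 120, 240, 900, 1020]
print(sampling_gaps(timestamps, expected_s=120))
```

An empty result suggests that algorithms written for regular sampling may be applicable, whereas a long list of gaps points to the additional data preparation described above.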

Each one of these perspectives addresses one of the challenges specific to the enrichment process. The segmentation base deals with contextual ambiguity. By identifying the reference of the segmentation, we limit the scope of the possible outcomes and orient the operation based on the specified reference. Within the context of smartphones, segmentation algorithms should be designed to deal with the application-specific sampling rate challenges. If the regularity of the time intervals at which GPS data is recorded is guaranteed, state-of-the-art density-based algorithms may fit well. However, if such regularity is not guaranteed, as in many naturalistic smartphone-based settings, then density-based algorithms are more likely to fail [8, 14]. This last point shows how the algorithmic and data collection perspectives interplay to enrich raw GPS data.

Figure 6 shows a workflow that we propose to illustrate the smartphone-based segmentation process. Before dividing trajectories, it is essential first to identify which segmentation reference (behavioural or statistical) is more appropriate to the enrichment objective. At the same time, the requirements of the collection process should be identified based on the device and application capabilities. Once the collection criteria and segmentation reference are decided, a pre-processing step is initiated to smooth and clean the collected data. This stage is essential as the collection strategy influences the expected noise, and therefore the applied pre-processing techniques may differ. The segmentation process and collection constraints drive the choice of the implementation algorithm. A density-based algorithm could be the right choice when the sampling rate is guaranteed, while a sequential algorithm is more flexible when dealing with unexpected sampling rates. After applying the segmentation step, application requirements may require additional postprocessing of the resultant episodes (e.g. merging consecutive stay-point episodes if they are separated by a move-point that is less than 2 min long).

Fig. 6 Workflow of the segmentation process
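The postprocessing example mentioned in the workflow (merging two stay-point episodes separated by a short move) can be sketched as below. The tuple layout `(kind, start_s, end_s)` and the function name are assumptions. The sketch merges on duration alone, as in the example; an application might additionally require that the two stays refer to the same place.

```python
def merge_close_stays(episodes, max_move_s=120.0):
    """Merge stay episodes that are separated by a move shorter than max_move_s.
    `episodes` is a time-ordered list of (kind, start_s, end_s) tuples, where
    kind is either 'stay' or 'move'."""
    merged = []
    for episode in episodes:
        kind, _, end = episode
        short_move_between_stays = (
            kind == "stay"
            and len(merged) >= 2
            and merged[-1][0] == "move"
            and merged[-2][0] == "stay"
            and merged[-1][2] - merged[-1][1] < max_move_s
        )
        if short_move_between_stays:
            merged.pop()                      # drop the short move
            _, prev_start, _ = merged.pop()   # extend the previous stay instead
            merged.append(("stay", prev_start, end))
        else:
            merged.append(episode)
    return merged


episodes = [
    ("stay", 0, 600),
    ("move", 600, 690),     # 90 s move: absorbed into the surrounding stay
    ("stay", 690, 1500),
    ("move", 1500, 2400),   # 15 min move: kept
    ("stay", 2400, 3000),
]
print(merge_close_stays(episodes))
```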

6.2 Annotation

Annotation is the process of assigning descriptive labels to behavioural units extracted from trajectories. The goal of this process is to bridge the semantic gap between raw location data and human cognition by naming the extracted episodes. This descriptive annotation refers to more than the annotation implied by the segmentation base. For instance, stillness as a behavioural reference for the segmentation process implies the existence of two basic labels: move episode and stay episode. These two labels are embodied in the segmentation base and therefore do not provide additional knowledge. Semantic annotation of such trajectories would go beyond these built-in labels to include more descriptive data like the type of place (e.g. restaurant, café) or the purpose of visit (e.g. socialising, studying). In this article, we classify research as annotation-related if it targets filling the semantic gap with information different from that provided by the segmentation phase.

Within the context of smartphone-based trajectories, we found two main bases for the annotation process, namely activity-based and land-based. The former aims to understand the activity performed within the episode’s boundaries and annotates the episode accordingly. If a person is having a meeting at a café, then the corresponding episode is labelled as ‘meeting’. In contrast, the land-based method would label the same episode as ‘café’, as its emphasis is on the land use of the property where the episode takes place.

It is noteworthy that the primary affordance of the property may sometimes describe both the activity and the land use, such as in the case of a dance club. Although the two approaches may seem to overlap in this case, their outputs differ according to the target user. If the episode is extracted from the trajectory of a worker in that dance club, the activity-based annotation yields a ‘working’ episode. In contrast, customers’ episodes are annotated as ‘dancing’ since that is what customers are expected to do. This example shows why one approach cannot be substituted for the other.

Table 5 Annotation methods and contextual data sources (CDS) as reported by the selected papers

Fig. 7 Workflow of the annotation process

The annotation source is another aspect of annotation; it concerns the contextual data source (CDS) needed to enrich movement trajectories. Traditionally, CDSs are classified as either external or internal sources. When the annotation data is retrieved from a remote service that exists outside the phone, such as the Google or Foursquare spatial APIs, the source is considered external. Inputs to external sources are either single or multiple coordinates per episode, depending on the output of the segmentation process. Segmentation algorithms do not necessarily produce a single representation for each episode, and consequently shift the burden of this task to the annotation phase. In that case, the label for each point within the episode is first retrieved from the external provider. Then a post-processing task is initiated to select the representative label based on application-specific criteria.
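The per-point labelling followed by selection of a representative label can be sketched as below. The sketch assumes a majority vote as the application-specific criterion, which is only one possible choice. The `lookup_label` argument is a hypothetical callable standing in for whatever external provider is used; no real API signature is implied.

```python
from collections import Counter


def representative_label(points, lookup_label):
    """Label every point of an episode, then keep the most frequent label.

    points: iterable of (lat, lon) tuples belonging to one episode.
    lookup_label: hypothetical callable (lat, lon) -> label or None,
        wrapping an external provider of the application's choice.
    """
    labels = [lookup_label(lat, lon) for lat, lon in points]
    # Drop points for which the provider returned nothing.
    labels = [label for label in labels if label is not None]
    if not labels:
        return None
    # Majority vote; ties go to the label encountered first.
    return Counter(labels).most_common(1)[0][0]
```

Other criteria, such as weighting labels by distance to the episode centroid, would replace only the final voting step.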

Internal sources employ contextual data collected explicitly or implicitly alongside the GPS data. To annotate episodes based on explicitly collected data, users are required to annotate the extracted segments, and a classification model is then trained that utilises additional features (e.g. temporal features) to predict the annotation. However, this approach requires users to update the extracted episodes regularly. Alternatively, sensor data collected passively along with location data are used as a source for annotation. For instance, Wi-Fi labels may contain useful information such as the name or category of the place, which provides a valuable source for annotation. These contextual data are used by the annotation algorithm to predict the labels of the extracted episodes. Table 5 summarises the annotation papers based on the discussed views.

Although the above perspectives suggest multiple approaches to filling the semantic gaps in location data, only one paper [34] provides a mechanism to facilitate the evaluation of the external geolocation provider. Moreover, none of the selected papers justified the selection of a specific CDS or provided a comparison or inter-reliability test of the accuracy of various geolocation APIs, despite the significance of this matter.

Based on the above, Fig. 7 shows our proposed workflow that integrates the elements and perspectives of the annotation process. First, it is essential to identify the goal of the annotation task, as it determines the details of the subsequent processes. Point-of-interest (POI) systems that aim to provide suggestions based on users’ preferences (e.g. preferred cuisine) may adopt a land-based approach to extract visited places and generate recommendations accordingly. On the other hand, behavioural informatics systems may focus more on annotating episodes based on the underlying activity to serve their objectives. Once the goal is identified, labels are generated either internally or externally. Although external sources typically provide APIs to facilitate their functions, raw data may require additional pre-processing and manipulation before these functions can be used. The produced results are post-processed to select the best annotation candidate. This step may include synthesising data from several sensors (e.g. Wi-Fi and Bluetooth) to select the most probable description of the episodes under consideration, or it may vote on the best candidates from labels provided by external annotators based on inter-reliability tests.

6.3 Behaviour recognition

Existing studies have addressed behaviour and knowledge extraction from GPS trajectories. The application domains addressed by these studies shape their differences. For instance, the requirements for extracting knowledge in health-related applications differ from those in marketing. Moreover, some of these studies fall outside the context of enriching raw location data. For example, instead of going through the process of transforming raw GPS data into semantically enriched trajectories, an application may utilise check-in data as input to the behavioural mining task. This approach does not address the challenges caused by the potential limitations of enriching raw data and how they may affect the knowledge extraction process. Therefore, to account for the influence of potential challenges inherited from other sub-processes (e.g. segmentation), in this article we address the mining of behavioural knowledge that arises as a result of semantic enrichment. Other location-based knowledge extraction studies lie outside the scope of this analysis.

Fig. 8 Workflow of the behaviour recognition process

Accordingly, we find that analysis granularity is the central aspect that distinguishes smartphone-based behaviour identification methods. Episode-based behavioural analysis mines features related to the trajectory components and how these components, together with their latent features, correlate with each other to form a behaviour. Trajectory components are the different episode types that compose a trajectory. If a trajectory is segmented based on the stillness attribute of the embodied event, then stay-points and move-points are the components of that trajectory. Accordingly, episode-based knowledge extraction may study episodes of a similar type, such as counting the frequency of similar episodes to get the number of visits to a specific place. The place in this instance represents a stay-point extracted from the collected trajectories. Alternatively, the knowledge extraction may target the inter-relations across different episode types. In this case, multiple episode types (e.g. stay-point and move-point) are investigated to determine behavioural phenomena such as the preferred transportation mode (i.e. move-point features) for each visited place (i.e. stay-point).
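The two episode-based examples above (visit counts per place and preferred transportation mode per place) can be sketched as follows. The representation of an episode as a `(kind, label)` tuple and both function names are assumptions made for illustration. The sketch also assumes that the transportation mode for a visit is the mode of the move episode immediately preceding the stay.

```python
from collections import Counter, defaultdict


def visit_counts(episodes):
    """Count stay episodes per place label (same-type episode analysis)."""
    return Counter(label for kind, label in episodes if kind == "stay")


def preferred_mode_per_place(episodes):
    """Most frequent mode of the move episode that precedes each stay.

    episodes: time-ordered list of (kind, label) tuples, where label is
    a place name for "stay" and a transportation mode for "move".
    """
    modes = defaultdict(Counter)
    # Only adjacent pairs are inspected; no longer-range order is used.
    for (prev_kind, prev_label), (kind, label) in zip(episodes, episodes[1:]):
        if prev_kind == "move" and kind == "stay":
            modes[label][prev_label] += 1
    return {place: c.most_common(1)[0][0] for place, c in modes.items()}
```

Note that only adjacent episodes are compared, which is the limited form of temporal relation that episode-based analysis relies on.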

Although episode-based approaches may study the temporal relation between sub-components of the trajectory, these methods do not preserve the full sequentiality of the entire trajectory. To clarify this idea, consider the example of extracting the preferred transportation mode for each place. Episode-based approaches would study the relationship between the episode representing the visited place and its surroundings to understand how a user moves to and leaves the target place. When multiple stay and move points (i.e. places and transportation modes) reside on a single trajectory, a similar approach is followed to extract knowledge. However, from an episode-based perspective, only the temporal aspect between adjacent components is required by the analysis, as other sequential features (e.g. the temporal sequence of two places) do not contribute to the learning process.

In contrast to episode-based analysis, trajectory-based approaches extract knowledge encoded in an entire trajectory rather than in its building components. Accordingly, the sequentiality of episodes is preserved to facilitate the mining of behavioural patterns. An example of this method would be the extraction of daily habits from multiple daily trajectories. In this scenario, the behavioural habits may be extracted by aggregating similar trajectories and performing sequential pattern analysis.
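As a rough illustration of the daily-habit scenario, the sketch below counts contiguous sequences of visited places across daily trajectories and keeps those that recur on several days. Contiguous n-gram counting is used here as a simple stand-in for sequential pattern analysis, not as the technique of any reviewed paper; the function name and the `length` and `min_support` parameters are hypothetical.

```python
from collections import Counter


def frequent_sequences(daily_trajectories, length=2, min_support=2):
    """Return place sequences of a given length seen on enough days.

    daily_trajectories: list of days, each a time-ordered list of
        place labels (the annotated stay-points of that day).
    min_support: minimum number of days that must contain a sequence.
    """
    support = Counter()
    for day in daily_trajectories:
        # A set ensures each sequence counts at most once per day.
        patterns = {
            tuple(day[i:i + length])
            for i in range(len(day) - length + 1)
        }
        support.update(patterns)
    return {p: n for p, n in support.items() if n >= min_support}
```

Unlike the episode-based view, the order of places within each day is part of the pattern itself, so ('home', 'gym') and ('gym', 'home') are distinct habits.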

Moreover, trajectory-based mining may target movement records co-located across multiple devices. One example would be trajectory modelling to discover chasing behaviour from two smartphones. In this case, two trajectories are examined to decide whether one person is being followed by another. This is also an example of inter-personal analysis, which involves more than one person in the mining process.

Figure 8 concludes the proposed framework by depicting the workflow of the last semantic enrichment process. The first step is to identify the features of the target behaviour, since this will impact the choice of granularity, as explained above. Recognising episode-based behaviour has different requirements from recognising trajectory-based behaviour. Once the granularity level is decided, the mining strategies vary according to the selected methodology. Rule-based and machine-learning approaches are possible mechanisms to achieve this goal.

6.4 Validation and error handling

The correctness of the outputs of each process in our framework is essential to the validity of the semantic enrichment. Therefore, studies related to semantic enrichment should be designed in a way that facilitates the understanding of how potential errors propagate across the framework. In this subsection, we discuss the design of a real-world experiment that we have conducted to extract personal interests from GPS data (footnote 3). As part of the experimentation process, seven participants were asked to assess the correctness of the semantic enrichment processes. The collection period lasted for three months, and 200,000 GPS data points were collected.

To locate the errors of each layer’s processes, we provided a plugin within the study app to examine and correct the enriched GPS data (Fig. 9a). Each time a visit to a new place is detected (i.e. a stay-point), the participant receives a notification inviting them to confirm or correct the detected place. To validate the segmentation correctness, the start and end times of the visit are provided. In addition, the names of nearby places are shown if a participant decides to correct the label assigned to a detected stay-point. To support the analysis of errors related to behaviour recognition (in this case, behaviours of personal interest), we added a further plugin within the study app (Fig. 9b). This plugin presents an adaptation of the Interest/Enjoyment subscale that is widely used to assess interest associated with a given activity [44, 45] (footnote 4).

Fig. 9 Interfaces for the two developed plugins: (a) the Places plugin allows participants to examine and correct their annotated locations; (b) the IMI plugin presents participants with a set of validated questions that can be used to evaluate the correctness of the recognised behaviour

This design allows us to separate errors caused by a process such as annotation from errors caused by an algorithm intended to recognise behaviour from raw GPS data. For example, in the same work, the extracted places are analysed to extract behaviours motivated by personal interests. Without separating errors, the performance of the ranking algorithm could be impacted by the segmentation and/or the annotation errors. This is because the algorithm can classify wrongly identified stay-points as a potential interest. When we rely on the corrected data, the algorithm’s performance can better reflect its ability to recognise behaviours motivated by interests. This is a result of avoiding errors that propagate from segmentation and annotation layers.
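One way to make this separation concrete is to compute an error rate per layer from the participants' confirmations and corrections. The sketch below is a hypothetical illustration and not the analysis used in the study: the feedback record fields `segment_ok` and `label_ok`, and the function name, are assumptions. Annotation is judged only on visits whose segmentation was confirmed, so that segmentation errors are not counted twice.

```python
def layer_error_rates(feedback):
    """Attribute participant corrections to the layer that caused them.

    feedback: list of dicts with hypothetical boolean fields
        "segment_ok" (visit start/end confirmed) and
        "label_ok" (place label confirmed).
    """
    if not feedback:
        return {"segmentation": 0.0, "annotation": 0.0}
    seg_errors = sum(1 for f in feedback if not f["segment_ok"])
    # Annotation is only assessed where segmentation was correct.
    judged = [f for f in feedback if f["segment_ok"]]
    ann_errors = sum(1 for f in judged if not f["label_ok"])
    return {
        "segmentation": seg_errors / len(feedback),
        "annotation": ann_errors / len(judged) if judged else 0.0,
    }
```

A behaviour recognition algorithm can then be evaluated on the confirmed or corrected visits only, so that its score is not affected by errors from the earlier layers.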

6.5 SWOT analysis

To make the findings of this review more useful to future research on semantic enrichment of GPS trajectories, we summarise and organise the limitations and opportunities found in the selected papers into a SWOT analysis framework. The SWOT framework is a decision-making technique used to identify Strengths, Weaknesses, Opportunities and Threats related to a specific application [46]. Researchers can use this tool strategically to analyse and plan their research through (i) embracing strengths and potential opportunities, (ii) addressing weaknesses and (iii) mitigating potential threats [47, 48]. We provide a planning strategy for semantic enrichment of smartphone-based location data. Our implementation of SWOT is derived from the conducted review and the created model. The presented analysis can help researchers better envisage the potential of future contributions.


7 Conclusion

We propose a structural framework and planning strategy to streamline the semantic enrichment process of smartphone location data. Our work helps in understanding the challenges and limitations of the existing methods and how they interrelate within the entire process. Moreover, the layered approach and workflows facilitate the understanding of error propagation through the enrichment operation. Next, we plan to instantiate this framework with real-world smartphone data to examine the effectiveness of the proposed methodology in facilitating the analysis of mobile-specific challenges.

Future reviews can be conducted on smartphone-based digital phenotypes such as device usage and notifications. These reviews could study the extraction of behavioural units from other smartphone sensors and organise the processes involved in a human-oriented manner. Collectively, this work and the suggested reviews on streamlining the processes of extracting human behaviour from digital phenotypes can improve human-centric research based on longitudinal smartphone data.
