Abstract

Human behavior modeling in smart environments is a growing research area addressing several challenges related to ubiquitous computing, pattern recognition, and ambient assisted living. Thanks to recent progress in sensing devices, it is now possible to design computational models capable of accurately detecting residents’ activities and daily routines. To this end, we introduce in this paper a deep learning-based framework for activity recognition in smart homes. This framework proposes a detailed methodology for data preprocessing, feature mining, and the application of deep learning techniques. The framework was designed to ensure a deep exploration of the feature space: three main approaches are tested, namely, the all-features approach, the selection approach, and the reduction approach. Besides, the framework proposes the evaluation and comparison of several well-chosen deep learning techniques such as autoencoders, recurrent neural networks (RNN), and some of their derivative models. Concretely, the framework was applied to the “Orange4Home” dataset, a recent dataset specially designed for smart home research. Our main findings show that the selection approach yields the most efficient classification. Furthermore, our overall results outperform baseline models based on random forest classifiers and the principal component analysis technique, especially the results of our RNN-based model for the all-features approach and those of our autoencoder-based model for the feature reduction approach.

1. Introduction

Analyzing human routines in ambient and smart environments is a growing research area due to its multiple scientific, engineering, and data-privacy challenges [1]. Smart homes are a typical example of these intelligent environments: conventional houses equipped with multiple sensors measuring various modalities such as energy consumption, luminosity, light status, door opening, human movements, etc. [2]. These sensing technologies have opened new opportunities to understand human behavior and model daily routines. In the healthcare field, the daily routines describing everyday human activities are known as Activities of Daily Living (ADLs) [3]. For instance, cleaning, sleeping, eating, using stairs, cooking, and showering are typical examples of ADLs. Thanks to smart homes, it is henceforth possible to monitor residents’ ADLs, track their vital status, evaluate their quality of life, and enhance their well-being [4]. Indeed, one goal of these augmented environments is to assist elderly and disabled people in living decently and autonomously [5, 6]. More generally, smart home sensing technologies bring several benefits beyond persons with special needs: they help optimize energy consumption, secure the house, improve general comfort, and enhance entertainment applications within the home [7].

Our main challenge in this research is to build computational models that efficiently recognize ADLs in smart home settings. The long-term goal of this work is to build user-aware houses able to analyze human behavior and respond to residents’ daily needs. To reach this aim, our methodology is to mine the complete process of the ADL detection task, from data preprocessing to the evaluation of multiple machine learning approaches. To ensure this deep mining of the detection task, we propose in this paper an original framework based on several data mining techniques and deep learning approaches. The suggested deep learning-based framework enables smart processing of multimodal data, a deep analysis of human behavior subtleties, and an accurate detection of activity patterns. Compared to other works, it can be considered a powerful guideline for researchers and engineers to process any ADL classification task, especially on datasets with a similar format.

The rest of the paper is organized as follows: the next section reviews the state of the art of ADL recognition approaches. In Section 3, we give a brief introduction to the main deep learning approaches applied in this work. In Section 4, we introduce our proposed framework and its different components. In Section 5, we detail our experiments: we present the dataset used and how the approaches proposed in the framework were applied and implemented. All recognition results and main findings are discussed in Section 6. Section 7 summarizes our contributions and concludes the paper.

2. Related Work

Activity recognition is a cornerstone of a myriad of real-world applications such as social robots [8, 9], personnel skills analytics [10–12], sports analytics [13, 14], education analytics [15], group interaction analysis [16, 17], affective computing [18], human behavior understanding [19, 20], assisted living and healthcare [5], and security and well-being in smart homes [7]. For these applications, ADL recognition is considered a classification problem [4]: it consists of inferring residents’ activities from smart home sensors or cameras. In our research, we focus particularly on the sensor-based detection approach [21]. Two types of sensors are widely used: (1) wearable sensors such as smart watches and helmets [22] and (2) ambient sensors installed directly within the house or its compartments [23, 24]. Both types share the same objective of capturing all possible human actions and interactions. Data generated from these sensors may suffer from several defects such as noise, redundancies, missing values, and other imperfections. Before any classification step, a preprocessing step is therefore crucial to deal with these problems [25] but also to unify data types and rescale numerical values. Researchers should also decide which features to use by either selecting a subset of features, extracting new features [26], or reducing the feature space [27]. Afterward comes the classification task, which consists of mapping the input features to the right output activity. At this step, several metrics should be used to control and evaluate the accuracy of this operation.

Initially, the ADL classification literature considered traditional machine learning approaches [28] such as support vector machines (SVM) [5, 29], Naïve Bayes [30], random forests (RF) [23, 31], ensemble approaches [32], and hidden Markov models (HMM) [33–35] (see Table 1). In recent years, however, a major shift has been observed with the rapid development of deep learning techniques [42]. In fact, conventional techniques require heavy handcrafted feature exploration based on researcher knowledge [43], which limits model development and extension. Besides, they are inefficient at capturing highly complex activities composed of sequences of micromovements and gestures [44]. In contrast, deep learning techniques can automatically infer and extract relevant features, thus reducing complex handcrafted operations. Moreover, recent results of these methods have shown unparalleled performance in many fields related to sequence classification [45] such as object recognition, speech recognition, natural language processing, and cybersecurity [42, 46]. For all these reasons, deep learning has received a lot of attention in many recent works [36–38, 47–49] and represents, without doubt, a promising approach for ADL recognition and classification.

For instance, using available smart home datasets for activity recognition, researchers in [38] compared deep learning techniques such as convolutional neural networks (CNN) and Long Short-Term Memory (LSTM) networks to traditional machine learning approaches. Their experiments showed comparable performance between CNN and LSTM, with both classifiers outperforming the classic models. In [36], the authors addressed the classification problem by applying a Deep Belief Network (DBN) model to data recorded from residents’ wearable devices. The paper shows the effectiveness of the proposed approach compared to classic approaches such as SVM. In [37], activity recognition for elderly people with dementia was explored using several types of recurrent neural networks (RNN). The obtained results showed promising and competitive performance compared to classic state-of-the-art techniques. Other works demonstrating the efficiency of deep learning approaches for the ADL detection task can be found in [39–41, 47–49].

Consequently, we have chosen in our proposed framework to apply multiple deep learning approaches well known for their efficiency: MLP (Multilayer Perceptron), deep autoencoder, RNN (Recurrent Neural Networks), LSTM (Long Short-Term Memory), and GRU (Gated Recurrent Units). To compare these techniques in different configurations related to feature extraction and reduction, our framework was carefully designed to ensure an exhaustive evaluation of all possible approaches. Indeed, the proposed deep learning framework encompasses and mines the whole process of the activity detection task, from data preprocessing to the application of deep learning models and their final evaluation. This framework was applied to an interesting dataset in the field of smart homes called the “Orange4Home” dataset [24]; the advantages of this recent dataset are detailed in the Experiments section. One challenge of our framework is to outperform the classification rates of baseline models computed on the same dataset. Our main contributions can be listed as follows:
(i) The introduction of a deep learning-based framework for activity recognition: this framework proposes a detailed methodology for data preprocessing, feature selection, and feature reduction as well as the training/testing steps for the chosen deep learning models. Note that this framework can be easily generalized and applied to other datasets with a similar format.
(ii) The enhancement of recognition rates computed with baseline models based on random forest classifiers and the principal component analysis technique.
(iii) The testing of multiple approaches to ensure a deep exploration of the feature space: the all-features approach, the feature selection approach, and the feature reduction approach.
(iv) The evaluation and comparison of several well-chosen deep learning techniques (MLP, autoencoder, RNN, LSTM, and GRU). From our findings, several conclusions were drawn concerning the best approaches for relevant ADL detection.

In the next section, we start by briefly presenting the main concepts behind neural networks and deep learning.

3. Deep Learning Models

3.1. ANN

Deep learning [42] is a branch of machine learning that relies on artificial neural network (ANN) architectures [50]. ANNs are computational models inspired by the biological neurons of the human brain, attempting to simulate similar information processing and task performing. An ANN is composed of multiple interconnected, successive layers, each represented by a set of artificial neurons called nodes. Each node is connected to the nodes of the next layer via links, in the same way as natural neurons and their connections. Each link carries a numeric weight corresponding to the impact of one node on a node of the next layer. The first layer is called the input layer since it receives the data that will be injected into the network. The intermediate layers are known as the hidden layers, and the last one is called the output layer; its topology depends essentially on the network task (regression, classification, etc.). The basic form of ANN is also known as the Multilayer Perceptron (MLP) or Feedforward Network (see Figure 1).

Like other machine learning models, MLP models are trained using data samples. The training process of an MLP consists of adjusting the connection weights to better handle the desired task. In supervised learning, this training process relies on the backpropagation algorithm [50]: technically, it consists of minimizing a loss function that measures the errors observed at the output layer; the network weights are then updated with respect to this loss function. MLPs have proven to be a good technique in many classification and regression problems [51]; however, they may not be the most appropriate approach to handle sequential and temporal input data.
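To make this concrete, the following minimal sketch defines and compiles such an MLP in Keras (the library used later in our experiments); the layer sizes, number of classes, and optimizer here are illustrative assumptions, not the configuration used in Section 5:

```python
# Minimal MLP sketch in Keras; layer sizes and hyperparameters are
# illustrative assumptions only.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(100,)),  # hidden layer 1
    Dense(64, activation="relu"),                      # hidden layer 2
    Dense(10, activation="softmax"),                   # output layer (10 classes)
])

# Training adjusts the connection weights by backpropagation,
# minimizing the loss computed at the output layer.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=20, batch_size=32)
```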

3.2. RNN, LSTM, and GRU

For time-series data, the most appropriate models are recurrent neural networks (RNN) [52] and their derivatives, since they were specifically developed to model temporal and sequential data. In its basic form, an RNN is a Feedforward Network in which a hidden layer receives, in addition to the current inputs, the outputs this hidden layer computed at the previous time step (see Figure 1). This recurrent hidden layer, as shown in Figure 1, is a temporal layer that allows information to flow from one step to the next; this information is known as the hidden state of the RNN. This is why RNNs are often said to have, thanks to their hidden state, a certain kind of memory that remembers what has been computed. However, when an RNN processes long sequences, it has difficulty retaining information from early steps. This problem, known as the vanishing gradient problem [53, 54], makes the RNN a short-term memory network unable to learn long-term dependencies. To overcome these issues, two models were developed: LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Units) [52].

LSTM and GRU are derived from RNN but introduce new internal mechanisms and gate concepts that regulate the flow of information over time. These concepts retain previous information as in RNN but carry relevant information from earlier time steps more efficiently. An LSTM carries two pieces of information through processing: a cell state and a hidden state. Concretely, an LSTM cell contains three gates: a forget gate to decide which information should be deleted or retained, an input gate to update the cell state, and an output gate to decide the next hidden state. GRU simplifies this process by dropping the cell state and retaining only a hidden state throughout processing. Moreover, it has only two gates: a reset gate and an update gate. The update gate decides how to combine the previous memory with the current input, while the reset gate determines how much of the previous memory should be kept. More details on RNN, LSTM, and GRU can be found in [52].
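As a brief illustration, in Keras these three recurrent variants expose the same layer interface and can be swapped freely; the unit count in this hypothetical fragment is arbitrary:

```python
# Sketch: the three recurrent variants share the same interface in Keras;
# 32 units is an arbitrary illustrative choice.
from tensorflow.keras.layers import GRU, LSTM, SimpleRNN

rnn_layer  = SimpleRNN(32)  # basic RNN: hidden state only (short-term memory)
lstm_layer = LSTM(32)       # adds a cell state with forget/input/output gates
gru_layer  = GRU(32)        # hidden state only, with reset and update gates
```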

3.3. Autoencoders

Another type of ANN is the autoencoder [55]. An autoencoder (denoted AE) is a Feedforward Network similar to an MLP, since it contains at least three layers: one input layer, one or more hidden layers, and an output layer; the term “deep autoencoder” is used when there is more than one hidden layer. The major difference is that an MLP tries to predict a target variable, while an AE aims to reconstruct its input data with a minimum of error. To achieve this goal, the output targets are set identical to the inputs during the learning phase. An AE can thus be considered a dimension reduction technique that extracts an optimal compressed representation of the initial data. Concretely, an AE chains two main stages: an encoding stage and a decoding stage (see an example in Figure 2). In the encoding stage, the network reduces the number of nodes from the input layer down to the middle hidden layer, called the bottleneck layer, which holds the new compressed representation of the input data. In the decoding stage, the network increases the number of nodes from the bottleneck up to the output layer to retrieve the initial input data. These encoding and decoding mechanisms make the AE a powerful data-driven technique for learning and extracting relevant features, avoiding many troubles related to handcrafted features [56]. Furthermore, since neural networks apply nonlinear transformations between layers, AEs have shown great performance for feature reduction compared to other state-of-the-art reduction techniques [47]. In the next section, we present our proposed framework and detail its different approaches.
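As an illustration, a deep AE with a single bottleneck layer can be sketched in Keras as follows; the layer sizes are placeholders, and the key point is that the model is trained to reproduce its own input:

```python
# Sketch of a deep autoencoder; all sizes are illustrative placeholders.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 100  # dimension of the input data (assumed)

autoencoder = Sequential([
    Dense(64, activation="relu", input_shape=(n_features,)),  # encoding stage
    Dense(32, activation="relu"),                             # bottleneck layer
    Dense(64, activation="relu"),                             # decoding stage
    Dense(n_features, activation="sigmoid"),                  # reconstructed input
])

autoencoder.compile(optimizer="adam", loss="mse")
# Targets are the inputs themselves: the AE learns to reconstruct its input.
# autoencoder.fit(X_train, X_train, epochs=20, batch_size=32)
```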

4. Proposed Framework

The proposed deep learning framework (see Figure 3) explores the whole process of the activity detection task. It was carefully designed to ensure an exhaustive evaluation of all possible approaches related to feature choice and the application of deep learning models. In fact, it proposes detailed guidelines for the following:
(i) Data preprocessing
(ii) Features to use
(iii) Models building

Each of these steps is detailed in the following subsections. Note that this framework can be generalized and applied to any smart home dataset built on ambient sensors and devices.

4.1. Data Preprocessing

In a smart home setting, each sensor's data can be stored in one file containing two columns: timestamp and sensor value. Time granularity differs from one sensor to another. Our preprocessing approach aims at merging all sensor data into one matrix in which the row index is the timestamp and the column indexes are the sensor labels. The output matrix should not contain any missing values, and all values must be normalized. To reach this goal, we propose a specific preprocessing algorithm that chains three main steps (see Figure 3):
(i) Data fusion: this first manipulation is performed by merging all timestamps from all sensors. Some missing values may appear since sensors do not share the exact same timestamps.
(ii) Filling missing values: several strategies are combined to replace missing values with relevant ones. First, missing values are replaced by the last valid observation from the past. The remaining ones, especially at the beginning of the recording, are then replaced by zero in the case of a numerical sensor and by the first valid observation in the case of a categorical sensor. This way, all missing values are suitably treated.
(iii) Data normalization: for many machine learning models, it is recommended, and in some cases required, to normalize data to a standard format. To this end, a min-max scaler is first applied to numerical values. Second, binarization is applied to two-category variables (e.g., “ON” and “OFF” are replaced by 1 and 0, respectively). Last, one-hot encoding is applied to multiple-category variables (i.e., three categories or more).

Applying these three steps, we therefore obtain a homogeneous numerical table in which all column values range between 0 and 1. These table columns represent the initial input features of the next steps.
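A minimal sketch of this preprocessing chain with Pandas and scikit-learn is shown below; the file names and column lists are hypothetical placeholders, and the exact calls are assumptions rather than our full implementation:

```python
# Sketch of the three preprocessing steps; file names and column lists
# below are hypothetical placeholders.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

sensor_files = ["sensor_temperature.csv", "sensor_door.csv"]  # hypothetical
numerical_cols = ["temperature"]                              # hypothetical
binary_cols = ["door_state"]                                  # hypothetical
multi_category_cols = ["heater_mode"]                         # hypothetical

# (i) Data fusion: outer-join all per-sensor series on their timestamps.
frames = [pd.read_csv(f, index_col="timestamp") for f in sensor_files]
data = pd.concat(frames, axis=1).sort_index()

# (ii) Filling missing values: last valid observation first, then zero for
# numerical sensors and the first valid observation for categorical ones.
data = data.ffill()
data[numerical_cols] = data[numerical_cols].fillna(0)
data[binary_cols + multi_category_cols] = data[binary_cols + multi_category_cols].bfill()

# (iii) Normalization: min-max scaling, binarization, one-hot encoding.
data[numerical_cols] = MinMaxScaler().fit_transform(data[numerical_cols])
data[binary_cols] = data[binary_cols].replace({"ON": 1, "OFF": 0})
data = pd.get_dummies(data, columns=multi_category_cols)
```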

4.2. Which Features to Use?
4.2.1. The All-Features Approach

Once data preprocessing is complete, we should decide which features to use for the classification task. The first approach is to use all available features without any selection or reduction technique. Despite its basic form, this approach is quite interesting in the deep learning context. In fact, the data transformations applied in the intermediate hidden layers can be interpreted as automatic feature extraction computations, a process known as “automatic feature learning” [56]. These extracted features are repeatedly tuned during the training phase, ending up as optimized and well-suited features for the classification task.

4.2.2. The Feature Selection Approach

The feature selection approach consists of using classic data mining methods to select, from the initial features, a smaller subset that is most relevant to the classification task. This approach is widely used to remove insignificant variables, simplify models, decrease training time, overcome overfitting issues, etc. In this work, we applied the “Univariate Selection Approach,” also known as “F-ANOVA selection” (see Figure 3). The key idea of this method is to select the most relevant features by computing univariate statistical tests (ANOVA tests [57]) between each input feature and the target variable. Concretely, if we want k features, we compute the ANOVA F-value between each input and the desired output variable and then keep the k features with the highest F-values.
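A minimal sketch with scikit-learn's SelectKBest follows; the data here are random placeholders standing in for the preprocessed matrix and the activity labels:

```python
# Sketch: univariate F-ANOVA selection of the k best features.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

X = np.random.rand(1000, 258)        # placeholder preprocessed feature matrix
y = np.random.randint(0, 25, 1000)   # placeholder activity labels

selector = SelectKBest(score_func=f_classif, k=50)  # keep the 50 best features
X_selected = selector.fit_transform(X, y)           # shape: (1000, 50)
```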

4.2.3. The Feature Reduction Approach

The feature reduction approach consists of reducing the initial number of features by extracting new feature representations. The extracted features should summarize the original data while retaining as much information as possible. Depending on the approach used, these new features can be a linear or a nonlinear transformation of the initial data. In this work, our main approach for dimensionality reduction relies on a deep autoencoder architecture (see Subsection 3.3 for more details on autoencoders). For comparison, principal component analysis (PCA), a state-of-the-art reduction technique, is also applied. PCA [58] is a statistical approach that maps a set of potentially correlated features to a set of linearly uncorrelated features named principal components. Because each component is chosen to preserve as much variance as possible, PCA reduces the number of features while conserving most of the original data variability.
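For reference, the PCA variant can be sketched in a few lines of scikit-learn; the input matrix here is a random placeholder:

```python
# Sketch: linear reduction of the feature space with PCA.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 258)     # placeholder preprocessed feature matrix
pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X)  # linear projection onto 50 components
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```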

4.3. Models Building

In our framework, several models are explored depending on the approach chosen for treating features (see Figure 3). First, for the all-features approach, we simply build the neural network over the complete dataset. The neural network may be a simple MLP, an RNN model, an LSTM model, or a GRU model. These models were chosen because they represent the best state-of-the-art models for activity recognition, as pointed out by several literature reviews [45, 56] (see Section 3 for details about these architectures). Second, for the feature selection approach, we begin by applying the F-ANOVA selection technique and then build one of the same neural network models (MLP, RNN, LSTM, or GRU) over the selected features. Third, for the reduction approach, we start by building a deep autoencoder using the whole training data; recall that at this step the output targets are set identical to the inputs. After training, we remove the decoding part of the AE, keeping only the encoding part. The last layer of the encoding part is the bottleneck layer, from which the new reduced features are computed: they are obtained as the output of this bottleneck layer. To finalize the model, we add a “softmax” output layer for the classification task and retrain the model to fit the new architecture (see Figure 3); this last training step is known as the fine-tuning step. Finally, all models are tested and evaluated using a multitude of classification metrics. In the next section, we present how our framework was implemented and applied to the Orange4Home dataset, depending on the feature strategy used and the model adopted.
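The reduction pipeline can be sketched in Keras as follows; the sizes (258 inputs, a 100-50-100 autoencoder, 25 classes) anticipate the configuration detailed in Section 5, while the activations and optimizer are assumptions:

```python
# Sketch of the reduction approach: train an AE, drop the decoder,
# add a softmax head, and fine-tune. Activations/optimizer are assumptions.
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

inputs = Input(shape=(258,))
encoded = Dense(100, activation="relu")(inputs)
bottleneck = Dense(50, activation="relu")(encoded)   # reduced features
decoded = Dense(100, activation="relu")(bottleneck)
outputs = Dense(258, activation="sigmoid")(decoded)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_train, X_train, ...)  # output targets = inputs

# Keep only the encoding path (its trained weights are shared with the AE)
# and append a softmax layer for the 25 activity classes.
softmax_head = Dense(25, activation="softmax")(bottleneck)
classifier = Model(inputs, softmax_head)
classifier.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])
# classifier.fit(X_train, y_train, ...)   # fine-tuning step
```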

5. Experiments

5.1. Dataset Presentation

For this work, the “Orange4Home” dataset [24] was chosen to test our framework for several reasons. Many datasets already exist in the literature, but several limitations can be observed in these databases. The Opportunity dataset [59], for instance, is limited by the presence of only one room, the shortness of the recorded sequences (∼30 minutes), and the large use of body-worn devices, which makes the scenario unrealistic. The Transfer Learning dataset [60] uses 3 houses for its experiments but is limited by the number of recording devices and takes into account only 8 activity classes. The ARAS dataset [61] records 27 activity classes over one whole month in two different smart homes; activities were labeled in situ by the residents with high accuracy, but the use of only 20 binary sensors limits the richness of the recorded data.

In our work, the “Orange4Home” dataset [24] was chosen for its numerous advantages and its well-designed protocol; it represents a promising benchmark for behavior studies and routine detection. This dataset is the fruit of a collaboration between INRIA (https://www.inria.fr/en/) and Orange Labs (France). Recordings were made in a two-floor smart apartment, named “Amiqual4Home” (https://amiqual4home.inria.fr/), fully equipped with 236 heterogeneous sensors. Four consecutive weeks of work (9 hours per day, 5 days per week) were recorded, resulting in nearly 180 hours of multimodal data and 24 activity classes labeled accurately in situ by the single resident of the smart house.

The complete list of classes is presented in Table 2. Each class is labeled by its location followed by the activity label (e.g., Office|Cleaning). Note that gaps were observed in the activity labeling when a transition occurred between two different activities. To deal with this issue, we added a new class named “inter” activity, which raised the total number of classes to 25.

Regarding the experimental setting, the smart apartment was equipped with several types of ambient sensors generating various types of recorded data, as detailed in Table 3 (extracted and slightly modified from [24]). Several modalities were sensed, ranging from environmental information such as weather variables to more local information such as indoor temperature, noise, door states, light states, presence, movements, electricity and water consumption, etc. Thanks to the diversity of the sensed data and the relevance of the labeled classes, Orange4Home represents an appropriate benchmark to assess the different components of our framework, namely, the data preprocessing approaches and the deep learning models. In the next subsection, we detail how our framework was applied to this dataset to obtain relevant classification results.

5.2. Models Implementation

As previously mentioned, the first component of our framework is data preprocessing. At this level, three steps are applied: data fusion, filling missing values, and data normalization (see Figure 3). Applying these steps to the Orange4Home dataset results in a feature space composed of 258 normalized features, 25 activity classes, and a total of 224,000 timestamp lines. Next, we should decide which features to use. To this end, the three proposed approaches were tested: the first uses the whole feature space before model building, the second uses the F-ANOVA technique to select a subset of features, and the third applies feature reduction techniques. Before presenting the results, recall that, for all approaches, four neural network models were tested: MLP, RNN, LSTM, and GRU. The MLP model is built upon two hidden dense layers in addition to the input/output layers. The RNN model contains three layers besides the input/output layers: one RNN layer, followed by one dropout layer (to limit overfitting), followed by one dense layer. The same architecture is used for the LSTM and GRU models (see Figure 4); it proved empirically to be the best one for our dataset. Moreover, for all approaches and all models (MLP, RNN, LSTM, and GRU), the number of nodes in each layer was set to (number of inputs + number of classes)/2, a rule that also proved empirically to be the best for our dataset.
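For illustration, the recurrent architecture just described can be sketched as follows; the dropout rate and the single-timestep input shape are assumptions of this sketch:

```python
# Sketch of the recurrent model used in the framework; dropout rate and
# input time-step count are assumptions.
from tensorflow.keras.layers import Dense, Dropout, SimpleRNN
from tensorflow.keras.models import Sequential

n_features, n_classes, timesteps = 258, 25, 1
n_nodes = (n_features + n_classes) // 2  # node-count rule: (inputs + classes)/2

model = Sequential([
    SimpleRNN(n_nodes, input_shape=(timesteps, n_features)),  # recurrent layer
    Dropout(0.2),                       # limits overfitting (rate assumed)
    Dense(n_nodes, activation="relu"),  # dense layer
    Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# Replacing SimpleRNN with LSTM or GRU yields the LSTM/GRU variants.
```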

For the all-features approach, the number of features used for classification corresponds to the initial 258 features. For the selection approach, the best classification results were obtained using exactly 50 features selected by the F-ANOVA technique. Similarly, the number of reduced features was set to 50: a deep autoencoder model was applied for the reduction approach. Besides the input/output layers, the AE model contained three hidden dense layers with 100, 50, and 100 nodes, respectively (see Figure 2). The input/output layers contained the initial number of features, i.e., 258 nodes. After training, the decoding part is dropped, and a “softmax” output layer (with 25 class nodes) is added just after the bottleneck layer, which, as a reminder, is composed of the 50 nodes representing the reduced features. A fine-tuning step is finally performed to retrain the new model parameters. For comparison with the deep autoencoder results, PCA (principal component analysis) is also tested on our dataset using 50 principal components.

To test all models, 4-fold cross-validation is applied, since we have 4 weeks of recorded data. In each round, three weeks were used for training and one week was used for testing; the final evaluation metrics are then the average over the 4 iterations. To conclude this section, note that the data preprocessing steps, selection techniques, reduction methods, model training, model testing, and assessment metrics were all implemented in the Python programming language with the support of well-known data science packages such as NumPy, Pandas, Scikit-learn [62], and Keras [63]. All computations were run on a PC with 16 GB of RAM and an Intel® Core™ i7-8550U processor. All results are presented and discussed in the next section.
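The week-based validation scheme can be sketched as follows; the data, the `week` array, and the `build_model` factory are hypothetical placeholders for illustration:

```python
# Sketch of the 4-fold, leave-one-week-out validation scheme;
# data, week labels, and build_model() are hypothetical placeholders.
import numpy as np

X = np.random.rand(1000, 258)          # placeholder feature matrix
y = np.random.randint(0, 25, 1000)     # placeholder activity labels
week = np.random.randint(1, 5, 1000)   # placeholder week label per row

accuracies = []
for test_week in (1, 2, 3, 4):
    train_idx = week != test_week      # three weeks for training
    test_idx = week == test_week       # one week for testing
    model = build_model()              # hypothetical compiled-model factory
    model.fit(X[train_idx], y[train_idx], epochs=20, verbose=0)
    _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
    accuracies.append(acc)

print(np.mean(accuracies))             # final metric: average over 4 folds
```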

6. Results and Discussion

6.1. Results of the All-Features Approach

For results comparison, we essentially rely on two main metrics: the overall accuracy and the F-measure, which represents the harmonic mean of recall and precision [64]. Taking a specific class as a reference, TP (True Positive) denotes the number of instances correctly assigned to that class; FP (False Positive) denotes the number of instances incorrectly assigned to that class; FN (False Negative) denotes the number of instances of that class assigned to another class; and TN (True Negative) denotes the number of instances of other classes correctly not assigned to that class. These evaluation metrics are then defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = 2 × (Precision × Recall) / (Precision + Recall)
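In practice, both metrics can be computed directly with scikit-learn; the label arrays below are toy placeholders, and the weighted averaging mode is an assumption of this sketch:

```python
# Sketch: computing the two comparison metrics; labels are toy placeholders
# and the averaging mode is an assumption.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 2, 1]   # placeholder ground-truth activity labels
y_pred = [0, 1, 2, 1, 1]   # placeholder predicted labels

print(accuracy_score(y_true, y_pred))                # overall accuracy
print(f1_score(y_true, y_pred, average="weighted"))  # F-measure
```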

In the all-features approach, models were trained using the 258 features of the whole feature space. After testing several classic baseline classifiers, the best results (92.22% accuracy and 92.83% F-measure) were obtained by a random forest classifier [31], which we denote here as the baseline-RF model. Our proposed MLP model, as well as the RNN model, succeeded in improving this result, giving 94.26% and 94.20% accuracy and 94.73% and 94.74% F-measure, respectively. The other tested models (LSTM and GRU) led to close results but with no significant improvement, as shown in Table 4. In fact, the neural network models gave better results than RF since they benefit from the “automatic feature learning” process: as explained before, the features extracted by this process are repeatedly tuned during the training phase, ending up as optimized and well-suited features for the classification task. Moreover, looking at the computing times in Table 4, the neural network models recorded, as expected, longer times than RF due to their complexity. Besides achieving the best rates, the MLP and RNN models also present better computing times than LSTM and GRU; for instance, the RNN model requires 3.71 s of testing time versus 5.48 s for the LSTM. Therefore, both the MLP and RNN models present a good tradeoff in the all-features approach when all evaluation metrics are taken into account.

6.2. Results of the Feature Selection Approach

As in the all-features approach, after testing several classic baseline classifiers, the feature selection approach gave its best baseline results with a random forest classifier. Using the F-ANOVA technique, the optimal number of selected features was 50, yielding an accuracy of 95.10% and an F-measure of 95.30%, as shown in Table 5. Next, the same selection technique was coupled with our framework classifiers (MLP, RNN, LSTM, and GRU), resulting in the following models: ANOVA-MLP, ANOVA-RNN, ANOVA-LSTM, and ANOVA-GRU. The baseline-RF model is denoted here as baseline-ANOVA-RF. While a slight accuracy improvement was recorded for the neural network models (notably 95.37% for ANOVA-MLP versus 95.10% for ANOVA-RF), the F-measure scores were very close for all models. For instance, the ANOVA-MLP classifier gave an F-score of 95.39%, with no significant difference from the ANOVA-RF results. The same finding was observed for the rest of the tested models (ANOVA-RNN, ANOVA-LSTM, and ANOVA-GRU). In fact, in the all-features approach, the internal process of automatic feature learning gave the neural networks an advantage over the RF model since no selection operation was applied at all; once the F-ANOVA selection technique is applied, this internal process no longer benefits the NN models, since all classifiers share the same selection approach. Furthermore, the comparison between the two approaches (all-features and selection) shows that selection techniques succeeded in improving the classification performance of all models. As a consequence, when building smart houses, full instrumentation and a high number of sensors are not necessary for efficient activity detection; it is more effective to proceed with a smart choice of sensor number, sensor types, and their locations. This approach presents many advantages for reducing costs and technical issues and especially helps in resolving many challenges related to residents’ privacy, since fewer sensors are needed to capture their movements. Last, regarding computing times, results similar to the previous subsection were observed: recorded times matched the complexity of the models; i.e., the more complex the model, the longer the computing time.

6.3. Results of the Feature Reduction Approach

In this work, our main approach for feature reduction is to apply the autoencoder method as described in Subsection 5.2; the resulting classification model is denoted here as the AE-MLP model. Before presenting this model’s results, note that a baseline model based on an RF classifier coupled with a PCA reduction strategy (50 components from 258 features) gave a rate of 79.68% for accuracy and 81.30% for F-measure; this baseline is denoted here as the baseline-PCA-RF model. Next, we tested a PCA-MLP approach combining the PCA reduction technique with an MLP classifier. Results were significantly enhanced from 79.68% to 86.58% for accuracy and from 81.30% to 88.04% for F-measure, as shown in Table 6. Afterward, we tested our main proposed approach (the AE-MLP model), and the results improved significantly, reaching an accuracy of 94.09% and an F-measure of 94.26%, as shown in Table 6. Our proposed AE-MLP model thus outperformed all other reduction models with relevant accuracy and F-measure classification rates. Moreover, the computing times of the AE-MLP model were close to those of the PCA-MLP model, which supports the relevance of our proposed approach. This result confirms previous findings in the literature showing the advantages of autoencoders over PCA: autoencoders perform nonlinear transformations when computing the reduced features, contrary to PCA, which performs simple linear projections. Another advantage of the AE-MLP approach is its homogeneous architecture: both the reduction and classification models are based on neural network layers and are combined into a single deep neural network model. This homogeneous topology enables the joint training and fine-tuning of the two parts of the model, namely, the reduction part and the classification part.

7. Conclusion

In this paper, we have proposed a deep learning-based framework to determine the best approaches for ADL classification in smart homes. The framework was designed to ensure relevant data preprocessing steps, to test all possible strategies for choosing relevant sensors, and to apply the best state-of-the-art models for classification. After preprocessing, three approaches were explored to determine the optimal features: the all-features approach, the feature selection approach, and the feature reduction approach. The F-ANOVA technique is applied for the selection approach, while a deep autoencoder (AE) is proposed as the main technique for the reduction approach. The last component of the proposed framework is the ADL recognition models: four neural network models were chosen for their proven discriminative power, namely, MLP, RNN, LSTM, and GRU. Our framework was applied to the “Orange4Home” dataset, a promising dataset for smart home research. As a result, our framework showed significant improvement over baseline models computed on the same dataset. For the all-features approach, our RNN-based model gave the best F-measure rate (94.74%) compared to the baseline-RF classifier (92.83%). For the reduction approach, our AE-MLP model clearly outperformed the baseline-PCA-RF model (94.26% vs. 81.30%), thanks to its consistent architecture and training process. Furthermore, the best overall results were given by the feature selection approach, suggesting that heavy installation settings in smart homes may not lead to optimal performance for ADL recognition. It may consequently be more beneficial to limit the number of sensors and to choose their types and locations smartly. This recommendation represents a relevant way to tackle many smart home concerns related to interactivity, costs, and privacy. To conclude, we emphasize that our proposed framework is designed to be easily generalized and applied to any smart home dataset equipped with ambient sensors; it can be a useful and powerful guideline for researchers and engineers processing any ADL classification task. As perspectives of this research, we intend to extend the framework to include other types of sensors such as wearable devices, audio sensors, and video-based sensors. Moreover, several extensions of this framework are under study to improve recognition rates using incremental learning techniques and deep reinforcement learning approaches.

Data Availability

The author has used third-party data. More information about these data can be obtained from: Cumin, J., Lefebvre, G., Ramparany, F., Crowley, J. L. (2017). A Dataset of Routine Daily Activities in an Instrumented Home. In: Ochoa, S., Singh, P., Bravo, J. (eds) Ubiquitous Computing and Ambient Intelligence. UCAmI 2017. Lecture Notes in Computer Science, vol. 10586. Springer, Cham. https://doi.org/10.1007/978-3-319-67585-5_43.

Conflicts of Interest

The author declares no conflicts of interest.