Abstract

Recommender systems are now widely used in many fields to address information overload. Collaborative filtering and content-based models are the representative approaches; however, the content-based model suffers from a lack of diversity in its recommendation results and from poor perception of user preferences, while the collaborative filtering model suffers from the cold start problem and is strongly affected by the clustering algorithm it adopts. To address these issues, this paper proposes a hybrid recommendation scheme that combines collaborative filtering and content-based filtering. In this scheme, we propose the concept of a time impact factor and build a time-aware user preference model on it. User feedback on recommended items is also used to improve the accuracy of the proposed model. Finally, the hybrid model combines the results of content-based recommendation and collaborative filtering by means of logistic regression.

1. Introduction

With the rapid development of the Internet, users can enjoy rich information services and convenient social interaction through Internet applications [1]. However, information overload in these applications is becoming increasingly serious, which makes it difficult for users to find what they really like. Therefore, various recommendation models are widely used to help users locate information. In general, the popular recommendation models can be divided into collaborative filtering, content-based, and hybrid approaches. The collaborative filtering method [2–4] is based on the view that the more similar two users are, the more their preferences overlap. The content-based approach [5, 6] recommends items based on representations that are usually extracted from item descriptions, which requires calculating the similarity between item representations and user profiles. The hybrid approach [7] generates recommendations by combining several other methods, on the idea that a hybrid should exploit the advantages of its component approaches while avoiding their individual disadvantages.

Among these approaches, collaborative filtering is widely used in e-commerce, while the content-based approach performs well in recommending blogs, news, and documents. Generally, collaborative filtering performs better than the content-based model, but only when sufficient user information, including personal and behavioral information, is available. Moreover, the collaborative filtering model usually suffers from the cold start problem due to a lack of adequate rating records [3, 5], in which case the content-based model is an alternative. However, this approach has its own limitations. For instance, users' profiles cannot be acquired accurately when sufficient user behavior information is lacking [2, 6]. Another issue is that the content-based approach is slow to perceive changes in user preference, although a user's interest usually changes with time. For instance, suppose a fashionable windbreaker suitable for autumn is launched, and a user who needs one pays attention to it for a while. After autumn, the user may no longer need such clothes, yet the relevant behavior records remain in the recommendation model and continue to affect the recommendation results. This is because the content-based model depends only on users' past preferences for certain items, so the recommendations it generates will be similar to what users used to like.

The hybrid recommendation model provides a new way to solve the above issues. Different kinds of hybrid recommendation models have been proposed for Internet applications [8, 9], such as weighting the results of different recommendation techniques or using a switching mechanism (i.e., a mechanism that switches between recommendation technologies according to the background and actual situation of the problem). However, to our knowledge, existing hybrid models pay little attention to the impact of time on user preferences. In our experience, the time factor has a significant impact on user preferences, which usually change with time, so it deserves attention. Moreover, the recommender system can be improved by utilizing feedback from users. Therefore, we propose a novel hybrid recommendation model, which contains three key points:
(1) For building user profiles, we use the time factor as a weighting basis for selecting behavior records. In this selection process, we focus on the behavior records generated closer to the current time, so the points of interest contained in the selected records are given priority.
(2) A feedback mechanism is introduced into the model, which establishes feedback libraries from the user's feedback records (e.g., click rate and browsing duration) on previous recommendations. The recommendations of our hybrid model are then filtered by the feedback mechanism to improve the accuracy of the model.
(3) The spectral clustering algorithm is used to improve the efficiency of collaborative filtering.

Finally, we use the logistic regression method to aggregate the recommendations from content-based and collaborative filtering. The remainder of the paper is organized as follows. In Section 2, the related works are presented. Section 3 provides the background necessary to understand the proposed scheme, such as the preference representation and the spectral clustering algorithm. In Section 4, we describe the proposed scheme in detail, including the definition of time impact factor. In Section 5, we introduce the experimental environment and analyze the experimental results. Finally, the conclusion and future work are introduced in Section 6.

2. Related Work

The main purpose of recommendation models is to provide helpful and suitable items for users. The traditional recommendation approaches include collaborative filtering, content-based, and knowledge-based ones, and they are mainly applied in the fields of online news, social media, online advertising, and e-commerce. However, a single recommendation approach may not perform well, and privacy concerns make it difficult to collect sufficiently detailed user behavior records. Therefore, the hybrid approach has received growing attention, and many studies have been carried out recently.

Early hybrid models were mainly used to improve collaborative filtering. In 2005, Li et al. [10] presented a hybrid collaborative filtering model that combines item-based and user-based collaborative filtering. In this model, the similarity between the active user and neighboring users is calculated on the items related to the item being predicted, not on all items. Researchers have also introduced various kinds of auxiliary information into the recommendation model to build better hybrid models. For instance, the studies in [11–13] introduce a hybrid model that utilizes user-based similarity, POI-based (point-of-interest) similarity, and geographic information to recommend tourist spots. Zheng et al. [14] designed a hybrid trust-based model for online learning, which deals with data sparsity by incorporating two trust relationships into the algorithm. To mine more implicit information in hybrid models, a Bayesian network model combining content-based and collaborative filtering is proposed in [15]; the Bayesian network is used to calculate the joint probability distribution of user access time and resource information to obtain the user's interest in the provided resource. The concept of group recommendation is proposed in [16–18]. Boutilier et al. [16] developed probabilistic inference methods for predicting individual preferences given observed social connections. Sun et al. [18] proposed a social-aware group recommendation framework that jointly utilizes social relationships and social behaviors not only to infer a group's preference but also to model the tolerance and altruism characteristics of group members. In [19], a time-aware hybrid model is proposed for topics in microblogs. Since the hot topics of microblogging communities change quickly with time, it is necessary to recommend time-sensitive topics, and this model combines a content-based approach with a time-aware component to find latent topics.

Artificial intelligence technologies provide a new perspective to improve the hybrid model. In [20], a system that uses a sentiment analysis approach to classify user’s keywords or ratings as positive and negative is proposed. This system will recommend items to users that match their emotional tendencies. In the work of [21, 22], knowledge graphs are mainly integrated into the recommendation generation process as a dataset with rich semantics. In [23], a hybrid model that builds a graph-based latent factor model is proposed. This approach combines the strength of latent factorization with graphs. In [24], a collaborative deep learning model is proposed. This model applies the deep neural network and convolution neural network to extract the hidden feature vectors of users and items with sparse ratings to build the rating matrix. Yu et al. [25] proposed a multilinear interactive MF (matrix factorization) algorithm (MLIMF) to model the interactions between the users and each event associated with their final decisions. The proposed model considers not only the user-item rating information but also the pairwise interactions based on some empirically supported factors, and this model is used to solve the problem of overdependence on the user-item rating matrix for MF-based (matrix factorization) approaches. To solve the issue of the sparse content of items in collaborative retrieval (CR) system, Yu et al. [26] suggested that the sophisticated relationship of each (query, user, and item) triple should be sufficiently explored from the perspective of items. Besides, an alternative factorized model is proposed in [26], which could better evaluate the ranks of those items with sparse information for the given query-user pair.

Overall, the previous research works on the hybrid recommendation mainly focus on solving the sparsity of rating matrix and mining implicit relations between users and items. However, the feedback from users on recommendations and the timeliness of the recommendations have not been paid enough attention. This work is built on the prior work of content-based and collaborative filtering, and we consider particularly the time factor and user feedback to enhance the performance of recommendation.

3. Preliminaries

3.1. Preference Representation

User preference should be represented in a way that computer systems can process easily, so natural language descriptions of items are usually converted into structures that computers can handle directly. Specifically, several structures are widely used, such as the user-item rating matrix [8], the user-interests knowledge table [27], the keywords vector, the VSM (vector space model), and semantic ontology [28–30]. In this work, we choose the VSM as the user preference structure for the following reasons:
(1) Compared with the user-interests knowledge table [27], the structure of the VSM is relatively simple, and the computation cost of building it is modest.
(2) Compared with the keywords vector, the VSM is equivalent to a set of common keywords vectors and carries more information useful for recommendation computations.
(3) The VSM does not rely on natural language processing technology, and it is easier to implement than the semantic ontology method.

Several concepts of the VSM are briefly introduced as follows:
(1) Document: a document is usually a fragment of a certain scale in an article, such as a sentence, a sentence group, a paragraph, or a paragraph group. In recommender systems, documents mainly refer to the items to be recommended (e.g., news, blogs, videos, and music) or to the behavior records of users.
(2) Term/feature term: a feature term is the smallest indivisible language unit in the VSM, which can be a word, a phrase, or a phrase group. In recommender systems, the term refers to the keywords that represent the characteristics of recommended items or the rated items of users. Therefore, a document can be regarded as a collection of terms and can be expressed as $d = \{t_1, t_2, \ldots, t_n\}$, where $t_i$ ($1 \le i \le n$) is a feature term.
(3) Term weight: for the document $d$, each term $t_i$ should be assigned a weight $w_i$ to indicate its importance in $d$. Such a document can then be expressed as $d = \{(t_1, w_1), (t_2, w_2), \ldots, (t_n, w_n)\}$. In recommender systems, the term weight represents the weight of a feature keyword after the recommended items are converted into vector form, or the score of each rated item in a user rating vector.

Consider a document $d$ that conforms to the following two principles: (1) all feature terms are different (there is no repetition), and (2) there is no sequential relation between feature terms (that is, the internal structure of the document is not considered). Then we call $(w_1, w_2, \ldots, w_n)$ the vector, or vector space model, of the document. TF-IDF (term frequency-inverse document frequency) [9] is usually used to calculate the weight of feature terms in the VSM. The main idea of TF-IDF is that a feature term that appears frequently in one article but rarely in other articles should be assigned a high weight. The calculation process of TF-IDF is as follows:
(i) Step 1: TF (term frequency) is the relative number of times a term appears in an article. It can be calculated as follows:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \tag{1}$$

where $n_{i,j}$ is the number of occurrences of $t_i$ in document $d_j$ and the denominator is the total number of all terms in $d_j$.
(ii) Step 2: IDF (inverse document frequency) reflects how common a term is across all documents. If a term appears in many texts, its IDF should be low. It can be calculated as follows:

$$idf_i = \log \frac{|D|}{1 + |\{j : t_i \in d_j\}|} \tag{2}$$

where $|D|$ is the total number of documents and $|\{j : t_i \in d_j\}|$ is the number of documents containing the term $t_i$. We usually use $1 + |\{j : t_i \in d_j\}|$ as the denominator to avoid a zero denominator.
(iii) Step 3: the TF-IDF value of the term $t_i$ in document $d_j$ is calculated as the product of the two quantities, as shown in the following equation:

$$w_{i,j} = tf_{i,j} \times idf_i \tag{3}$$

After introducing the definition of VSM and the calculation process of TF-IDF in detail, we can give the definition of preference representation as follows.

Definition 1 (preference representation). Let $P_u = \{(k_1, w_1), (k_2, w_2), \ldots, (k_n, w_n)\}$ represent the preference of user $u$ expressed as a VSM, where $w_i$ is the weight of the $i$-th preference $k_i$.
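
The TF-IDF weighting described in this subsection can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: documents are assumed to be already tokenized into keyword lists, the natural logarithm is used, and the function name `tf_idf` and the sample documents are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append({
            # tf = count / total terms; idf uses the 1 + df denominator.
            term: (count / total) * math.log(n_docs / (1 + doc_freq[term]))
            for term, count in counts.items()
        })
    return vectors


# Illustrative documents (assumed data, not from the paper's datasets).
docs = [
    ["python", "tutorial", "python"],
    ["cooking", "tutorial"],
    ["cooking", "recipe"],
    ["travel", "guide"],
]
vectors = tf_idf(docs)
```

In the first document, "python" is both frequent locally and rare globally, so it receives a higher weight than "tutorial". Note that with the $1 + df$ denominator, a term that occurs in every document gets a slightly negative IDF, which some implementations clip to zero.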

3.2. Spectral Clustering

Generally, the similarity between users or items should be calculated in the collaborative filtering algorithm. For instance, in user-based collaborative filtering, the first step is to provide a user set composed of elements with the highest similarity. Then, the recommendation of items is based on the interests and preferences of similar users. Consequently, with the increase of users, the efficiency of the recommendation algorithm will decline. To solve this challenge, the clustering algorithm is proposed and applied in the recommendation algorithm. By dividing users into different clusters based on user similarity, the calculations involved are only within the cluster and between different clusters.

The idea of spectral clustering comes from graph theory: in essence, it transforms the clustering problem into an optimal graph partitioning problem. In spectral clustering, data nodes are used as the vertices of a graph, and the edges between vertices are assigned weights according to the similarity between the data nodes, which forms an undirected weighted graph. Compared with traditional clustering algorithms such as $k$-means, the spectral clustering algorithm is less prone to local optima, is not restricted to convex sample spaces, and can cluster sample spaces of any shape, which gives a better clustering effect. The specific process of spectral clustering can be summarized as follows:
(1) The user behavior dataset is cleaned and filtered first, and then the similarity matrix $W$ is constructed by calculating the similarity of user behaviors.
(2) The degree matrix $D$ is created from the similarity matrix $W$: the sum of the elements in the $i$-th row of $W$ is assigned to the diagonal element $D_{ii}$. Then, the Laplacian matrix is constructed as $L = D - W$.
(3) The eigendecomposition of $L$ is computed, and appropriate eigenvectors are selected and stored as columns to form a feature matrix $F$.
(4) Each row vector of the feature matrix $F$ is taken as an independent sample, and these vectors are clustered using $k$-means to form clusters $C_1$, $C_2$, …, $C_k$.
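
As a small illustration of step (2), the following Python sketch builds the degree matrix and the Laplacian from a similarity matrix. It assumes the standard unnormalized Laplacian (degree matrix minus similarity matrix); the function name and the sample matrix are hypothetical, and the eigendecomposition and $k$-means steps are deliberately left out.

```python
def degree_and_laplacian(similarity):
    """Build the degree matrix and the unnormalized Laplacian (assumed form)."""
    n = len(similarity)
    degree = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Each diagonal entry is the sum of the corresponding similarity row.
        degree[i][i] = sum(similarity[i])
    laplacian = [
        [degree[i][j] - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]
    return degree, laplacian


# Symmetric user-similarity matrix (illustrative values only).
W = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.2],
    [0.1, 0.2, 0.0],
]
D, L = degree_and_laplacian(W)
```

Every row of the resulting Laplacian sums to zero, which is a quick sanity check before passing it to an eigensolver.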

3.3. Similarity Calculation

In user-based collaborative filtering, the similarity of users is evaluated on the rating matrix, and the Pearson correlation coefficient is used to measure the similarity. However, the rating matrix is usually sparse; in particular, the ratings of different users for the same items are relatively sparse. As a result, the calculated similarity of two users may be high even though their interests differ significantly. Therefore, when the Pearson correlation coefficient is used to calculate user similarity, the number of common interests of the users should be considered, so a modified Pearson correlation coefficient is defined as follows:

$$sim(u, v) = \frac{|I_u \cap I_v|}{|I_u \cup I_v|} \cdot \frac{cov(u, v)}{\sigma_u \sigma_v} \tag{4}$$

where $sim(u, v)$ denotes the similarity of users $u$ and $v$, $cov(u, v)$ is the covariance of $u$ and $v$ for item ratings, and $\sigma_u$ and $\sigma_v$ denote the standard deviations of the ratings of users $u$ and $v$, respectively. Here, $I_u$ and $I_v$ denote the sets of items rated by $u$ and $v$, respectively, so $I_u \cap I_v$ denotes the items rated by both $u$ and $v$. The factor $|I_u \cap I_v| / |I_u \cup I_v|$ is the proportion of commonly rated items of users $u$ and $v$, which is used to modify the calculated Pearson coefficient, and $|I_u \cup I_v|$ denotes the number of items rated by user $u$ or $v$.
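
The modified coefficient can be sketched in Python as follows. This is an illustrative reading rather than the authors' implementation: it assumes that the Pearson part is computed over the co-rated items only and that the correction factor is the ratio of co-rated items to all items rated by either user. The function name and the sample ratings are hypothetical.

```python
import math


def modified_pearson(ratings_u, ratings_v):
    """Pearson correlation on co-rated items, scaled by the overlap ratio."""
    common = set(ratings_u) & set(ratings_v)
    union = set(ratings_u) | set(ratings_v)
    if len(common) < 2:
        return 0.0  # correlation is undefined with fewer than two shared items
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0  # a constant rating vector has no defined correlation
    pearson = cov / math.sqrt(var_u * var_v)
    # Overlap ratio: penalizes pairs that share only a few rated items.
    return (len(common) / len(union)) * pearson


alice = {"a": 5, "b": 3, "c": 4}
bob = {"a": 4, "b": 2, "c": 3, "d": 5}
carol = {"a": 5, "b": 3, "x": 1, "y": 2, "z": 4}
```

Both pairs are perfectly correlated on their shared items, but Alice and Bob share three of four rated items while Alice and Carol share two of six, so the first pair ends up with the higher similarity.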

4. Proposed Scheme

4.1. Time-Aware Preference Model

The establishment of the user preference model is a key step for our recommendation scheme. The basic idea is to analyze the users’ behavior recordings of online websites or applications. Based on probability statistics theory, the higher the frequency of terms, the higher the users’ interest in them, and the user preference model can be established based on this theory. Figure 1 shows the basic framework of building user preferences model which consists of three major steps:

Step 1: crawling users' behavior recordings from Internet applications. In this step, a crawler program is developed to crawl behavior recordings, including users' browsing records, comment records, and post records, and the behavior recordings in the database are preprocessed. Specifically, the recordings are first filtered and cleaned to eliminate invalid ones. Then, HTML tags, picture elements, and video elements are removed from the documents.

Step 2: in this work, we use an attention-based method to build the interest model of users, so that the interest model can be adjusted dynamically as the recommendation scenario changes. Assume that $v$ is the target recommendation vector in TF-IDF form. The interest model is reversely activated based on the target vector $v$. Let $B_u^t$ be the behavior record matrix accumulated by user $u$ up to time $t$ after the TF-IDF operations, which can be used to build the interest model of $u$. Here, we calculate the similarity $sim(v, b_i)$ between $v$ and each record $b_i$ in $B_u^t$, and we use this vector similarity to filter the recordings. That is, if $sim(v, b_i)$ is smaller than a given threshold $\varepsilon$, then $b_i$ is not used in the modeling process. Moreover, we use the calculated $sim(v, b_i)$ as a weight to revise the TF-IDF weights $w_{i,j}$ of $b_i$ as follows:

$$w'_{i,j} = sim(v, b_i) \cdot w_{i,j} \tag{5}$$

Finally, we obtain a behavior recordings matrix $D_u$ based on $B_u^t$, defined as follows:

$$D_u = \{(d_1, t_1), (d_2, t_2), \ldots, (d_m, t_m)\} \tag{6}$$

where $d_i$ denotes a formatted behavior recording of user $u$ and $t_i$ is the generation time of $d_i$.

Step 3: the elements in $D_u$ are sorted by timestamp, and the weight of each keyword in $D_u$ is calculated according to equation (3). Then, the time impact factor of a keyword is defined as follows.

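Definition 2 (time impact factor). Let $K_u = \{k_1, k_2, \ldots, k_n\}$ be the keywords of user $u$. For each $k_i \in K_u$, where the timestamp of $k_i$ is $t_i$ and the current time is $t_{now}$, the time impact factor $T(k_i)$ of $k_i$ is defined by equation (7). According to equation (7), $T(k_i)$ is inversely proportional to the difference between $t_{now}$ and $t_i$; that is, the closer the generation time $t_i$ of $k_i$ is to the current time $t_{now}$, the higher $T(k_i)$ is. Thus, the weight of keyword $k_i$ is redefined as a time-weighted weight $w'_i$, which is obtained from its TF-IDF weight and $T(k_i)$ as described by equation (8). Therefore, the user preference model can be defined as $P_u = \{(k_1, w'_1), (k_2, w'_2), \ldots, (k_n, w'_n)\}$.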
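
To make the effect of the time impact factor concrete, the following Python sketch weights keyword scores by recency. The decay form used here, one divided by one plus the elapsed time in days, is an assumption chosen only because it decreases as the time difference grows; it should not be read as the paper's equation (7). The multiplicative combination with the TF-IDF weight, the function names, and the sample records are likewise hypothetical.

```python
SECONDS_PER_DAY = 86400.0


def time_impact_factor(timestamp, now, scale=SECONDS_PER_DAY):
    """Assumed decay: 1 / (1 + elapsed time in units of `scale`)."""
    elapsed = max(now - timestamp, 0.0) / scale
    return 1.0 / (1.0 + elapsed)


def time_weighted_preferences(keywords, now):
    """keywords: iterable of (keyword, tfidf_weight, timestamp) triples."""
    preferences = {}
    for keyword, weight, timestamp in keywords:
        # Assumption: repeated keywords accumulate their time-weighted scores.
        factor = time_impact_factor(timestamp, now)
        preferences[keyword] = preferences.get(keyword, 0.0) + weight * factor
    return preferences


now = 100 * SECONDS_PER_DAY
records = [
    ("windbreaker", 0.8, now - 90 * SECONDS_PER_DAY),  # old interest
    ("sneakers", 0.5, now - 1 * SECONDS_PER_DAY),      # recent interest
]
prefs = time_weighted_preferences(records, now)
```

Although "windbreaker" has the larger TF-IDF weight, the recent "sneakers" record dominates the preference model after time weighting, which mirrors the windbreaker example in the introduction.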

4.2. Feedback Mechanism

Recommendation models are generally based on the idea that when a user is interested in certain items, he or she will remain interested in them for a long time. Consequently, systems recommend items of a type similar to the user's preferences. However, this mode has two disadvantages: (1) ratings are often sparse, and it is hard to obtain the full ratings of users because of privacy concerns; (2) with offline training, it is difficult to ensure the real-time performance of the recommender system. To address these issues, we introduce a user feedback mechanism into the proposed scheme. Recommended items with a high click rate or browsing rate can be considered to match the user's preferences, so the weight of such content should be increased. Conversely, recommended items with a low click or browsing rate can be considered to match the user's preference model poorly, so the weight of such content should be reduced. Specifically, the feedback mechanism can be divided into two phases: building the user feedback libraries and applying them. The key steps of building the user feedback libraries are as follows:

Step 1 (identifying the features of feedback data): after items have been recommended to users, it is necessary to track the users' feedback on the recommended items to optimize the model. First, we identify the features of the feedback data and store the user feedback data in a database. In this work, we mainly consider basic features of the feedback data such as click or browsing rate, item tags, and click or browsing time. As a result, a set of features $\{f_1, f_2, \ldots, f_n\}$ is generated in this step.

Step 2 (classifying feedback data): after the standardized feedback data are obtained, the next step is to classify these data according to the recommendation effect. In this work, we define the feedback impact factor $F$ to indicate the user's interest in recommendations based on the feedback data, which can be calculated as follows:

$$F = \sum_{i=1}^{n} w_i \cdot s_i \tag{9}$$

where $w_i$ denotes the weight of feature $f_i$. Based on the feedback data, the information gain of each $f_i$ is calculated and used as its weight $w_i$. In addition, all features (e.g., click or browsing rate and browsing duration) from the feedback data are digitized and normalized, and $s_i$ represents the score of $f_i$ after digitization and normalization. We use the information gain of the feature as the weight in equation (9) because the information gain expresses the contribution of $f_i$ to the classification of records (e.g., features such as click rate and browsing duration contribute differently to determining whether a record is negative or not). The score $s_i$ used in equation (9) is essentially a normalized value of the specific feature (e.g., a click rate of 0.43). We adopt normalization because the value ranges and units of different features differ, so the values of different features need to be mapped to the same numerical space (i.e., [0, 1]).

After that, we categorize the feedback data into the positive sample library and the negative sample library based on the feedback impact factor $F$. Specifically, the positive sample library contains the items with positive feedback from users (i.e., the items users are more interested in), while the negative sample library contains the items that are of little interest to users. The detailed process of building the feedback libraries is shown in Algorithm 1.

Algorithm 1: Building the feedback libraries.
Input: R, the user feedback records to be processed; S, the defined set of features; θ_pos, the threshold of the positive library; θ_neg, the threshold of the negative library.
Output: Boolean.
for each record r in R do
    digitize and normalize the feature values of r
    F(r) ← 0
    for each feature f_i in S do
        if f_i in r then
            F(r) ← F(r) + w_i · s_i
        end if
    end for
    if F(r) > θ_pos then
        add r to the positive library
    end if
    if F(r) < θ_neg then
        add r to the negative library
    end if
end for
return true
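
The library-building procedure can be sketched in Python as follows. This is a minimal illustration under several assumptions: the feature scores are already normalized to [0, 1], the feature weights are fixed stand-ins for the information gain values, and a record enters the positive library when its factor exceeds the positive threshold and the negative library when it falls below the negative threshold. All names and numbers are hypothetical.

```python
def feedback_factor(features, weights):
    """Weighted sum of the normalized feature scores present in a record."""
    return sum(weights[name] * score
               for name, score in features.items() if name in weights)


def build_feedback_libraries(records, weights, pos_threshold, neg_threshold):
    """Split (item_id, features) records into positive and negative libraries."""
    positive, negative = [], []
    for item_id, features in records:
        factor = feedback_factor(features, weights)
        if factor > pos_threshold:
            positive.append(item_id)
        elif factor < neg_threshold:
            negative.append(item_id)
        # Records between the two thresholds are left unclassified.
    return positive, negative


# Stand-in weights; the paper derives them from information gain.
weights = {"click_rate": 0.6, "browse_duration": 0.4}
records = [
    ("item-1", {"click_rate": 0.9, "browse_duration": 0.8}),
    ("item-2", {"click_rate": 0.1, "browse_duration": 0.2}),
    ("item-3", {"click_rate": 0.5, "browse_duration": 0.5}),
]
positive, negative = build_feedback_libraries(records, weights, 0.7, 0.3)
```

With these values, the first item lands in the positive library, the second in the negative library, and the third, whose factor lies between the thresholds, in neither.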

After the feedback libraries have been established, the feedback mechanism works as follows:
(1) When the top-$n$ recommendation items are generated by the recommender system, the average similarity $sim_{pos}(r_i)$ between each recommended item $r_i$ and the positive library is calculated. In the same way, the average similarity $sim_{neg}(r_i)$ between $r_i$ and the negative library is calculated.
(2) We set $\tau_{pos}$ as the threshold of positive similarity and $\tau_{neg}$ as the threshold of negative similarity. These threshold parameters are optimized by machine learning algorithms, but the specific process is not the focus of this study.
(3) If $sim_{pos}(r_i)$ exceeds $\tau_{pos}$, the recommendation weight of $r_i$ is increased by a fixed amount. Meanwhile, if $sim_{neg}(r_i)$ exceeds $\tau_{neg}$, the weight of $r_i$ is reduced by a fixed amount.

Finally, the top-$n$ recommendation list is reconstructed based on the feedback mechanism. Specifically, the items in the list are reordered according to the adjusted recommendation weights, and the items with lower recommendation weights are deleted from the top-$n$ recommendation list.
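
A possible shape of this filtering step is sketched below in Python. Several details are assumptions made only for the illustration: items and library entries are represented as sparse keyword-weight dictionaries, similarity is the cosine similarity, and a single `step` value is used for both the increase and the decrease of the recommendation weight. The function names and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse {keyword: weight} vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_similarity(vector, library):
    """Mean cosine similarity between one item and every library entry."""
    if not library:
        return 0.0
    return sum(cosine(vector, entry) for entry in library) / len(library)


def rerank(candidates, positive, negative, pos_thr, neg_thr, step, top_n):
    """candidates: list of (name, vector, weight); returns the new top-n list."""
    adjusted = []
    for name, vector, weight in candidates:
        if average_similarity(vector, positive) > pos_thr:
            weight += step  # resembles items the user responded well to
        if average_similarity(vector, negative) > neg_thr:
            weight -= step  # resembles items the user ignored
        adjusted.append((name, weight))
    adjusted.sort(key=lambda pair: pair[1], reverse=True)
    return adjusted[:top_n]


positive = [{"python": 1.0}]
negative = [{"celebrity": 1.0}]
candidates = [
    ("gossip", {"celebrity": 1.0}, 0.60),
    ("py-news", {"python": 1.0, "news": 1.0}, 0.55),
    ("cooking", {"recipe": 1.0}, 0.50),
]
result = rerank(candidates, positive, negative, 0.5, 0.5, 0.2, top_n=2)
```

In this example the initially top-ranked "gossip" item drops out of the list because it resembles the negative library, while "py-news" moves to the top.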

4.3. Collaborative Filtering Based on Spectral Clustering

In this work, the collaborative filtering approach based on spectral clustering can be divided into two stages: the user information clustering and the recommendation of items. The specific process can be described as follows.

4.3.1. Stage of User Clustering

(1) Construct the rating matrix $R$ from the users' rating data; the matrix $R'$ is then obtained by numerical normalization and smoothing of $R$.
(2) For the matrix $R'$, the similarity of users is calculated by equation (4), which yields the user similarity matrix $W$.
(3) The similarity matrix $W$ is passed as the input parameter of the spectral clustering algorithm, which produces the clustering result of the users.

It is important to note that the collaborative filtering algorithm is also insensitive to time; therefore, when the rating data are used to build the rating matrix $R$, we use the time impact factor defined in equation (7) to improve the timeliness of $R$. That is, for each rating element in $R$, we calculate its time impact factor and weight the rating value accordingly.

4.3.2. Stage of Item Recommendation

(1) Select the $k$-nearest neighbor set $N_u$ from the cluster containing the user $u$.
(2) Calculate the predicted rating $\hat{r}_{u,i}$ of user $u$ for an unrated item $i$ according to equation (10), where $sim(u, v)$ denotes the similarity between user $u$ and a neighbor $v \in N_u$, which is calculated by equation (4), and $r_{v,i}$ is the rating of user $v$ for the item $i$.

Finally, we obtain the predicted rating vector of user $u$ for all items as $\hat{R}_u = (\hat{r}_{u,1}, \hat{r}_{u,2}, \ldots, \hat{r}_{u,m})$. $\hat{R}_u$ is then sorted, and its top-$n$ items are selected as the recommendation items.
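
The item recommendation stage can be illustrated with the following Python sketch. The prediction rule used here, a similarity-weighted average of the neighbors' ratings, is an assumption: it uses only the two quantities the text names (neighbor similarity and neighbor rating), and it may differ from the paper's equation (10), for example by omitting mean-centering. The function names and the neighbor data are hypothetical.

```python
def predict_rating(neighbors, item):
    """neighbors: list of (similarity, ratings) pairs for the k nearest users."""
    numerator = 0.0
    denominator = 0.0
    for similarity, ratings in neighbors:
        # Only positively similar neighbors who rated the item contribute.
        if item in ratings and similarity > 0:
            numerator += similarity * ratings[item]
            denominator += similarity
    return numerator / denominator if denominator else 0.0


def top_n_items(neighbors, candidate_items, n):
    """Rank unrated candidate items by predicted rating and keep the best n."""
    scored = [(item, predict_rating(neighbors, item)) for item in candidate_items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Two neighbors from the user's cluster (illustrative similarities and ratings).
neighbors = [
    (0.9, {"m1": 5, "m2": 2}),
    (0.3, {"m1": 3, "m3": 4}),
]
ranking = top_n_items(neighbors, ["m1", "m2", "m3"], 2)
```

Item "m1" is rated by both neighbors and receives a prediction of 4.5, which places it ahead of "m3" and "m2" in the ranking.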

4.4. Content Filtering Based on User Preferences

Based on the constructed preference model $P_u$, we use the content-based model to generate a batch of candidate recommendation items, which makes it possible to recommend new items that have not yet been rated by other users. Since each user's preference model is built from his or her own behavior recordings, this method also prevents other users' malicious behaviors (e.g., using multiple accounts to forge the ratings of an item) from interfering with the recommendation results. The basic idea of content filtering is to recommend items according to the user preference model, and it is one of the important components of our scheme. The process is briefly introduced as follows:
(1) Let $k_{i,j}$ be the $j$-th keyword of item $i$ and $c_{i,j}$ be the weight of $k_{i,j}$ in item $i$; then the content of item $i$ can be defined as $C_i = \{(k_{i,1}, c_{i,1}), (k_{i,2}, c_{i,2}), \ldots, (k_{i,m}, c_{i,m})\}$.
(2) As mentioned above, the content filtering method recommends to users content similar to their previous favorite items. Therefore, it is necessary to model users' preferences based on their previous behaviors. We use the approach proposed in Section 4.1 to build the preference model $P_u$.
(3) In general, $C_i$ and $P_u$ should be mapped to the same vector space, so that the recommendation calculation can be transformed into a vector similarity calculation. An important point is to account for the semantic similarity of keywords between vectors. For instance, "algorithm" and "machine learning" have a high semantic correlation, but if only the cosine similarity over exact keywords is used, this semantic similarity may be ignored. In this work, we use the Skip-gram model [31] to obtain word embeddings, with the Google News corpus as the training dataset. The basic structure of Skip-gram, a three-layer neural network, is shown in Figure 2. The first layer is the input layer for keywords; the input keyword is in vector form, and we use one-hot encoding to convert natural language keywords into vectors. The third layer is the output layer, which gives the probability of other words appearing in the context of the given input, and Softmax is usually used to calculate this probability. The hidden layer has no activation function and is composed of linear units. We use stochastic gradient descent [31] to update the model parameters, and the generated word embeddings are used to measure the semantic similarity of the input keywords. If keywords are semantically similar, then their contexts are similar, and their vector representations are also similar.
(4) After the attribute mapping is completed, $C_i$ and $P_u$ are unified into the same VSM. Then, the similarity between the content vector $C_i$ and the user preference vector $P_u$ is calculated as follows:

$$sim(C_i, P_u) = \frac{\sum_{j} c_{i,j} \, p_{u,j}}{\sqrt{\sum_{j} c_{i,j}^2} \, \sqrt{\sum_{j} p_{u,j}^2}} \tag{11}$$

where the cosine similarity is used to measure the degree of compliance with user preferences, and $c_{i,j}$ and $p_{u,j}$ are the weights of the $j$-th attribute in the vectors $C_i$ and $P_u$, respectively.

Finally, we obtain the similarity vector of items as $S_u = (sim(C_1, P_u), sim(C_2, P_u), \ldots, sim(C_m, P_u))$. $S_u$ is then sorted, and its top-$n$ items are selected as the recommendation items.
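
The final ranking step of content filtering can be sketched in Python as follows. The sketch assumes that item contents and the user preference have already been mapped into one shared keyword space and are stored as sparse keyword-weight dictionaries; the word-embedding mapping is omitted, and the function names and data are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse {keyword: weight} vectors."""
    shared = set(u) & set(v)
    dot = sum(u[key] * v[key] for key in shared)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def recommend_by_content(preference, items, n):
    """Return the names of the n items most similar to the preference vector."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(preference, items[name]),
        reverse=True,
    )
    return ranked[:n]


preference = {"python": 0.8, "algorithm": 0.6}
items = {
    "intro-to-sorting": {"algorithm": 1.0},
    "python-basics": {"python": 1.0},
    "garden-tips": {"plants": 1.0},
}
recommended = recommend_by_content(preference, items, 2)
```

Because the comparison here relies on exact keyword matches, an item described only by "machine learning" would score zero against this preference vector, which is the gap the embedding-based mapping in step (3) is meant to close.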

4.5. Hybrid Model

In this section, a hybrid recommendation model based on logistic regression is proposed, and the architecture is shown in Figure 3.

Definition 3 (hybrid model). Let $I = (i_1, i_2, \ldots, i_n)$ be the recommendation item vector, and let the vector of corresponding predicted ratings based on collaborative filtering for the user $u$ be $X = (x_1, x_2, \ldots, x_n)$. Meanwhile, the rating values of the content-based model are $Y = (y_1, y_2, \ldots, y_n)$. Then, let $Z = (z_1, z_2, \ldots, z_n)$ be the result of the hybrid model, where $z_j$ represents the final rating of the recommended item $i_j$, which is calculated using $x_j$ and $y_j$.

Logistic regression is used to aggregate the recommendation results of collaborative filtering and the content-based model. Then $x_j$ and $y_j$ are the parameters of the following equation:

$$z_j = \frac{1}{1 + e^{-g(x_j, y_j)}} \tag{12}$$

where $g(x_j, y_j)$ can be defined as follows:

$$g(x_j, y_j) = \theta_0 + \theta_1 x_j + \theta_2 y_j \tag{13}$$

where $\theta = (\theta_0, \theta_1, \theta_2)$ denotes the constant coefficients for the input parameters $x_j$ and $y_j$, and $\theta$ can be calculated by the gradient descent algorithm.

Overall, the proposed hybrid approach is used to obtain a better recommendation effect, and the specific process consists of the following steps:
Step 1: use the collaborative filtering described in Section 4.3 to obtain the recommendation items and the corresponding predicted ratings $X$.
Step 2: use the content-based model defined in Section 4.4 to obtain the predicted ratings of the recommendation items as $Y$.
Step 3: $x_j$ and $y_j$ are selected as the input parameters of equation (13). Then, the comprehensive rating $z_j$ is generated.
Step 4: $Z$ is sorted, and the corresponding items are selected as the top-$n$ recommendation items.

It is important to note that the obtained top-$n$ list should be filtered by the feedback libraries (i.e., the positive and negative libraries in Section 4.2) to achieve better recommendations.
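
The aggregation step can be illustrated with a small logistic regression trained by batch gradient descent in Python. This is a sketch under assumptions, not the authors' training setup: the two input scores are assumed to be scaled to [0, 1], the binary labels are assumed to indicate whether the user accepted a recommended item, and the learning rate, epoch count, function names, and samples are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, learning_rate=0.5, epochs=5000):
    """samples: list of (cf_score, cb_score, label) with label in {0, 1}."""
    theta = [0.0, 0.0, 0.0]  # bias, collaborative weight, content weight
    for _ in range(epochs):
        gradient = [0.0, 0.0, 0.0]
        for cf, cb, label in samples:
            error = sigmoid(theta[0] + theta[1] * cf + theta[2] * cb) - label
            gradient[0] += error
            gradient[1] += error * cf
            gradient[2] += error * cb
        # Batch gradient descent step on the mean log-loss gradient.
        theta = [t - learning_rate * g / len(samples)
                 for t, g in zip(theta, gradient)]
    return theta


def hybrid_score(theta, cf, cb):
    """Final rating of an item from its two component scores."""
    return sigmoid(theta[0] + theta[1] * cf + theta[2] * cb)


# Assumed training data: component scores and whether the item was accepted.
samples = [(0.9, 0.8, 1), (0.8, 0.6, 1), (0.2, 0.3, 0), (0.1, 0.2, 0)]
theta = train(samples)
```

After training, items that both component models score highly receive a hybrid score above 0.5, and sorting candidates by `hybrid_score` yields the combined ranking before the feedback filtering is applied.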

5. Experiments and Results

5.1. Data Preparation

To compare the proposed model with state-of-the-art methods, we use three public real-world datasets for comparison, which are introduced as follows:
(1) OULAD (Open University Learning Analytics) dataset [32]: it is a recently released open-source dataset. The employed dataset (OULAD) contains 32,593 learners and their assessment results (about 10,655,280 records). In this paper, we mainly focus on VLE (Virtual Learning Environment) data, which show learner preference in choosing learning materials.
(2) MovieLens-Latest: this dataset is collected as part of the GroupLens Research Project of the University of Minnesota. This dataset consists of 27,000,000 ratings and 1,100,000 tag applications applied to 58,000 movies by 280,000 users.
(3) Book-Crossing: this dataset is collected by Cai-Nicolas Ziegler from the Book-Crossing community. The dataset contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit/implicit) about 271,379 books.

Before use, the above datasets require further preprocessing. Specifically, the data are filtered to obtain more reliable and complete records. For instance, we obtain about 140,406 valid records from studentAssessment.csv by extracting, from 173,913 records, those whose score exceeds a minimum threshold. Moreover, to reduce the interference of outlier data with the experiment, we use k-means to cluster the data and eliminate outliers, because a large abnormal value may strongly bias the clustering result. For the Book-Crossing dataset, we remove about 8,576 records from its Bx-Book-Ratings.csv. The statistics of the three datasets are shown in Table 1.
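The text does not state how the clusters are turned into an outlier decision, so the sketch below shows only one possible reading: cluster a single numeric feature with k-means and discard records that fall into very small clusters. The function names, the deterministic initialization, the `min_size` rule, and the sample values are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain k-means on one numeric feature (requires k >= 2).

    Centroids start at evenly spaced positions of the sorted data so the
    result is deterministic.
    """
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Recompute each centroid as the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def drop_small_clusters(values, k, min_size):
    """Keep only values whose cluster has at least min_size members."""
    labels, _ = kmeans_1d(values, k)
    sizes = [labels.count(j) for j in range(k)]
    return [v for v, lab in zip(values, labels) if sizes[lab] >= min_size]

# Hypothetical feature values; 500 is the abnormal record.
raw = [10, 12, 11, 13, 50, 52, 51, 49, 500]
cleaned = drop_small_clusters(raw, k=3, min_size=2)
```

A distance-to-centroid cutoff would be an equally plausible rule; the small-cluster rule is used here only because it needs no extra threshold on the feature scale.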

Furthermore, we determine which basic features to use based on the specific dataset. To be specific, we choose the online learning duration, the click rate of courses, and the category of courses as the basic features in the OULAD dataset. For the MovieLens-Latest dataset, we choose the browsing time, types of movies, and rating scores as the basic features. For the Book-Crossing dataset, the rating score from users, the publication date, and the item tag are chosen as the basic features.

5.2. Evaluation Metrics

In this study, precision, recall, and F1-score are utilized as metrics to evaluate our proposed scheme. The definitions are as follows:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},$$
where precision is the ratio of the correctly judged existent propagation relations to all the judged existent propagation relations. Recall is the ratio of the correctly judged existent propagation relations to all the existent propagation relations in the system. F1-score is the harmonic mean of precision and recall. Hence, F1-score is a better measure when precision and recall conflict with each other. The meanings of $TP$, $FP$, and $FN$ are described in Table 2.
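As a small worked example of these three metrics, the sketch below computes them for one recommendation list. It assumes the standard definitions based on true positives, false positives, and false negatives; the function name and the item identifiers are hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Compute precision, recall, and F1-score for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and actually relevant
    fp = len(recommended - relevant)   # recommended but not relevant
    fn = len(relevant - recommended)   # relevant but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Four recommended items, three relevant items, two of them in common.
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
```

With two hits out of four recommendations and three relevant items, precision is 1/2, recall is 2/3, and the F1-score is 4/7.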

5.3. Evaluation Baselines

In this section, we compare the proposed scheme with several state-of-the-art recommendation algorithms. The baselines used in our experiments are as follows:
(1) PRMR [33]: this method is proposed to improve the efficiency of collaborative filtering (CF) for movie recommendations. It is a simple but highly efficient recommendation algorithm that exploits users' profile attributes to partition them into several clusters. For each cluster, a virtual opinion leader is conceived to represent the whole cluster so that the dimension of the original user-item matrix can be significantly reduced.
(2) AROLS [34]: this work introduces a learning style model to represent features of online learners. It also presents an enhanced recommendation method named Adaptive Recommendation based on Online Learning Style (AROLS), which implements learning resource adaptation by mining learners' behavioral data. AROLS applies collaborative filtering (CF) and association rule mining to extract the preferences and behavioral patterns of each cluster.
(3) HRBRM [35]: the most important achievement of this study is a novel approach in hybrid recommendation systems, which identifies the user similarity neighborhoods from implicit information.
(4) HRSRL [36]: this work proposes a hybrid recommendation system that combines content-based and collaborative filtering for job recommendations. In this system, Statistical Relational Learning (SRL) is used to combine the two recommendation approaches through its ability to directly represent the probabilistic dependencies among the attributes of related objects.

Note that for all comparison methods, we tune the hyperparameters carefully according to the corresponding references to ensure that each method achieves its best performance for a fair comparison. To be specific, we divide the employed dataset into five equal parts, each of which serves in turn as the test dataset while the remaining parts form the training dataset. Thus, the test dataset accounts for 20% and the training dataset for 80% of the original dataset. Five rounds of training and testing are required, and each round uses a different part as the test dataset (its size stays unchanged at 20% of the total dataset), so that after five rounds of testing, the whole employed dataset has been utilized. This process is briefly described in Figure 4.
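The five-round procedure described above corresponds to five-fold cross-validation, which can be sketched as follows. The shuffling step, the seed, and the function name are assumptions; the text only specifies the 80%/20% split and the rotation of the test part.

```python
import random

def five_fold_splits(records, folds=5, seed=0):
    """Yield (train, test) pairs so that every record is tested exactly once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumed: shuffle before splitting
    parts = [shuffled[i::folds] for i in range(folds)]
    for i in range(folds):
        test = parts[i]
        train = [rec for j, part in enumerate(parts) if j != i for rec in part]
        yield train, test

# Hypothetical dataset of 100 record identifiers.
records = list(range(100))
splits = list(five_fold_splits(records))
```

Each of the five rounds trains on 80 records and tests on the remaining 20, and the five test parts together cover the whole dataset.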

5.4. Impact of Parameters

To make our scheme achieve better performance, we analyze how the parameters affect its performance on all the employed datasets (i.e., OULAD, MovieLens-Latest, and Book-Crossing). We study two hyperparameters, which are, respectively, the thresholds for building the positive and negative libraries. We sample the values of both thresholds from 0.1 to 0.9, and the results on F1-score are shown in Figure 5. Specifically, we first evaluate the impact of the first threshold on the recommendations, with the other threshold held fixed and Top-N = 25 (i.e., the number of recommendation items). F1-score increases as the threshold increases, which conforms to the fact that our scheme utilizes more information from user preferences. However, once the threshold exceeds a certain value, the corresponding F1-score decreases; this may be because the accuracy of the feedback library decreases. Following the same principle, we hold the first threshold fixed and set Top-N = 25 to evaluate the impact of the second threshold, and the results are shown in Figure 5(b). Furthermore, to select appropriate values for both thresholds, we conduct an offline training process, which maximizes the F1-score of the recommendation results by repeatedly varying the two thresholds.

As illustrated in Figure 6, we keep adjusting the two thresholds to achieve better recommendations (F1-score), and we finally obtain the optimized parameters in Table 3 through several rounds of offline training.
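The text does not specify the search strategy used in the offline training, so the sketch below substitutes a plain exhaustive grid search over the sampled range 0.1 to 0.9 for both thresholds. The function names and the toy objective are hypothetical; in practice `evaluate_f1` would run the recommender with the given thresholds and return the measured F1-score.

```python
def grid_search(evaluate_f1, grid=None):
    """Return (best_f1, positive_threshold, negative_threshold) over a grid."""
    if grid is None:
        grid = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    best = None
    for pos in grid:
        for neg in grid:
            score = evaluate_f1(pos, neg)
            if best is None or score > best[0]:
                best = (score, pos, neg)
    return best

def toy_f1(pos, neg):
    """Stand-in objective with a single peak; not a measured result."""
    return 1.0 - (pos - 0.6) ** 2 - (neg - 0.3) ** 2

best_score, best_pos, best_neg = grid_search(toy_f1)
```

With 9 values per threshold the search evaluates 81 combinations, which is cheap enough to repeat per dataset.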

5.5. Experimental Results

In our experiments, the selection of internal parameters (i.e., the two library thresholds) is based on the results in Table 3. For performance comparison, we report the recommendation precision and recall of different methods over the three datasets in Table 4. It is important to note that all the results in Table 4 are the mean over five runs of each experiment.

From the results in Table 4, we make the following observations. First, our proposed scheme performs better than the other baselines on the three employed datasets in terms of precision and recall. To be specific, our scheme outperforms PRMR by about 11.6% and 9.2% on precision and recall, respectively. PRMR is designed to improve the efficiency of collaborative filtering in movie recommendation scenes; its main focus is to reduce the time complexity of CF, whereas the real-time performance of recommendations and user preferences are not fully considered. The AROLS model performs well on the OULAD dataset because it focuses on the recommendation of learning resources; however, it only applies collaborative filtering (CF) and association rule mining to extract preferences without using any user feedback, so its precision and recall remain lower than those of our scheme by about 15.8% and 8.5%, respectively. Second, the results show that hybrid methods perform better than traditional recommendation algorithms. For instance, HRBRM outperforms PRMR and AROLS by about 4.5% and 4.9% in terms of recall, respectively, and HRSRL outperforms PRMR and AROLS by about 4.8% and 10.1% in terms of precision, respectively. This is because these hybrid schemes take advantage of both collaborative filtering and content-based approaches. Compared with these hybrid models, our scheme achieves better performance on the three datasets. Specifically, across all the datasets, our scheme outperforms HRBRM and HRSRL by about 7.2% and 6.6% on F1-score, respectively. Note that the F1-score used here is the average value over the different Top-N conditions in Table 4. In particular, as the number of recommendations increases, the performance stability of our scheme is superior to that of the other two baselines.
In fact, as the number of experiments increases, the advantages of our scheme in time and user preference perception become more obvious. The reason may be that our proposed scheme, with its time-aware and user feedback mechanisms, is more sensitive to user preferences.

F1-score measures the overall performance of recommendation schemes, and it is shown in Figure 7. For our proposed scheme, F1-score changes only slightly with the number of recommendation items, while the baselines also fluctuate around certain values. To be specific, we measure the stability of our scheme by calculating the standard deviation of F1-score. For the proposed scheme, the standard deviation of F1-score is about 0.0401, which is low. The performance of our scheme is therefore relatively stable compared with the other hybrid schemes (i.e., the standard deviation of HRBRM is 0.0428 and that of HRSRL is 0.0459). As shown in Figure 7, our scheme achieves the best F1-score on all three datasets. Specifically, the average F1-score of our scheme is about 10.2%, 12.5%, 7.2%, and 6.6% greater than that of PRMR, AROLS, HRBRM, and HRSRL, respectively. Overall, compared with PRMR and AROLS, our scheme makes use of the advantages of both collaborative filtering and content-based models. Compared with the hybrid baselines (i.e., HRBRM and HRSRL), our scheme takes into account the time characteristics of user preferences and the user feedback on recommendation items. Furthermore, to verify that the improvements in our scheme are statistically significant, we conduct several t tests. To be specific, we adopt paired-sample t tests and use the calculated F1-scores of the baselines as the analysis data. The process of the t test can be briefly described as follows. First, we state the hypotheses and the test level.

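H0: the mean difference between the F1-scores of our scheme and those of the baseline is zero (null hypothesis); H1: the mean difference is not zero (alternative hypothesis); two-sided test at a fixed test level.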

Then, we calculate the test statistic for each baseline and obtain four values, one each for PRMR, AROLS, HRBRM, and HRSRL. Note that the calculation process is complicated and not the focus of this work, so we give the results directly. Finally, we look up the critical value in the t test threshold table. Because all of the calculated values are larger than this critical value, the null hypothesis is rejected, and all the improvements of our scheme are statistically significant.
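For readers who want to reproduce this kind of check, the paired-sample t statistic can be computed as in the sketch below. The F1-score values are hypothetical placeholders rather than the results of this paper, and the test level of 0.05 is assumed purely for illustration; the critical value 2.776 is the standard two-sided t table entry for 4 degrees of freedom at that level.

```python
import math
import statistics

def paired_t_statistic(ours, baseline):
    """Return (t, degrees_of_freedom) for a paired-sample t test."""
    diffs = [a - b for a, b in zip(ours, baseline)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical F1-scores of two schemes under five Top-N settings.
ours = [0.62, 0.65, 0.61, 0.66, 0.64]
baseline = [0.55, 0.57, 0.56, 0.58, 0.56]
t_value, dof = paired_t_statistic(ours, baseline)

# Two-sided critical value for dof = 4 at an assumed test level of 0.05.
CRITICAL_T_DOF4 = 2.776
reject_null = abs(t_value) > CRITICAL_T_DOF4
```

For these placeholder values the statistic is far above the critical value, so the null hypothesis of zero mean difference would be rejected.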

6. Conclusion

In this study, we propose a hybrid recommendation scheme combining content-based and collaborative filtering models. To improve the accuracy of the hybrid model, we introduce the time impact factor of user preferences and analyze the effect of the time factor on recommendations. A user feedback mechanism is also proposed, and it is used to filter the final recommendations. Simulation results demonstrate that the proposed scheme is effective compared with the baselines. Further studies are still needed, for example, on how to introduce the FM (factorization machine) method into our proposed scheme to improve its recall and precision. We should also study how to filter out invalid feedback according to different recommendation scenarios to improve the efficiency of the feedback mechanism.

Appendix

A. Stability of performance

We adopt multiple Top-N tests to study whether the performance of our scheme is relatively stable. Hence, we analyze the F1-score of our scheme (here, we use F1-score as the metric to evaluate the performance of hybrid schemes), considering its trend and fluctuation with the recommendation size. We then plot the F1-score comparison histogram of all the involved hybrid schemes. As shown in Figure 8, we compute the average F1-score of our scheme and its standard deviation, the latter being about 0.0401 (see Section 5.5). We see that the performance of our scheme is relatively stable with respect to the recommendation size. Note that the standard deviations of the other hybrid schemes (HRBRM and HRSRL) are 0.0428 and 0.0459, respectively.

B. Effect analysis of feedback mechanism

To verify the effect of the feedback mechanism in our proposed scheme, we have designed and implemented a set of experiments, which includes the proposed complete scheme and the proposed scheme without the feedback mechanism. In this test, we use F1-score as the metric of performance, and the experimental results are shown in Figure 9.

From the results in Figure 9, we can see that the performance of the scheme without the feedback mechanism is significantly lower than that of the complete scheme. In terms of F1-score, the complete scheme obtains higher scores than the scheme without the feedback mechanism on all datasets. In addition, comparing the average F1-scores of the two variants, the complete scheme is about 17.9% higher than the scheme without the feedback mechanism. Therefore, whether judged by the test results on each dataset or by the overall average performance, the feedback mechanism significantly improves the recommendation results.

C. Effect analysis of time factor

To evaluate the effect of the time factor in our scheme, we have designed and implemented the control experiment of the complete scheme and the scheme without the time factor mechanism. In this test, we use F1-score as the metric of performance, and the results are shown in Figure 10.

From the results in Figure 10, we can see that the complete scheme achieves a better F1-score than the scheme without the time factor mechanism. To be specific, the complete scheme obtains higher values than the scheme without the time factor on each dataset. Also, comparing the average F1-scores of the two variants, the complete scheme is about 19.2% higher than the scheme without the time factor mechanism. Therefore, the time impact factor clearly improves the performance of our proposed scheme.

D. Resource consumption comparison

Furthermore, to evaluate the resource consumption of our proposed scheme, we have designed and implemented a comparison experiment. This experiment is run on a desktop computer (Lenovo ThinkCentre M720) with an Intel Core i7-7500 2.7 GHz processor and 32 GB of memory. Specifically, we evaluate the resource consumption of the algorithms in terms of memory consumption and CPU usage, using VisualVM 1.4.4 as the measurement tool to monitor both.

As shown in Table 5, we mainly compare the resource consumption of the involved hybrid models (i.e., HRBRM [35] and HRSRL [36]) with that of our scheme. Our proposed scheme performs better than the other two models: its memory usage is less than 2 MB, while the minimum memory usage of the HRBRM [35] and HRSRL [36] schemes is 5.53 MB and 2.79 MB, respectively. In terms of CPU usage, our proposed scheme is also significantly lower than HRBRM [35] and HRSRL [36]. To be specific, the highest CPU usage of our scheme is 4.4% (Top-N = 60), which is lower than the lowest value of HRBRM [35] (i.e., 5.7%) and HRSRL [36] (i.e., 5.3%), because our scheme adopts the efficient and simple VSM model, which involves fewer inner product computations. Also, our scheme adopts a hybrid approach based on logistic regression, which does not involve complex computations compared with the other schemes. Hence, our scheme has advantages over the other schemes in memory and CPU consumption.

Data Availability

OULAD (Open University Learning Analytics) dataset [32] is a recently released open-source dataset. The employed dataset (OULAD) contains 32,593 learners and their assessment results (about 10,655,280 records). In this paper, we mainly focus on VLE (Virtual Learning Environment) data, which show learner preference in choosing learning materials. MovieLens-Latest dataset is collected as part of the GroupLens Research Project of the University of Minnesota (available online at https://grouplens.org/datasets/movielens/). This dataset consists of 27,000,000 ratings and 1,100,000 tag applications applied to 58,000 movies by 280,000 users. Book-Crossing dataset is collected by Cai-Nicolas Ziegler from the Book-Crossing community (available online at https://grouplens.org/datasets/book-crossing/). The dataset contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit/implicit) about 271,379 books.

Additional Points

Threats to validity: because our scheme is based on content-based and collaborative filtering models, its effect is limited in scenes that are not suitable for content recommendation, such as video and picture recommendations.

Conflicts of Interest

The authors declare that they have no conflicts of interest.