Abstract

This paper proposes a personalized music recommendation method based on multidimensional time-series analysis, which improves recommendation quality by making appropriate use of users' midterm behavior. The method uses a topic model to represent each song as a probability distribution over several latent topics, models the user's behavior as a multidimensional time series, and analyses this series to better predict the user's listening preferences and make reasonable recommendations. Building on this, a music recommendation method is proposed that integrates users' long-term, medium-term, and real-time behaviors and uses long short-term memory (LSTM) networks to dynamically adjust the influence weight of each of the three behaviors, further improving recommendation quality. A prototype system was implemented to preliminarily verify the feasibility of the proposed method.

1. Introduction

In recent years, with the rapid development of the mobile Internet and smartphones, information has been growing exponentially, leading to a serious problem of information overload [1]. Faced with massive amounts of information and many choices, people often suffer from choice overload: they cannot make a reasonable choice, or they must spend considerable effort to make the right one. To address information overload and reduce the burden of decision-making, information classification, search engines, and recommender system technology have emerged [2]. Recommender systems in particular aim to spare users manual searching and to uncover their latent preferences and needs. A recommender system is essentially an information filtering system: it mines a user's behavioral preferences from his or her behavior history, filters out useless items from the mass of available information, and recommends items that match the user's preferences. Recommender systems are now applied in many fields, for example Amazon in e-commerce, YouTube in video, LastFm Radio in personalized music, and Flipboard in personalized reading [3].

Yang et al. [4] showed that people spend far more time listening to music in daily life than reading or watching movies, which indicates that music has become an indispensable part of people's lives. A music recommendation system is the application of recommender systems to the music domain [5]. Fereidoony et al. [6] recommended songs that meet users' needs and preferences by analysing users' listening habits and the characteristics of songs. To meet people's demand for personalized music, several music recommendation systems have been developed, such as LastFm and Pandora abroad and a radio-station service and the Xiami music map in China [7, 8]. These systems first build their own music library, then analyse song characteristics and users' listening habits, and finally make recommendations [9]. Among them, Pandora is one of the most popular: it assigns attributes to each song through the "Music Genome Project" and recommends songs according to their similarity [10]. LastFm mainly makes recommendations based on the assumption that "similar people tend to have similar behaviors." Xiami Music gives not only the recommendation results but also the reasons for them, which improves users' acceptance of the recommendations to a certain extent [11].

Researchers have proposed a number of effective music recommendation algorithms. According to how much of the user's behavior history they draw on, we divide these algorithms into three categories [12]: music recommendation based on users' long-term behavior, music recommendation based on users' real-time behavior, and music recommendation based on users' midterm behavior [13]. Among them, recommendation based on midterm behavior has attracted increasing attention because it considers the influence of context on user behavior [14]. Moreover, this paper argues that a user's future behavior is related to his or her long-term, medium-term, and immediate behavior, yet no existing work considers the combined influence of all three. For example, a user may have liked rock music for a long time while most of the songs he or she has played recently are lyrical ballads. We therefore need to consider users' long-term, medium-term, and short-term behavior jointly, taking user behavior across multiple time scales into account.

To address these problems and improve recommendation quality by making appropriate use of users' midterm behavior, this paper proposes a music recommendation method based on multidimensional time-series analysis. A topic model is used to represent each song as a probability distribution over several latent topics, and on this basis the user's behavior in the current session is represented as a multidimensional time series. By analysing this series, the method predicts the characteristics of the next song the user is likely to play and selects similar songs from the music library to recommend. Because a user's future behavior is affected not only by medium-term behavior but also by immediate and long-term behavior, and no existing work considers these three factors together, we further propose a combined recommendation method based on users' long-term, medium-term, and real-time behaviors that examines the temporal dependence of user behavior comprehensively.

Section 2 reviews related work on music recommendation systems. Section 3 introduces the structure of the proposed music recommendation system and the multidimensional time-series model that considers real-time, midterm, and long-term behavior. Section 4 presents the simulation experiments and their analysis, and Section 5 concludes the paper.

2. Related Work

Research on music recommendation systems has made great progress, and many music recommendation algorithms exist. To recommend suitable songs, most methods need to analyse users' listening behavior. According to how much of the user's behavior history an algorithm draws on, we divide these methods into music recommendation based on real-time behavior, music recommendation based on long-term behavior, and music recommendation based on medium-term behavior.

2.1. Music Recommendation Based on Real-Time Behavior

Music recommendation based on real-time behavior is the simplest approach. Algorithms of this kind assume that the user's state remains stable over a short period, so the next song the user plays will have characteristics similar to those of the current song [15]. Literature [12] makes full use of songs' editorial attributes, recommending songs with the same or similar artists, titles, and lyrics. Literature [16] uses acoustic characteristics, recommending songs with similar rhythm and timbre. Literature [17] describes songs by mood features and recommends songs whose mood attributes are similar to those of the current song. All of these works describe songs from a single aspect, so their recommendations are often one-sided; for example, reference [18] is unsuitable for users who are insensitive to acoustic features but sensitive to emotional ones. Literature [19] analyses the text documents associated with songs, describes songs by semantic features, and achieves good results.

2.2. Music Recommendation Based on Long-Term Behavior

Unlike recommendation based on real-time behavior, recommendation based on long-term behavior analyses all the songs a user has listened to [20]. Literature [21] uses collaborative filtering, the most popular recommendation algorithm in recent years, which mines the user's social environment and relies on collective intelligence to make recommendations. Reference [22] describes songs by acoustic features and uses the average acoustic features of all songs a user has played as the user's features. Because these algorithms analyse users comprehensively, they often achieve good recommendation performance. However, they pay too little attention to the user's current context to meet the user's immediate needs [23].
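To make the averaging approach of [22] concrete, the following Python sketch builds a long-term user profile as the element-wise mean of the feature vectors of every song the user has played and ranks candidate songs by cosine similarity to that profile. The feature vectors, song ids, and the function names `average_profile`, `cosine`, and `recommend_long_term` are hypothetical, and cosine similarity is our assumption rather than the measure used in [22].

```python
import math


def average_profile(listened_features):
    """Long-term profile: element-wise mean of the feature vectors of all played songs."""
    n = len(listened_features)
    dim = len(listened_features[0])
    return [sum(vec[d] for vec in listened_features) / n for d in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend_long_term(listened_features, catalogue, top_n=2):
    """Rank catalogue songs (id -> feature vector) by similarity to the long-term profile."""
    profile = average_profile(listened_features)
    ranked = sorted(catalogue, key=lambda sid: cosine(profile, catalogue[sid]), reverse=True)
    return ranked[:top_n]


# Hypothetical 3-dimensional acoustic feature vectors.
history = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]]
catalogue = {"a": [0.8, 0.2, 0.0], "b": [0.0, 0.1, 0.9], "c": [0.5, 0.5, 0.0]}
print(recommend_long_term(history, catalogue))  # ['a', 'c']
```

Because the profile averages over the whole history, a burst of recent listening in a different style barely moves it, which is exactly the limitation regarding immediate needs noted above.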

2.3. Music Recommendation Based on Midterm Behavior

Recommendation based on midterm behavior assumes that the sequence of songs a user plays in the current session reflects the user's context to a certain extent [24]. Such methods predict user behavior and recommend songs by analysing this song sequence. Literature [25] uses a pattern mining algorithm, represented by PrefixSpan, to mine topic sequences, while other work uses a Markov model to analyse them. Literature [26] presents two typical representative methods of midterm-behavior-based music recommendation; both first use text analysis to represent songs as probability distributions over several latent topics and then represent each song by a few of its most significant topics [27]. Literature [28] matches the few most recently played songs against users' listening histories to find similar listening behaviors and then makes recommendations. This also achieves good results, but methods of this kind do not adequately reflect users' long-term and recent preferences [29].
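A first-order Markov model over topic sequences, in the spirit of the Markov-based approach mentioned above, can be sketched in a few lines. This is an illustrative assumption, not the cited implementation: each song is reduced to a single dominant topic label (hypothetical strings here), transition counts between consecutive songs in each session are collected, and the next-topic distribution is the maximum-likelihood estimate from those counts.

```python
from collections import Counter, defaultdict


def fit_markov(topic_sessions):
    """Count first-order transitions between consecutive dominant topics in each session."""
    transitions = defaultdict(Counter)
    for session in topic_sessions:
        for prev, nxt in zip(session, session[1:]):
            transitions[prev][nxt] += 1
    return transitions


def next_topic_distribution(transitions, current_topic):
    """Maximum-likelihood P(next topic | current topic); empty dict for an unseen topic."""
    counts = transitions.get(current_topic, Counter())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


# Hypothetical sessions, each song already reduced to its dominant topic.
sessions = [["rock", "rock", "pop"], ["rock", "pop", "pop"], ["pop", "rock"]]
model = fit_markov(sessions)
print(next_topic_distribution(model, "rock"))  # rock -> 1/3, pop -> 2/3
```

Collapsing each song to one topic discards the rest of its topic distribution, which is one reason the method in this paper keeps the full topic vector as a multidimensional series instead.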

3. Music Recommendation Algorithm Based on Multidimensional Time-Series Model

3.1. Overall Architecture of Music Recommendation System

The music recommendation prototype system consists of three parts: the client, the server, and the database. The client is mainly a mobile terminal that interacts with users, collects user, music, and context information, and presents the platform's services. As the component closest to the user and the entry point to all services, its interface design directly affects user experience. The server receives input from the client, retrieves the corresponding information from the database, and then performs offline model training or real-time recommendation. The server is the most critical part and must remain stable, because in a large deployment even a few seconds of downtime could affect tens of millions of users. In real-time prediction, the recommender system must use the model to predict user preferences on the fly over a potentially large candidate set, which is time-consuming; since making users wait seriously degrades their experience, fast response to service requests is essential for a real-time recommender system. The database stores user data, song data, interaction behavior, and so on and provides data support for large-scale computation. The recommendation engine is divided into an offline model system and a real-time prediction system. The offline model system performs offline computation and model training on the collected data and updates the model periodically. When the real-time prediction system receives a request from the client, it uses the current model to predict the user's preference for each song in the candidate set and pushes the user's most preferred songs to the terminal in real time, closing the recommendation loop. The structure is shown in Figure 1.
It should be noted that "multidimensional" here mainly refers to multiple time dimensions, corresponding to short-term, medium-term, and long-term behavior.

As shown in Figure 1, the client is the platform's entry point for user interaction, and the user's browsing, listening, liking, collecting, and other behaviors are generated there. When the user enters the personalized recommendation module, the client requests the recommendation service from the server. The server's real-time prediction system pairs the user with each song in the candidate music set, retrieves the corresponding user and music features from the database, and predicts preferences with the high-order multi-information dimensionality reduction model to generate a candidate recommendation list; after fine ranking, the top songs form the final recommendation list pushed to the client. The server's offline model system periodically extracts data from the database to train and update models and, according to performance evaluation, deploys the best model to the real-time prediction system.
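The request flow above (score every candidate, then select the top songs) can be sketched as follows. The paper's high-order multi-information dimensionality reduction model is not reproduced; a dot product between a hypothetical predicted user topic vector and each song's topic vector stands in for the preference score, and dropping songs the user has already played stands in for the fine-ranking step. All names and values are illustrative.

```python
import heapq


def score(user_vec, song_vec):
    """Stand-in preference score (dot product); not the paper's prediction model."""
    return sum(u * s for u, s in zip(user_vec, song_vec))


def recommend(user_vec, candidates, already_played, top_n=2):
    """Score every candidate, skip songs already played, and return the top-N song ids."""
    scored = (
        (score(user_vec, vec), sid)
        for sid, vec in candidates.items()
        if sid not in already_played
    )
    return [sid for _, sid in heapq.nlargest(top_n, scored)]


user_vec = [0.6, 0.3, 0.1]  # hypothetical predicted topic preferences
candidates = {
    "s1": [0.7, 0.2, 0.1],
    "s2": [0.1, 0.1, 0.8],
    "s3": [0.4, 0.5, 0.1],
    "s4": [0.9, 0.05, 0.05],
}
print(recommend(user_vec, candidates, already_played={"s4"}))  # ['s1', 's3']
```

Using `heapq.nlargest` keeps the selection at O(M log N) for M candidates, which matters for the fast-response requirement when the candidate set is large.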

3.2. Music Recommendation Structure Based on Multidimensional Time-Series Model

To describe songs comprehensively, this paper maps each song to a text document and describes it with a topic model. Latent Dirichlet allocation (LDA) is the most representative and popular probabilistic topic model and has been widely used in text mining, information processing, multidocument summarization, and other fields. LDA maps a document to a probability distribution over several latent topics, which can be represented as a topic weight vector in which each dimension represents the contribution of the corresponding topic to the document's content. Topic modelling not only abstracts the set of latent topics contained in the documents but also quantifies the distance and similarity between documents. This section describes in detail how the topic model is used to model songs.
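As a small illustration of how topic weight vectors make song similarity quantitative, the sketch below compares hypothetical vectors over k = 3 topics with cosine similarity and the Hellinger distance, two common choices for comparing such vectors. The paper does not state which measure it uses, and fitting LDA itself is not shown.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic weight vectors (1 means same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def hellinger_distance(p, q):
    """Distance between two discrete distributions, in [0, 1]; 0 means identical."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))) / math.sqrt(2)


# Hypothetical topic weight vectors over k = 3 latent topics.
song_a = [0.7, 0.2, 0.1]
song_b = [0.6, 0.3, 0.1]
song_c = [0.0, 0.1, 0.9]

print(cosine_similarity(song_a, song_b), cosine_similarity(song_a, song_c))
print(hellinger_distance(song_a, song_b), hellinger_distance(song_a, song_c))
```

The Hellinger distance treats the vectors as probability distributions, which matches how LDA topic proportions are defined; cosine similarity ignores that constraint but is cheap and widely used.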

3.2.1. Song Topic Modelling

First, we crawl the tags that users have assigned to songs from websites such as Google and LastFm. These tags describe song content comprehensively, covering not only basic information such as the song title, composer, album, and release date but also extended information such as the song's theme, genre, mood, and suitable occasions.

To reduce noise, we construct each song's text document only from tags used by many people (Figure 2); specifically, we consider only tags applied more than 10 times. In this way, each song $s$ corresponds to a document $d_s$, and the song set $S$ corresponds to a document set $D$. Finally, we fit a topic model to the document set $D$ and obtain a set $T = \{t_1, t_2, \ldots, t_k\}$ of $k$ latent topics, which is used to describe song characteristics comprehensively.
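The tag-filtering step can be sketched as follows, assuming the crawled data are available as a mapping from song id to per-tag counts (a hypothetical input format). Each surviving song becomes a bag-of-tags document that could then be passed to an LDA implementation; the threshold of 10 follows the text, while the function name and sample tags are illustrative.

```python
def build_song_documents(tag_counts, min_count=10):
    """Map each song to a bag-of-tags document, keeping only tags applied more than min_count times.

    tag_counts: {song_id: {tag: number_of_times_users_applied_it}}
    Songs left with no frequent tags are dropped.
    """
    documents = {}
    for song, counts in tag_counts.items():
        kept = {tag: c for tag, c in counts.items() if c > min_count}
        if kept:
            documents[song] = kept
    return documents


raw = {
    "song1": {"rock": 120, "90s": 35, "seen live": 4},
    "song2": {"ballad": 58, "sad": 12, "favourite": 9},
    "song3": {"misc": 3},
}
docs = build_song_documents(raw)
print(docs)  # {'song1': {'rock': 120, '90s': 35}, 'song2': {'ballad': 58, 'sad': 12}}
```

Keeping the counts (rather than a bare set of tags) lets the topic model weight widely agreed tags more heavily than marginal ones.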

At the same time, for any song $s$ in the song set $S$, we obtain its latent topic weight vector

$$\boldsymbol{\theta}_s = (\theta_{s,1}, \theta_{s,2}, \ldots, \theta_{s,k}), \qquad 0 \le \theta_{s,i} \le 1, \qquad \sum_{i=1}^{k} \theta_{s,i} = 1,$$

where $\theta_{s,i} = 0$ means that song $s$ does not belong to the category represented by latent topic $t_i$, that is, the topic contributes nothing to the song's content; $\theta_{s,i} = 1$ means that the song belongs entirely to that category; and $0 < \theta_{s,i} < 1$ means that the song belongs to that category to a certain extent.

3.2.2. Time-Series Construction

For a user $u$ who has listened to $N$ songs in total and to $n$ songs $s_1, s_2, \ldots, s_n$ in the current session, predicting the next song the user may play requires the topic weight vector of each of these songs:

$$\boldsymbol{\theta}_{s_j} = (\theta_{s_j,1}, \theta_{s_j,2}, \ldots, \theta_{s_j,k}), \qquad j = 1, 2, \ldots, n,$$

where $\theta_{s_j,i}$ is the probability that song $s_j$ belongs to the $i$-th latent topic. As noted above, listening behavior is chronological. If we treat the topic vector of the song played at a given time point as a variable and arrange its values at successive time points in chronological order, we obtain the multidimensional time series

$$X_u = (\boldsymbol{\theta}_{s_1}, \boldsymbol{\theta}_{s_2}, \ldots, \boldsymbol{\theta}_{s_n}).$$

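Expanding each topic vector along its $k$ latent-topic dimensions gives

$$X_u = \begin{pmatrix} \theta_{s_1,1} & \theta_{s_2,1} & \cdots & \theta_{s_n,1} \\ \theta_{s_1,2} & \theta_{s_2,2} & \cdots & \theta_{s_n,2} \\ \vdots & \vdots & \ddots & \vdots \\ \theta_{s_1,k} & \theta_{s_2,k} & \cdots & \theta_{s_n,k} \end{pmatrix},$$

where $i = 1, \ldots, k$ indexes the latent topics (rows) and $j = 1, \ldots, n$ indexes the songs played in the current session (columns). Each time point of the multidimensional series therefore corresponds to a $k$-dimensional topic vector, and each dimension of that vector forms a univariate time series:

$$x_i = (\theta_{s_1,i}, \theta_{s_2,i}, \ldots, \theta_{s_n,i}), \qquad i = 1, 2, \ldots, k.$$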
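The construction of the series can be sketched in Python, assuming each song's topic vector has already been inferred and is stored in a dictionary keyed by a hypothetical song id. Row i of the returned matrix is the univariate series of the i-th latent topic, and column j is the topic vector of the j-th song in the session.

```python
def session_time_series(session_songs, topic_vectors):
    """Return the k x n matrix whose column j is the topic vector of the j-th song played."""
    columns = [topic_vectors[s] for s in session_songs]
    k = len(columns[0])
    # Transpose: row i collects topic i's weight at every time step.
    return [[col[i] for col in columns] for i in range(k)]


# Hypothetical topic vectors (k = 3) for a three-song session.
topic_vectors = {"s1": [0.7, 0.2, 0.1], "s2": [0.5, 0.4, 0.1], "s3": [0.3, 0.6, 0.1]}
X = session_time_series(["s1", "s2", "s3"], topic_vectors)
print(X[0])  # [0.7, 0.5, 0.3]
```

Here the first topic's weight falls over the session while the second rises, which is the kind of drift the per-topic analysis is meant to capture.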

3.2.3. Multidimensional Time-Series Prediction Modelling

By analysing the univariate series $x_i = (\theta_{s_1,i}, \ldots, \theta_{s_n,i})$, we can capture how the songs a user plays in the current session evolve along the $i$-th latent topic and then predict the membership of the next song the user may play on that topic, namely $\hat{\theta}_{s_{n+1},i}$. Repeating this analysis for the remaining latent topics yields the full predicted topic vector $\hat{\boldsymbol{\theta}}_{s_{n+1}} = (\hat{\theta}_{s_{n+1},1}, \ldots, \hat{\theta}_{s_{n+1},k})$ for the next song.
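The per-topic prediction idea can be illustrated with a deliberately simple forecaster. The sketch below substitutes simple exponential smoothing (smoothing factor `alpha = 0.5`, an arbitrary choice) for the paper's LSTM, forecasts each topic's series independently, and renormalizes the result so that the predicted weights again form a distribution. It is a stand-in for illustration only, not the authors' model.

```python
def smooth_forecast(series, alpha=0.5):
    """Simple exponential smoothing: one-step-ahead forecast of a univariate series."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def predict_next_topic_vector(X, alpha=0.5):
    """Forecast each topic row of X independently, then renormalize to sum to 1."""
    raw = [max(smooth_forecast(row, alpha), 0.0) for row in X]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)  # fall back to a uniform distribution
    return [r / total for r in raw]


# Rows are topics, columns are the songs of a hypothetical session (oldest first).
X = [[0.7, 0.5, 0.3], [0.2, 0.4, 0.6], [0.1, 0.1, 0.1]]
print(predict_next_topic_vector(X))
```

Exponential smoothing lags behind a steady trend, which is one motivation for a learned sequence model such as the LSTM used in this paper.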

In this paper, the time-series data of real-time, medium-term, and long-term behavior are used as input to the music recommendation algorithm, and an LSTM network is trained to output optimal music topic probabilities. In parallel, the user's search information, or the topic information of searched songs, is also processed by an LSTM network, and the resulting outputs are converted into probabilities by a softmax function. An AVG or MAX function is then used to combine these probabilities. Finally, the probabilities of all candidate songs are obtained and combined into a single probability distribution. The implementation process is shown in Figure 3.
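The softmax and AVG/MAX fusion steps can be sketched as follows. The logits for the behavior branch and the search branch are hypothetical stand-ins for the two LSTM outputs, and renormalizing after the element-wise maximum is our assumption, since the paper does not say how the MAX result is turned back into a distribution.

```python
import math


def softmax(scores):
    """Convert raw scores (logits) into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(prob_lists, mode="avg"):
    """Combine several probability vectors over the same candidate songs element-wise."""
    if mode == "avg":
        return [sum(ps) / len(ps) for ps in zip(*prob_lists)]
    if mode == "max":
        fused = [max(ps) for ps in zip(*prob_lists)]
        total = sum(fused)
        return [f / total for f in fused]  # renormalize so the result is a distribution
    raise ValueError("mode must be 'avg' or 'max'")


history_scores = [2.0, 1.0, 0.1]  # hypothetical logits from the behavior branch
search_scores = [0.5, 2.5, 0.0]   # hypothetical logits from the search branch
p_hist = softmax(history_scores)
p_search = softmax(search_scores)
print(fuse([p_hist, p_search], "avg"))
print(fuse([p_hist, p_search], "max"))
```

AVG lets both branches contribute smoothly, whereas MAX lets a single strongly confident branch dominate a candidate's score; in this toy example both choose the second candidate.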

3.3. The Structure of LSTM

LSTM is the core network unit of the whole recommendation model, and its structure is explained below. Recurrent neural networks (RNNs) are widely used in machine translation, language modelling, and weather prediction because they can exploit sequence data to predict how a sequence will develop. However, plain RNNs suffer from the vanishing gradient problem, which the long short-term memory (LSTM) network addresses with gating units and tanh layers. The LSTM is a special recurrent neural network characterized by its forget gate, which decides which information to discard and which useful information to keep. The structure of the LSTM network is shown in Figure 4.
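For reference, the standard LSTM cell with a forget gate is written below in common textbook notation; the paper does not give its own equations, so the symbols are ours. Here $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, and the $W$ and $b$ terms are learned weights and biases.

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{c}_t &= \tanh\!\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)}\\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Because the cell state $c_t$ is updated additively and $f_t$ can stay close to 1, gradients can pass through many time steps with much less attenuation than in a plain RNN, which is why the gated structure alleviates vanishing gradients.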

As shown in Figure 4, the network comprises an input layer, hidden layers, and an output layer. The input layer receives the processed sequence data, which are normalized beforehand. The hidden layers consist of LSTM units, a typical recurrent architecture. The output layer gives the preference probability for every song topic; these probabilities are computed and sorted, then combined with the probabilities derived from searched songs or search information to produce an overall ranking, forming a joint probability distribution.

4. Simulation Results and Performance Analysis

4.1. Data Sources and Simulation Setting

This section verifies the effectiveness of the music recommendation method based on multidimensional time-series analysis through experiments, describing the experimental design, the results, and their analysis.

From the design of the method, the required data consist mainly of a song dataset with tag text information and a dataset of the songs users play within a session cycle. To this end, we crawled a new dataset containing basic information on songs and users. It includes song information, such as text tags, and user information, including the songs each user played within a session cycle. To reduce noise, we keep only lists containing more than a minimum number of songs, tags with a frequency greater than 4, and songs with more than 4 usable tags. The resulting dataset contains 34,930 listening events, 24,992 songs, 1,530 users, and 5,479 singers; list lengths range from 10 to 30 songs, with an average of 22.81. The dataset has been published at http://lastfmseq.sinaapp.com. In addition, short-term behavior is defined as the latest 20 songs, medium-term behavior as the latest 50 songs, and long-term behavior as the latest 100 songs.
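The two preprocessing rules above (behavior windows of the latest 20, 50, and 100 songs, and the tag and song thresholds of 4) can be sketched as follows. The input formats, the function names, and the reading of tag "frequency" as the number of songs carrying a tag are assumptions made for illustration.

```python
from collections import Counter

# Window sizes from the text; the dictionary keys are illustrative names.
WINDOWS = {"short_term": 20, "medium_term": 50, "long_term": 100}


def behaviour_windows(history, windows=WINDOWS):
    """Slice a chronological play history (oldest first) into the latest N songs per horizon."""
    return {name: history[-n:] for name, n in windows.items()}


def filter_songs(song_tags, min_freq=4, min_tags=4):
    """Drop tags carried by at most min_freq songs, then songs left with at most min_tags tags."""
    freq = Counter(tag for tags in song_tags.values() for tag in set(tags))
    kept_tags = {t for t, c in freq.items() if c > min_freq}
    filtered = {}
    for song, tags in song_tags.items():
        usable = [t for t in tags if t in kept_tags]
        if len(usable) > min_tags:
            filtered[song] = usable
    return filtered


# Hypothetical data: five well-tagged songs, one rare tag, and one sparsely tagged song.
song_tags = {f"s{i}": ["a", "b", "c", "d", "e"] for i in range(1, 6)}
song_tags["s5"] = song_tags["s5"] + ["rare"]
song_tags["s6"] = ["a", "b"]
filtered = filter_songs(song_tags)
print(sorted(filtered))  # ['s1', 's2', 's3', 's4', 's5']
```

If a user's history is shorter than a window, the slice simply returns the whole history, so the three horizons may overlap completely for short histories.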

The experiments were run on a Dell Optiplex74 desktop computer with an Intel E6300 processor, 4 GB of memory, and a 1 TB hard disk, running Ubuntu 12.04. The programming language used is Python 2.7.

4.2. The Optimal Selection of LSTM Neural Network

Because the relationship between the size of the test dataset and the model structure is difficult to determine in advance, we search for the best model structure by trying different combinations of test-set size and structure. The typical network structures are shown in Table 1.

Because the data contain the real-time, medium-term, and long-term behavior of different users, we avoid bias by randomly drawing data from the available dataset as the training set and using the remainder as the test set; at most 30% of the training set is used for validation. We then combine test-set proportions of 10%, 15%, 20%, and 25% with different network structures to select an appropriate data split and network structure. The simulation results are shown in Figure 5.
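A sketch of the splitting procedure, assuming the records can be shuffled independently (the paper does not say whether it splits by user or by listening event). The function name, the fixed seed, and the rounding are illustrative choices; `test_frac` corresponds to the test proportions compared in the text and `val_frac` to the validation share of at most 30%.

```python
import random


def split_dataset(records, test_frac=0.2, val_frac=0.3, seed=42):
    """Randomly split records into (train, validation, test).

    test_frac: share of all records held out for testing.
    val_frac: share of the remaining training records used for validation (at most 0.3).
    """
    if not 0 < test_frac < 1 or not 0 <= val_frac <= 0.3:
        raise ValueError("test_frac must be in (0, 1) and val_frac in [0, 0.3]")
    shuffled = records[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(rest) * val_frac)
    return rest[n_val:], rest[:n_val], test


train, val, test = split_dataset(list(range(1000)), test_frac=0.2)
print(len(train), len(val), len(test))  # 560 240 200
```

Looping this over the test proportions 0.10, 0.15, 0.20, and 0.25 and over each candidate network structure reproduces the grid of configurations described above.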

Figure 5 shows the performance of our model under different network structures (with a time interval of 1 s). The performance of the LSTM-based music recommendation model is broadly similar across structures, and the MAE does not decrease as hidden layers and neurons are added. The results show that a 20% test set combined with structure 7 (four hidden layers with 50, 30, 20, and 10 neurons, respectively) is the best choice for the multidimensional time-series recommendation model. In the rest of the paper, the recommendation model therefore uses structure 7 and a 20% test set.

4.3. The Accuracy Validation of Sequence Mining

We determine the optimal sequence length experimentally. We fix the LSTM network to structure 7 from Section 4.2 with the 20% data split selected there, vary the length of the listening sequence from 1 to 100 songs, and record the hit ratio for each length.

Figure 6 shows how the hit rate of different recommendation algorithms changes as the number of recommended songs increases from 0 to 100. The algorithms compared are user-based collaborative filtering (UserKNN), a Markov-model-based recommendation algorithm (Markov), a recommendation algorithm based on global features (Globals), music recommendation based on real-time behavior (Local), and the multidimensional time-series method proposed in this paper, i.e., music recommendation based on sequence analysis (Ours). The horizontal axis gives the number of recommended songs, and the vertical axis gives the hit rate.
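The paper does not spell out its hit-rate formula; a common definition, assumed here, counts a test case as a hit when the song the user actually played next appears in the top-N list. The sketch below uses hypothetical recommendation lists and shows that, under this definition, the hit rate cannot decrease as N grows.

```python
def hit_rate_at_n(ranked_lists, true_next, n):
    """Fraction of test cases whose true next song appears in the top-n recommendations."""
    hits = sum(1 for ranked, truth in zip(ranked_lists, true_next) if truth in ranked[:n])
    return hits / len(true_next)


# Hypothetical ranked recommendations for three test cases and the songs actually played next.
ranked = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
truth = ["b", "b", "d"]
for n in (1, 2, 3):
    print(n, hit_rate_at_n(ranked, truth, n))
```

Each larger list contains the smaller one, so a case that is a hit at N stays a hit at N + 1, matching the rising curves in Figure 6.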

Figure 6 shows that the curve for the proposed method is clearly separated from, and lies above, the other curves, indicating that the proposed method achieves a noticeably higher recall than comparable methods. In addition, its recall rises steadily as the number of recommended songs increases.

4.4. The Accuracy Validation of Proposed Scheme

Figure 7 shows how the recommendation accuracy of the different algorithms changes as the recommendation list grows. Although the accuracy of every algorithm decreases as the list lengthens, the algorithm based on multidimensional time-series analysis proposed in this paper achieves the best accuracy throughout.

Note that as the recommendation list grows, the accuracy of every algorithm is relatively low and keeps declining, which follows from the definition of accuracy. When the recommendation list length is 20, the accuracy of the KNN recommendation algorithm is about 95%; when the list length increases to 60, its accuracy is about 96%. The accuracy of the proposed algorithm is always the highest. In short, for music recommendation the numerator in the definition of accuracy changes very little and stays small, while the denominator grows rapidly, so accuracy values are small and decline as the list lengthens. This explains both the low accuracy and its decrease with list length; what matters is not the absolute value but the relative accuracy achieved by the different algorithms.
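Assuming the standard precision definition (relevant songs in the list divided by the list length N), the effect described above is easy to reproduce: when only one song in the list is relevant, the numerator stays at 1 while the denominator grows with N. The song ids and the single relevant song below are hypothetical.

```python
def precision_at_n(ranked, relevant, n):
    """Share of the top-n recommended songs that are relevant (e.g. actually played next)."""
    top = ranked[:n]
    return sum(1 for song in top if song in relevant) / n


ranked = [f"s{i}" for i in range(1, 101)]  # hypothetical ranked list of 100 songs
relevant = {"s3"}                           # one relevant song, ranked third
for n in (20, 40, 60):
    print(n, precision_at_n(ranked, relevant, n))
```

With a single relevant item at rank 3, precision falls from 1/20 to 1/60 even though the recommendation itself is unchanged, which is why relative rather than absolute accuracy is compared.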

Figure 8 shows the root mean square error (RMSE) and mean absolute error (MAE) of several music recommendation algorithms as the recommendation list grows. The horizontal axis gives the number of recommended songs, and the vertical axis gives the error. The error of the proposed method is smaller than that of the other algorithms. As the list grows, more irrelevant songs enter it, which increases the error, but only slightly. In addition, the proposed multidimensional time-series model has the lowest RMSE of the five algorithms.
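For reference, the two error measures in Figure 8 can be computed as below, assuming predictions and targets are paired numeric preference scores. The paper does not state what quantity the errors are computed on, so the 0/1 targets and the predictions here are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [1.0, 0.0, 1.0, 0.0]  # hypothetical: 1 = song was played, 0 = not played
y_pred = [0.9, 0.2, 0.6, 0.1]  # hypothetical predicted preference scores
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

RMSE squares each error before averaging, so it weights large errors more heavily and is never smaller than MAE; here the single 0.4 miss pushes RMSE (about 0.23) above MAE (0.2).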

These experimental results show that the proposed algorithm outperforms the baseline algorithms in terms of both hit rate and accuracy and prediction error. This supports the rationality of the proposed method and shows that modelling user behavior as a multidimensional time series yields a comprehensive and detailed description of user behavior, thereby improving recommendation quality.

5. Conclusion

Existing music recommendation methods do not fully consider the temporal correlation among users' long-term, medium-term, and short-term behavior. This paper proposes a personalized music recommendation method based on multidimensional time-series analysis. The method uses a topic model to represent songs as probability distributions over several latent topics and models a user's behavior in the current session as a multidimensional time series; by analysing this series, it better predicts users' behavioral preferences and produces reasonable recommendations. In addition, the paper presents a combined recommendation method based on users' long-term, medium-term, and real-time behavior that accounts for the role and contribution of each to users' future behavior. Experimental results show that the proposed algorithm outperforms the baseline algorithms in terms of both hit rate and accuracy and prediction error.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.