Abstract

User data mining is introduced into the model construction process, and user behavior is decomposed by analyzing various influencing factors with the factorization machine (FM) learning method. In the recommendation screening stage, collaborative filtering is used to screen the recommendation candidate set. Following the idea of user-based collaborative filtering (CF), music works favored by similar users are obtained; following item-based CF, the candidate set is made to cover the user’s preferences. First, the user’s interest value is predicted with a dynamic interest model. Then, common problems such as cold start and the handling of hot items are fully considered. The frequent pattern growth (FP-Growth) association rule algorithm is compared with the collaborative filtering recommendation algorithm and the content-based recommendation algorithm, which demonstrates its superiority and its usefulness in solving the recommendation problem. Converting the music data in the database effectively improves the efficiency and accuracy of mining. An implementation of the algorithm described in this article shows that the music recommendation results are accurate and satisfy users, and that the recommendations are feasible and practical.

1. Introduction

Data mining algorithms use the results of their analysis to define the optimal parameters for creating a mining model; these parameters are then applied to the entire dataset to extract actionable patterns and detailed statistics. The mining models that these algorithms create from data can take many forms, including a set of clusters that describe how the cases in a dataset are related and a decision tree that predicts an outcome and describes how different conditions affect that outcome. Such models can also help improve the efficiency of analysis and computation.

Recommendation should take the usage scenario into account and give more appropriate results based on the user’s current time and location [1, 2]. The principles and challenges of recommender systems in online music have been reviewed and summarized, together with technologies that may improve music recommendation in the future [3, 4]. The contribution of social network information to recommender systems has been studied, along with how to protect privacy by treating the social network as a distributed P2P network [5]. Many matrix decomposition methods used in recommender systems have been studied [6], as has the application of deep learning to recommender systems and the status of content-based and collaborative recommendation under deep learning [7, 8]. Using user-assigned weights to compute the similarity [9, 10] across four factors (timbre, genre, rhythm, and emotion), a mixed distance function for song similarity was constructed with the weight coefficients and attribute similarities as parameters to complete the recommendation [11, 12]. In another approach, acoustic features are first extracted from an audio query, and music recommendation is then completed by a content-based collaborative filtering method. Research on personalized music recommendation has produced a series of scientific results [13]. Acoustic spectrogram analysis has been used to obtain a feature data matrix for music works; the similarity between stored music and the query data is then calculated to produce top-N recommendations [14]. Using reference analysis and a differential strategy to build the recommendation algorithm from different angles, various descriptions in music-related literature have been analyzed to derive rules for recommendation [15, 16].
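The mixed distance function of [11, 12] is not reproduced here. As a minimal sketch, assume each attribute similarity lies in [0, 1] and the mixed similarity is the user-weighted average of the four attribute similarities, with distance taken as its complement. The helper names and the example weights are hypothetical.

```python
def mixed_similarity(attr_sims, weights):
    """Weighted average of per-attribute song similarities (hypothetical form).

    attr_sims: dict attribute -> similarity in [0, 1] between two songs
    weights:   dict attribute -> non-negative user-assigned weight
    """
    total = sum(weights[a] for a in attr_sims)
    if total == 0:
        raise ValueError("at least one attribute weight must be positive")
    return sum(weights[a] * attr_sims[a] for a in attr_sims) / total


def mixed_distance(attr_sims, weights):
    # Distance as the complement of the mixed similarity.
    return 1.0 - mixed_similarity(attr_sims, weights)


sims = {"timbre": 0.9, "genre": 1.0, "rhythm": 0.5, "emotion": 0.7}
w = {"timbre": 1.0, "genre": 2.0, "rhythm": 1.0, "emotion": 1.0}  # user cares most about genre
print(round(mixed_similarity(sims, w), 3))  # 0.82
```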
Another method fuses the ranking results of a bipartite graph with similar nodes and a heuristic random walk to complete the recommendation. Meanwhile, the computational optimization of recommendation algorithms has been studied extensively and in depth, and many researchers have worked on algorithm parallelization. Following the MapReduce programming model on the Hadoop platform, parallel computation for high-dimensional data mining clustering was improved and implemented [17]. In e-commerce, a data mining framework has been integrated into the whole recommendation process, and collaborative filtering implemented on the Hadoop platform alleviates the limitations that complex computation imposes on traditional collaborative filtering in a data mining environment. To address the inability of recommender systems to serve a large number of users within seconds, the three main calculation stages of the collaborative filtering algorithm were divided into four MapReduce processes, which improved recommendation efficiency [18]. Another study improved environment preparation, task assignment, and inter-node communication during the execution of short jobs in Hadoop, greatly increasing their computing efficiency. With the rise of deep learning, content-based music recommendation has seen new developments: a deep content-based music recommendation method has been proposed that trains a convolutional neural network regression model directly on the content of the audio signal to predict expert-labeled results [19, 20].
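To illustrate the MapReduce idea behind parallel collaborative filtering, the following single-process sketch counts how often two songs occur in the same user’s history, which is the co-occurrence statistic item-based CF relies on. It is not the four-stage pipeline of [18]; the `map_phase`, `shuffle`, and `reduce_phase` functions are hypothetical stand-ins for what a framework such as Hadoop would distribute across nodes.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user, songs):
    # Emit ((song_a, song_b), 1) for every unordered pair in one user's history.
    for a, b in combinations(sorted(set(songs)), 2):
        yield (a, b), 1


def shuffle(pairs):
    # Group emitted values by key, as the MapReduce framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Sum the partial counts for one song pair.
    return key, sum(values)


histories = {"u1": ["s1", "s2", "s3"], "u2": ["s1", "s2"], "u3": ["s2", "s3"]}
emitted = (kv for user, songs in histories.items() for kv in map_phase(user, songs))
cooccurrence = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
print(cooccurrence)  # {('s1', 's2'): 2, ('s1', 's3'): 1, ('s2', 's3'): 2}
```

Because each map call only sees one user and each reduce call only sees one key, both phases can run in parallel on separate nodes.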
In the Netflix Prize recommendation contest, which attracted great attention and advanced the field, recommendations were predicted with collaborative filtering. A collaborative filtering model based on matrix decomposition was proposed and improved by adding time-dynamic factors [21, 22]. Considering implicit feedback reduces data sparsity, and adding the time factor captures changes in user interest and in item popularity, which improves the accuracy and stability of the recommendation. There are two different approaches to label-based music recommendation. The first innovates in the representation of music labels: a mixed representation method strengthens sparse label representations without introducing content, and a dynamic weighting scheme limits the number of proposed labels [23, 24]. The second holds that the diversity of user evaluations makes recommending music based on ratings risky; it therefore computes the similarity between music segments from social media labels [25, 26] and shows that this method outperforms recommendation based only on ratings. Building on content-based and emotion-based methods, a personalized music recommendation system has been established that mixes content-based, collaborative, and emotion-based methods, computes the weights of these methods from the user’s interests in order to combine them, and also uses the user’s online logs to recommend music of interest [27, 28].
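The matrix-decomposition model of [21, 22] is only outlined above. The following is a minimal sketch of biased matrix factorization trained by stochastic gradient descent. It omits the time-dynamic and implicit-feedback terms, and the function name, learning rate, regularization strength, and number of latent factors are all assumptions.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Biased MF: r_hat(u, i) = mu + b_u + b_i + p_u . q_i (time terms omitted)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu, bi = [0.0] * n_users, [0.0] * n_items          # user and item biases
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict


# (user, item, rating) triples on a toy 3 x 3 matrix.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
predict = train_mf(ratings, n_users=3, n_items=3)
rmse = (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
print(round(rmse, 3))
```

A time-aware variant would, for example, let `bu` and `bi` depend on the rating timestamp; that extension is not shown.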
Another work combines music content with collaborative filtering to build a hierarchical music recommendation system [29]. On the one hand, collaborative filtering recommends music according to the similarity of music preferences between users. On the other hand, music similarity is computed over several content dimensions, including emotion, rhythm, timbre, melody, and lyrics. Connecting these two aspects exploits the advantages of both and effectively improves recommendation satisfaction [30–33]. In the era of data mining, traditional recommendation algorithms scale poorly, a single recommendation algorithm struggles to capture users’ individual characteristics from the data, and the mainstream collaborative filtering algorithm suffers from data sparsity and ineffective use of massive, diverse data.

This paper analyzes the influence of context in combination with traditional methods of acquiring user interest. User data with the same interest pattern are grouped into one category through clustering. The generalization performance of the model must also be studied and improved to prevent overfitting; these are prerequisites for providing decision support to the recommendation stage. In the recommendation generation stage, the user’s dynamic interest model and the candidate set obtained earlier could be used to predict the user’s interest value directly. However, to address common problems in recommendation such as cold start and hot items, the predicted interest value is adjusted by introducing a recommendation weight, which optimizes the recommendation results and alleviates these problems. Finally, the FP-Growth algorithm is compared with collaborative filtering and content-based recommendation algorithms according to the evaluation criteria, and the superiority of FP-Growth is preliminarily verified.
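The form of the recommendation weight is not specified. As a minimal sketch, assume the weight multiplies the predicted interest by a popularity penalty for hot items and by a fixed boost for new (cold-start) items. The `adjust_interest` helper and its `alpha` and `new_item_boost` parameters are hypothetical.

```python
import math


def adjust_interest(pred, play_count, is_new, alpha=0.5, new_item_boost=1.2):
    """Rescale a predicted interest value with a hypothetical recommendation weight.

    Hot items are damped by 1 / (1 + log(1 + play_count)) ** alpha so that popular
    songs do not crowd out the list; new items with no history get a fixed boost.
    """
    weight = 1.0 / (1.0 + math.log1p(play_count)) ** alpha
    if is_new:
        weight *= new_item_boost
    return pred * weight


# song -> (predicted interest, total play count, is a new item)
candidates = {"hit": (0.9, 10000, False), "niche": (0.8, 20, False), "fresh": (0.6, 0, True)}
scores = {s: adjust_interest(*v) for s, v in candidates.items()}
print(sorted(scores, key=scores.get, reverse=True))  # ['fresh', 'niche', 'hit']
```

Larger `alpha` pushes the ranking further away from popular songs. In practice it would be tuned against accuracy and coverage.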

2. Research on Key Techniques of Personalized Music Recommendation Based on Data Mining

The music recommendation algorithm classifies users based on massive user data (behavior records, etc.) and recommends music liked by some users to other users in the same group. This requires classifying music, establishing detailed rating rules, building user models, finding similar users, and classifying and matching songs based on user behavior data to achieve “blind listening.” High diversity brings freshness to users: discovering a favorite song they have never heard before surprises users and arouses positive emotions. Because user-generated content (UGC) playlists are created by many users, they are diverse. Combining the two ensures that accuracy and diversity coexist. Because the cost of consuming each song is low, users’ interests are more varied and changeable, with fewer limiting factors. Figure 1 shows the recommendation process.
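The group-based step described above can be sketched as follows. This assumes users have already been assigned to groups (for example by clustering) and that a song’s score is the number of other group members who like it. The `recommend_from_group` helper and the toy data are hypothetical.

```python
from collections import Counter


def recommend_from_group(user, groups, likes, top_n=3):
    """Recommend songs liked by other users in the same group that `user` has not heard."""
    group = next(g for g in groups if user in g)  # assumes every user belongs to a group
    heard = set(likes.get(user, ()))
    votes = Counter(
        song
        for other in group if other != user
        for song in likes.get(other, ())
        if song not in heard
    )
    # Sort by vote count, then by song id for a deterministic order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:top_n]]


groups = [{"u1", "u2", "u3"}, {"u4"}]
likes = {"u1": ["s1"], "u2": ["s1", "s2", "s3"], "u3": ["s2", "s4"], "u4": ["s9"]}
print(recommend_from_group("u1", groups, likes))  # ['s2', 's3', 's4']
```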

This paper performs personalized music recommendation based on data mining analysis, so as much recommendation-related data as possible should be obtained during data acquisition; this is the premise of data mining analysis. The key problem is acquiring characteristic data of musical works. Traditional acquisition methods cannot meet the requirements of massive music collections because their computation and cost are too high. This paper therefore studies a feature extraction method suited to massive volumes of music works. The method reduces computation as much as possible while ensuring the quality of the acquired data, and it reduces the burden of the system’s data acquisition process.

Data preprocessing follows data acquisition. It addresses noise, missing values, and heterogeneity in the collected data, and its most important task is data standardization. User information, music information, behavior information, and behavior-context information are data of different dimensions and contain various data types, so appropriate methods are needed to convert the data into the forms required for subsequent computation. After standardization, the data are still scattered across different dimensions, so they must be combined into a whole in a form that suits the subsequent data analysis process.
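A minimal sketch of standardizing one numeric column follows, assuming z-score scaling. Min-max scaling is an equally common choice, and the text does not say which one is used.

```python
import statistics


def zscore(values):
    """Standardize a numeric column to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mean) / std for v in values]


play_counts = [2, 4, 4, 4, 5, 5, 7, 9]
print(zscore(play_counts))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```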

(1) User preference analysis obtains users’ interests and preferences. By establishing a dynamic user interest model, users’ interests can be grasped accurately. This stage must solve two problems of traditional music recommendation: users’ preferences are not captured accurately enough, and users’ specific preferences for different points of interest cannot be obtained. Because the number of music works is massive, selecting the recommendation candidate set and generating the recommendation results are carried out as separate processes. (2) In the candidate-set selection stage, the problem of repeated listening to music works must be handled, and the candidate set must cover users’ interests comprehensively. (3) Data analysis and recommendation must not only meet the particular needs of music recommendation but also integrate data mining and analysis technology into the calculation process. Because each stage has different objectives, an appropriate method should be chosen for each stage rather than obtaining the recommendation results in a generic way.

To address feature extraction for music works, a simple audio feature extraction method with low computational cost is studied. A study of MIDI audio files shows that they directly contain audio characteristics such as pitch and note length, which makes extraction extremely easy. In a data mining environment with a large number of music works, using MIDI audio features saves considerable time and computation compared with traditional methods. Table 1 shows the format of MIDI music files.

Musical Instrument Digital Interface (MIDI) was proposed to solve the communication problem between electronic musical instruments. MIDI is the most widely used standard music format in composition and can be called a “computer-understandable score”: it records music as digital control signals for notes. A complete MIDI file is typically only tens of kilobytes in size, yet it can contain dozens of tracks. Much modern music is composed using MIDI together with a sound library. MIDI transmits instructions, such as notes and control parameters, rather than sound signals. These instructions tell the MIDI device what to do and how to do it, for example which notes to play and at what volume, and they are all represented uniformly as MIDI messages. The standard transmission rate is 31.25 kbaud ± 1%.
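To show that MIDI carries instructions rather than sound, the following minimal sketch decodes raw channel messages using the MIDI 1.0 status-byte layout, where the high nibble gives the message type and the low nibble gives the channel. Only Note On, Note Off, and Program Change are handled, running status is ignored, and `decode` is a hypothetical helper rather than part of any MIDI library.

```python
def decode(message):
    """Decode a raw MIDI channel message given as a bytes object."""
    status, data = message[0], message[1:]
    kind, channel = status & 0xF0, status & 0x0F
    if kind == 0x90 and data[1] > 0:
        return ("note_on", channel, data[0], data[1])  # (type, channel, note, velocity)
    if kind == 0x80 or (kind == 0x90 and data[1] == 0):
        return ("note_off", channel, data[0])          # Note On with velocity 0 acts as Note Off
    if kind == 0xC0:
        return ("program_change", channel, data[0])    # instrument (timbre) number 0-127
    return ("other", channel)


print(decode(bytes([0x90, 60, 100])))  # ('note_on', 0, 60, 100): middle C on channel 0
print(decode(bytes([0x80, 60, 0])))    # ('note_off', 0, 60)
print(decode(bytes([0xC1, 19])))       # ('program_change', 1, 19)
```

Three bytes are enough to start a note, which is why a whole MIDI piece stays small.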

Pitch: if a musical work has m tracks (m ≥ 1), pitch extraction must analyze each track, and the pitch feature of the work is defined according to the corresponding attributes of the main track. In this paper, the feature value of a musical work is determined by its current pitch, and its pitch is calculated as

From this, the value of the judging function H for note length can be obtained; the note length of Felling is long:

Timbre: a piece of music has multiple tracks, and different tracks can have different timbres. MIDI divides the timbres of musical works into 128 categories, so timbre is easy to obtain. Based on these timbre categories, the timbre feature function T is defined as shown in formula (3). The timbre of Felling is the British classical timbre numbered 19, so its timbre feature is 19:
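The pitch, length, and timbre formulas themselves are not reproduced above. The following sketch therefore uses stand-in definitions, all of which are assumptions:

- The main track is the track with the most notes.
- The pitch feature is the mean MIDI note number of that track.
- The judging function H returns “long” when the mean note duration reaches a threshold of one beat.
- The timbre feature T is the General MIDI program number set on the track.

The track data are toy values, not taken from Felling.

```python
from statistics import fmean


def main_track(tracks):
    """Hypothetical rule: treat the track with the most notes as the main track."""
    return max(tracks, key=lambda t: len(t["notes"]))


def pitch_feature(track):
    # Stand-in pitch feature: mean MIDI note number (0-127) of the main track.
    return fmean(pitch for pitch, _ in track["notes"])


def length_feature(track, threshold_beats=1.0):
    # Stand-in judging function H: "long" if the mean note duration reaches the threshold.
    mean_len = fmean(duration for _, duration in track["notes"])
    return "long" if mean_len >= threshold_beats else "short"


def timbre_feature(track):
    # Timbre T: the General MIDI program number (0-127) set on the track.
    return track["program"]


# Each note is (MIDI note number, duration in beats).
tracks = [
    {"program": 19, "notes": [(60, 2.0), (64, 1.0), (67, 1.5), (72, 2.0)]},
    {"program": 32, "notes": [(36, 1.0), (43, 1.0)]},
]
m = main_track(tracks)
print(pitch_feature(m), length_feature(m), timbre_feature(m))  # 65.75 long 19
```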

Data analysis and recommendation are the core of the whole recommendation process. Building on traditional recommendation methods, this paper introduces data mining analysis and divides the whole process into three stages: user preference analysis, selection of the recommendation candidate set, and generation of recommendation results. Compared with traditional methods, this approach analyzes data and makes recommendations more intelligently by exploiting implicit feedback behavior data in the music recommendation system. Explicit feedback consists of behaviors through which users directly express their preference for music, such as collecting or purchasing it. Implicit feedback, in contrast, does not directly express users’ preferences. In a personalized music recommendation system, the main implicit signals are the frequency and duration of listening to music works. If a user frequently listens to a certain type of music, this indicates that the user prefers that type. The comparison of the two kinds of feedback is shown in Table 2.
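The following minimal sketch turns implicit feedback into a preference score. It assumes the score combines log-scaled listening frequency with the mean fraction of the track actually played. Both the formula and the equal 0.5 weights are assumptions.

```python
import math


def implicit_preference(plays, w_freq=0.5, w_dur=0.5):
    """plays: list of (seconds_heard, track_length_seconds) for one user and one song."""
    if not plays:
        return 0.0
    freq = math.log1p(len(plays))  # diminishing returns for repeated plays
    completion = sum(min(heard / length, 1.0) for heard, length in plays) / len(plays)
    return w_freq * freq + w_dur * completion


loved = [(200, 200)] * 9  # played nine times to the end
skipped = [(15, 200)]     # skipped once after 15 seconds
print(implicit_preference(loved) > implicit_preference(skipped))  # True
```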

First, data irrelevant to interest are removed from the user behavior data. Because the final data merge is based on the behavior data, this step keeps irrelevant noise out of the preprocessed data. Behavior data are extracted from the system log, so the acquired data are filtered, and only users’ listening, skipping, buying, collecting, rating, and sharing behaviors are retained to complete the cleaning.
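A minimal sketch of this cleaning step follows, assuming log records are dictionaries with `user`, `song`, and `behavior` keys. The behavior labels are hypothetical, and so is the choice to also drop records missing a user or song id.

```python
KEPT_BEHAVIORS = {"listen", "skip", "buy", "collect", "score", "share"}


def clean_log(records):
    """Keep only interest-related behavior records from a raw system log."""
    return [
        r for r in records
        if r.get("behavior") in KEPT_BEHAVIORS and r.get("user") and r.get("song")
    ]


log = [
    {"user": "u1", "song": "s1", "behavior": "listen"},
    {"user": "u1", "song": "s1", "behavior": "login"},   # unrelated to interest
    {"user": "u2", "song": "s3", "behavior": "share"},
    {"user": "u2", "song": None, "behavior": "listen"},  # incomplete record
]
print(len(clean_log(log)))  # 2
```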

Combining these steps with data mining processing, we studied methods for handling incomplete data that suit the needs of music recommendation. For example, if user U1’s rating of music work S1 is missing, it can be filled with the average rating that other users gave S1. This imputation is fast and effective, and it does not change the estimate of the variable’s mean. For non-numerical data, missing values are handled with the mode principle from statistics: the most frequent value of the variable is usually used. For example, missing location information in a behavior-context record of user A can be replaced by that user’s most common location.
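The two imputation rules above can be sketched as follows: mean imputation for a numeric rating and mode imputation for a categorical location. The helper names and the dictionary layout of the ratings are hypothetical.

```python
from collections import Counter
from statistics import fmean


def impute_rating(ratings, user, song):
    """Fill a missing rating with the mean rating other users gave the same song."""
    others = [r for (u, s), r in ratings.items() if s == song and u != user]
    return fmean(others) if others else None


def impute_location(locations):
    """Fill missing (None) locations with the user's most common location (the mode)."""
    known = [loc for loc in locations if loc is not None]
    mode = Counter(known).most_common(1)[0][0] if known else None
    return [mode if loc is None else loc for loc in locations]


ratings = {("U2", "S1"): 4.0, ("U3", "S1"): 5.0, ("U4", "S1"): 3.0}
print(impute_rating(ratings, "U1", "S1"))                 # 4.0
print(impute_location(["home", "gym", None, "home"]))    # ['home', 'gym', 'home', 'home']
```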

Data standardization converts all data into computable values. The obtained user data take the form D = {D1, D2, D3, D4}, where D1 represents user information, D2 represents music work information, D3 represents users’ behavior information on music, and D4 is the context information at the time the behavior occurred. The format of the original data is shown in Table 3.
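The following minimal sketch combines the four data groups of one behavior record into a single numeric vector. It assumes categorical fields are one-hot encoded against fixed vocabularies and numeric fields are already standardized. All field names and vocabularies are hypothetical.

```python
GENRES = ["pop", "rock", "classical"]
BEHAVIORS = ["listen", "skip", "collect"]
PLACES = ["home", "work", "commute"]


def one_hot(value, vocabulary):
    return [1.0 if value == v else 0.0 for v in vocabulary]


def to_vector(d1, d2, d3, d4):
    """Concatenate D1 (user), D2 (music), D3 (behavior), and D4 (context) into one vector."""
    return (
        [d1["age_z"]]                          # D1: standardized user age
        + one_hot(d2["genre"], GENRES)         # D2: music genre
        + one_hot(d3["behavior"], BEHAVIORS)   # D3: behavior type
        + one_hot(d4["place"], PLACES)         # D4: context location
    )


x = to_vector({"age_z": 0.5}, {"genre": "rock"}, {"behavior": "collect"}, {"place": "home"})
print(x)  # [0.5, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
```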

In a multidimensional space, two points X and Y represented by the data are highly similar when the distance between them is small, and less similar when it is large. For X = (x_1, x_2, ..., x_n) and Y = (y_1, y_2, ..., y_n), the Euclidean distance is

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Cosine similarity focuses on differences in direction between individual vectors and suits high-dimensional data. However, the purpose of clustering is to divide similar user data into the same class, which requires a similarity measure over the whole data. Therefore, when clustering, each user data record is treated as a point in a multidimensional space, and similarity is calculated with the Euclidean distance. The similarity between behavioral data is shown in Table 4.
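The text does not name the clustering algorithm. The following minimal sketch assumes plain k-means, which groups user data points by Euclidean distance in the way described above. The function names and the toy points are hypothetical.

```python
import math
import random


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means with Euclidean distance; returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(list(centers[c]))  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


users = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(kmeans(users, k=2))
```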

After processing, user information is expressed as a feature vector x that encodes a user behavior together with the preference it expresses, where v_i is the value of feature i for this behavior. The FM method learns a prediction function from the user data; this function is the dynamic user interest model.
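The paper does not give its FM equation. The sketch below uses the standard degree-2 factorization machine:

$$\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle V_i, V_j \rangle x_i x_j$$

It evaluates the pairwise term in O(kn) time with the identity

$$\sum_{i<j} \langle V_i, V_j \rangle x_i x_j = \tfrac{1}{2} \sum_f \left[ \Big(\sum_i V_{i,f}\, x_i\Big)^2 - \sum_i V_{i,f}^2\, x_i^2 \right]$$

In this notation, feature values are written `x[i]` (the v_i of the text) and the latent factor vectors are the rows of `V`. The parameter values are hypothetical. In practice they would be learned from user data, for example by stochastic gradient descent, which is not shown.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine prediction in O(k * n) time.

    x: feature values, w0: global bias, w: linear weights,
    V: one latent vector of length k per feature.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    # Direct O(k * n^2) sum over all feature pairs i < j, used to check the fast form.
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return total


x = [1.0, 0.0, 1.0, 0.5]
w0, w = 0.1, [0.2, -0.1, 0.3, 0.05]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.05, 0.1]]
print(abs(fm_predict(x, w0, w, V) - fm_predict_naive(x, w0, w, V)) < 1e-12)  # True
```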

3. Personalized Music Recommendation Based on FP-Growth Data Mining Algorithm

FP-Growth-based recommendation mines strong association rules from users’ listening records according to preset minimum support and minimum confidence thresholds. Unheard music is scored by its association with the current user’s listening record, and the discovered listening rules form a knowledge base. Given a rule X → Y, if the user has already listened to both X and Y, Y is not recommended again. The algorithm flow chart is shown in Figure 2.
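The following minimal, self-contained sketch covers the FP-Growth workflow described above:

1. Build an FP-tree from listening sessions.
2. Mine frequent itemsets recursively from conditional pattern bases.
3. Derive strong rules with a minimum confidence.
4. Recommend only songs the user has not heard.

Support is given as an absolute count, and each unheard song is scored by the best confidence of any triggered rule. Both choices are assumptions, and the session data are toy values.

```python
from collections import defaultdict
from itertools import combinations


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_count):
    """transactions: list of (items, count). Returns item supports and header table."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, ordered by descending support (ties by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return frequent, header


def fp_growth(transactions, min_count, suffix=()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    frequent, header = build_fp_tree(transactions, min_count)
    patterns = {}
    for item, count in frequent.items():
        pattern = suffix + (item,)
        patterns[frozenset(pattern)] = count
        # Conditional pattern base: the prefix path above every node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_count, pattern))
    return patterns


def strong_rules(patterns, min_conf):
    """Rules A -> B with confidence support(A u B) / support(A) >= min_conf."""
    rules = []
    for itemset, count in patterns.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                conf = count / patterns[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


def recommend(history, rules):
    """Score unheard songs by the best confidence of any rule the history triggers."""
    heard, scores = set(history), {}
    for antecedent, consequent, conf in rules:
        if antecedent <= heard:
            for song in consequent - heard:  # never re-recommend a song already heard
                scores[song] = max(scores.get(song, 0.0), conf)
    return sorted(scores, key=lambda s: (-scores[s], s))


sessions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
patterns = fp_growth([(s, 1) for s in sessions], min_count=2)
rules = strong_rules(patterns, min_conf=0.7)
print(recommend(["a"], rules), recommend(["a", "b"], rules))  # ['b', 'c'] ['c']
```

Unlike Apriori, FP-Growth never generates candidate itemsets explicitly. It only scans the compressed tree and its conditional sub-trees, which is why it suits large listening logs.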

Figure 3 shows that the FP-Growth algorithm has advantages over the collaborative filtering recommendation algorithm in diversity and other aspects. This is mainly because FP-Growth mines association rules for music recommendation, drawing not only on the user’s long-term listening and collection history but also on the user’s current interest. The songs recommended to the user are therefore selected from all songs in the library. This preserves the precision of music recommendation while making the recommended songs richer, more diverse, and fresher.

Content-based recommendation recommends music mainly according to the characteristics of the music itself: pitch, beat, rhythm, and other audio characteristics of the song, which objectively reflect the song. The four elements of music are melody, rhythm, timbre, and harmony. Melody is the track of sound movement formed by arranging and combining notes of different pitch; it can express sad or happy emotions and is the most important element. Rhythm is the arrangement and variation of the durations of sounds. Timbre is a means of musical expression that distinguishes, for example, a vigorous male voice from a gentle female voice. Harmony consists of several notes sounding simultaneously in a consonant combination, and combining multiple parts enriches the expression of the music. In the multifunction personalized music recommendation system, the recommendation engine is the most important part: it implements the distributed recommendation algorithms of the system’s logic layer. The system architecture includes a Spark cluster over HDFS distributed storage. On this cluster, parallelized implementations of the collaborative filtering algorithm and of the playlist recommendation algorithm based on spatial embedding are deployed.
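The following minimal sketch illustrates content-based ranking. It assumes each song is described by a small numeric vector of audio features scaled to comparable ranges, and that unheard songs are ranked by cosine similarity to the mean vector of songs the user liked. The feature choice, the scaling, and the catalog are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def content_recommend(liked, catalog, top_n=2):
    """Rank unheard songs by cosine similarity to the user's mean liked-song profile."""
    dims = len(next(iter(catalog.values())))
    profile = [sum(catalog[s][d] for s in liked) / len(liked) for d in range(dims)]
    candidates = [s for s in catalog if s not in liked]
    return sorted(candidates, key=lambda s: cosine(profile, catalog[s]), reverse=True)[:top_n]


# Features: (scaled mean pitch, scaled tempo, scaled brightness), all hypothetical.
catalog = {
    "calm_1": [0.2, 0.1, 0.3], "calm_2": [0.25, 0.15, 0.3],
    "dance_1": [0.6, 0.9, 0.8], "calm_3": [0.2, 0.2, 0.35],
}
print(content_recommend(["calm_1", "calm_2"], catalog))  # ['calm_3', 'dance_1']
```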

4. Example Verification

The experiment used the public Million Song Dataset. This dataset contains data in many dimensions, which makes it well suited to verifying personalized music recommendation methods. It also provides a subset containing the behavior of about 13,490 users toward 150,000 music tracks, together with information about those users and tracks. In this experiment, four data subsets (links, SecondSongs, Users, and behavior-context) were formed by screening and cutting part of the data, where links is the data association file.

As Figure 4 shows, when the number of users is less than 150, the personalized recommendation method proposed in this paper computes slightly more slowly than the traditional user-based and item-based CF methods, because distributed computing itself introduces some computational overhead. When the number of users exceeds 150, it executes faster than the traditional user-based CF and item-based CF methods. As the data volume continues to grow, the advantage of the proposed method in computing speed becomes more obvious.

Figure 5 compares the accuracy of the recommendation results of the three methods. The method proposed in this paper achieves high accuracy with a clear advantage. In particular, once the amount of users’ music data reaches a certain level, recommendation accuracy increases by about 5%. As the number of users’ music records grows, the accuracy of the recommendation results also increases gradually and steadily, finally exceeding 30%.
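The text does not define accuracy precisely. The following minimal sketch uses precision@N, one common definition. It assumes each user’s top-N list is checked against that user’s held-out listening records and the result is averaged over users. The function names are hypothetical.

```python
def precision_at_n(recommended, relevant, n):
    """Fraction of the top-n recommended songs that appear in the held-out set."""
    top = recommended[:n]
    return sum(1 for s in top if s in relevant) / n


def mean_precision(recs, test, n):
    users = [u for u in recs if u in test]
    return sum(precision_at_n(recs[u], test[u], n) for u in users) / len(users)


recs = {"u1": ["s1", "s2", "s3", "s4"], "u2": ["s5", "s6", "s7", "s8"]}
test = {"u1": {"s2", "s4"}, "u2": {"s9"}}
print(mean_precision(recs, test, n=4))  # 0.25
```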

Figure 6 compares the recommendation coverage of the three methods: personalized recommendation, item-based CF, and user-based CF. The coverage of all three methods decreases gradually as the number of users increases. However, the recommendation method proposed in this paper achieves clearly higher coverage than traditional user-based CF and item-based CF in every tested case.
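The coverage definition is not stated. The following minimal sketch assumes catalog coverage: the share of all songs in the library that appear in at least one user’s recommendation list.

```python
def catalog_coverage(recs, catalog_size):
    """Share of the catalog that appears in at least one user's recommendation list."""
    recommended = set()
    for songs in recs.values():
        recommended.update(songs)
    return len(recommended) / catalog_size


recs = {"u1": ["s1", "s2"], "u2": ["s2", "s3"], "u3": ["s1", "s3"]}
print(catalog_coverage(recs, catalog_size=10))  # 0.3
```

Recommenders that favor popular songs tend to repeat the same few items across users, which keeps this value low.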

In the experiment, 800 users were divided evenly into two groups. The favorite-song lists of the 400 users in group A were used as the training set, and the mixed listening records of the other 400 users in group B were used as the test set. Strong association rules mined from group A were used to recommend music to group B. The users’ satisfaction with the recommendations was recorded for lyrics, melody, novelty, rhythm, and emotion on four levels: not satisfied, general, satisfied, and very satisfied. The measured results are shown in Figure 7.

Figure 7 shows that 75% or more of users rated the recommended songs as general or better, with the best results for emotion, where the recommendations were most controlled and precise. This is because the recommendation strategy considers the emotional character and brightness of the songs in the user’s listening record. It can therefore accurately identify songs of the same type and push them to the user, producing emotional resonance. For example, when a user listens to melancholy songs, the system can recommend top-ranked music of the same type, or songs from users with similar musical taste. Satisfaction is lowest on novelty, mainly because the recommendation strategy considers only the user’s existing preferences. It is therefore difficult to recommend fresh music: recommendations stay within the circle of music the user already loves and do not extend to other styles. In the graph, a higher column indicates higher satisfaction. The blue bar shows the time consumption of the item-based collaborative filtering algorithm, the orange bar shows the “relatively satisfied” rating, and the yellow bar shows the “general” rating of the improved algorithm. The figure shows that the larger the data set, the more obvious the gap in favor of the improved algorithm, which indicates that the improvements to our algorithm are effective.

In Figure 8, the accuracy of the model decreases as the number of recommendations increases. The spatial data mining model mines personal music preferences using spatial association rule discovery, spatial classification discovery, spatial clustering discovery, spatial data summary mining, and spatial sampling data processing. Based on music preference behavior, personalized labels such as user preferences are established, and a label model is then built. The label and spatial models are more accurate than the single-song model, and the data-mining-based personalized recommendation algorithm is more accurate and more stable.

5. Conclusion

This paper analyzes the requirements of a personalized music recommendation system and verifies the feasibility and effectiveness of the system through testing. It compares the system with recommendation algorithms based on collaborative filtering and on music content, and evaluates it against evaluation criteria. A personalized music recommendation system based on data mining is designed that provides song recommendation and playlist recommendation on a distributed platform and considers both users’ long-term interests and short-term preferences. Clustering analysis of user attribute information reveals correlations between users, who are then grouped, and a data mining model is trained for each group. This improves recommendation accuracy and addresses the cold-start problem for new users.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no known conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.