Abstract

With the advent of the era of big data, data mining has become one of the key technologies in research and business. In order to improve the efficiency of data mining, this paper studies data mining based on the intelligent recommendation system. Firstly, the intelligent recommendation system is modeled mathematically on the basis of association rules. After the requirements of the intelligent recommendation system are analyzed, Java 2 Platform, Enterprise Edition (J2EE) technology is used to divide the system architecture into the presentation layer, business logic layer, and data layer. The recommendation module is divided into three substages: data representation, model learning, and recommendation engine. Then, the fuzzy clustering algorithm is used to optimize the system. After the system is built, its performance is evaluated; the evaluation indexes include accuracy, coverage, and response time. Finally, the system is put into trial operation on an e-commerce platform. The click-through rate and purchase conversion rate of recommended products before and after the operation are compared, and a questionnaire survey of randomly selected platform users is conducted to analyze user satisfaction. The experimental data show that the MAE of this system is the lowest, maintained at about 0.73, and its accuracy is the highest; before the recommendation threshold exceeds 0.5, the average coverage rate of this system is the highest, at 0.75; in the Q1–Q5 subsets, the shortest response time of the system is 0.2 s. After the system went into operation, the average click-through rate increased by 11.04 percentage points and the average purchase rate by 9.35 percentage points. Among the 1216 users surveyed, 43% gave a satisfaction rating of 4 and 9% gave a rating of 1. This shows that the system algorithm converges quickly and that it can recommend products more in line with user needs and interests and promote a higher click-through rate and purchase rate, but user satisfaction can be further improved.

1. Introduction

1.1. Background Significance

Information is expanding at an unprecedented rate, and its sheer variety is overwhelming. To address the problems brought about by this volume of information, personalized intelligent recommendation systems came into being. The core of a recommendation system is its recommendation algorithm, which determines the effect of recommendation [1]. Data mining integrates the theory and technology of many fields and has been widely used in various industries [2]. The massive data generated on the Internet pose new challenges to data mining technology [3]. Applying the intelligent recommendation system to data mining is of innovative and practical significance, since it can provide people with more targeted and intelligent information.

1.2. Related Work

Intelligent recommendation systems are widely used on video websites and e-commerce platforms, so there are relatively many related research results [4]. Li proposed a new model that uses social networks and mines the user preference information expressed in microblogs to evaluate the similarity between online movies and TV dramas [5], drawing on a series of data mining methods and social computing models [6]. Yang proposed a solution based on a hybrid recommendation algorithm, including a content-based recommendation algorithm, an item-based collaborative filtering recommendation algorithm, and a demography-based recommendation algorithm [7]. In order to expand the recommendation dimension, he uses classification and clustering algorithms to mine the historical data of items and users. Lu first trains the network to achieve the required accuracy. Then, redundant connections in the network are removed by a network pruning algorithm, the activation values of the hidden units in the network are analyzed, and classification rules are generated according to the analysis results [8]. Angeli elaborated and explained some key issues of data mining in educational technology classroom research, investigated students' learning behavior and experience in computer-supported classroom activities, and used fuzzy representation to summarize questionnaire data [9]. His research provides data support for the application of data mining technology in the field of education, but the number of training samples used in his experiment is too small, which has a certain impact on the mining effect.

1.3. Innovative Points in This Paper

In order to improve the efficiency of information processing and the quality of data mining, more personalized information recommendation services should be provided for people. Based on the algorithms of a mathematically modeled intelligent recommendation system, this paper makes an in-depth study of personalized data mining. The innovations of this study are as follows: (1) Based on J2EE technology and the association rule algorithm, an intelligent recommendation system is constructed. The system architecture includes the presentation layer, business logic layer, and data layer. The system recommendation module includes three substages: data representation, model learning, and recommendation engine. (2) The fuzzy clustering algorithm is used to optimize the recommendation system, and the confidence degree of the fuzzy clustering algorithm is improved. Fuzzy clusters of items and of users are then established, respectively. (3) The prediction accuracy, coverage, and response time of the system are tested. The data show that the system has high accuracy and coverage and a short response time. (4) After the system is put into operation, a comparison of the click-through rate and purchase conversion rate before and after the operation shows that it can recommend products that are more in line with the needs and interests of users.

2. Intelligent Recommendation System of Mathematical Modeling and Personalized Data Mining

2.1. Composition and Structure of the Intelligent Recommendation System
2.1.1. Common Methods of the Intelligent Recommendation System

The recommendation system based on demography is easy to implement. It discovers the correlation between users according to their demographic characteristics so as to predict their interests and preferences, and it recommends to the target user the resources preferred by similar users. The method involves neither the historical data of users' preferences for resources nor information about the resources themselves. However, its recommendation granularity is too coarse, and the collected information may be false, which affects the prediction results.

The content-based recommendation system obtains the user’s interest preference by analyzing the user's use or viewing history and then compares the similarity between the user’s interest description and the resource content and sorts the resources to recommend [10]. The system also adjusts and optimizes the user’s interest description according to the user’s feedback on the recommended resources.

The recommendation methods based on collaborative filtering can be divided into two types: user-based and item-based. The user-based type uses data representation to model the relationship between users and resources and then finds neighbor users according to the similarity of user behavior. Finally, the resources rated highest by the neighbor users are recommended to the current user [11]. The item-based type analyzes the user-resource matrix to calculate the relationships between resources and generates recommendations from them. Collaborative filtering faces the problems of cold start for new users, neglect of new item resources, and data sparsity.

2.1.2. Composition of the Intelligent Recommendation System

The intelligent recommendation system consists of three modules: input, recommendation, and output. The input module is mainly responsible for collecting, sorting, and updating user information. The information content includes user's personal information, implicit browsing information, rating information, search keyword information, purchase history information, and expert information [12].

The recommendation module uses an appropriate recommendation algorithm to process and analyze the input information and finally produces the recommendation results. This module directly determines the recommendation quality of the whole system. Therefore, different algorithms are adopted in different practical situations, and each specific problem should be analyzed individually.

The output module will sort the recommended content according to the user’s interest, and the final output will be provided to the user. There are different forms of output. The common output methods are product list, user evaluation and rating, e-mail, and expert introduction. Different output modes will reflect different emphasis.

2.1.3. Evaluation Criteria of the Intelligent Recommendation System

The accuracy of recommendation system is different in different types of systems. For example, in a product recommendation system, accuracy is the ratio of the number of products recommended and purchased by the system to the total number of products in the recommendation set [13], as shown in the following formula:

$$\text{Accuracy} = \frac{|R \cap P|}{|R|}, \tag{1}$$

where $R$ and $P$ represent the recommendation set and purchase set, respectively. The calculation formula of the coverage rate of the commodity recommendation system is as follows:

$$\text{Coverage} = \frac{|R \cap P|}{|P|}. \tag{2}$$

The accuracy rate describes how accurate the recommendation set produced by the recommendation engine is, while the coverage rate shows how much of what users actually purchase the recommendation engine is able to recommend.
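As a minimal sketch of these two measures, the following Python function treats accuracy as the share of recommended products that were purchased and, as an assumption about the lost coverage formula, coverage as the share of purchased products that were recommended. The function name and the sample data are hypothetical.

```python
def precision_and_coverage(recommended, purchased):
    """Return (accuracy, coverage) for one recommendation set."""
    recommended, purchased = set(recommended), set(purchased)
    hits = recommended & purchased  # products both recommended and purchased
    accuracy = len(hits) / len(recommended) if recommended else 0.0
    coverage = len(hits) / len(purchased) if purchased else 0.0
    return accuracy, coverage


# Hypothetical data: 4 products recommended, 5 purchased, 2 in common.
accuracy, coverage = precision_and_coverage(
    ["a", "b", "c", "d"], ["b", "d", "e", "f", "g"]
)
print(accuracy, coverage)  # 0.5 0.4
```

A longer recommendation list tends to raise coverage and lower accuracy, which is why the two indexes are reported together.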

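The diversity of the recommendation system is calculated by the similarity of the recommended resources obtained by users. The greater the similarity, the worse the diversity of the system. Let $R(u)$ be the recommendation set provided to user $u$; then, the definition of diversity is shown in the following formula:

$$\text{Diversity}(R(u)) = 1 - \frac{\sum_{i, j \in R(u),\ i < j} s(i, j)}{\frac{1}{2}\,|R(u)|\,(|R(u)| - 1)},$$

where $s(i, j)$ is the similarity of resources $i$ and $j$ and $|R(u)|$ is the length of the recommendation list. Therefore, the definition of diversity of the whole recommendation system is shown in the following formula:

$$\text{Diversity} = \frac{1}{|U|} \sum_{u \in U} \text{Diversity}(R(u)),$$

where $U$ is the set of all users.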
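The diversity measure can be illustrated with a short Python sketch. It assumes the common definition of list diversity as one minus the mean pairwise similarity, and a system diversity equal to the average over users. The similarity function `same_category` and the sample lists are hypothetical.

```python
from itertools import combinations


def list_diversity(items, similarity):
    """One minus the mean pairwise similarity of one recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0  # not meaningful for fewer than two items
    return 1.0 - sum(similarity(i, j) for i, j in pairs) / len(pairs)


def system_diversity(lists_by_user, similarity):
    """Average of the per-user list diversities."""
    values = [list_diversity(items, similarity) for items in lists_by_user.values()]
    return sum(values) / len(values)


def same_category(i, j):
    """Hypothetical similarity: 1.0 if two items share a category letter."""
    return 1.0 if i[0] == j[0] else 0.0


lists = {"u1": ["a1", "a2", "b1"], "u2": ["a1", "b1", "c1"]}
print(system_diversity(lists, same_category))
```

In this example the list for `u1` contains two items of the same category and is therefore less diverse than the list for `u2`.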

2.2. Intelligent Recommendation Algorithm
2.2.1. Association Rule Algorithm

The most classic association rule mining algorithm is the Apriori algorithm, which finds frequent itemsets by an iterative, level-by-level search [14]. Strong association rules that meet the user-specified confidence threshold are then generated from the frequent itemsets. The confidence of a rule $A \Rightarrow B$ over the transaction set is recorded as follows:

$$\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{support}(A \cup B)}{\text{support}(A)}.$$

The confidence degree is used to measure the credibility of association rules. The higher the confidence of a rule, the more likely the rule is to match users' interests.

The Apriori algorithm is very simple to implement, but it needs to scan the database every time a candidate set of a new itemset size is generated. When the candidate set is too large, the algorithm takes a long time. In addition, because the transaction database keeps growing, each time data is added, both tasks (computing the frequent itemsets and generating association rules from them) must be rerun over the whole database, which is not conducive to the efficient discovery of rules [15]. The algorithm is suitable for single-dimensional transaction databases but not for mining multidimensional datasets.
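The level-wise search and the confidence calculation described in this section can be sketched in Python as follows. This is a textbook-style illustration, not the implementation used in the system; the basket data and the support threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets (level-wise search)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        # One pass over the database for each candidate size k.
        survivors = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(survivors)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """confidence(A => B) = support(A and B together) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical shopping baskets.
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "eggs"}, {"bread"}]
frequent = apriori(baskets, min_support=0.5)
print(confidence(frequent, {"milk"}, {"bread"}))
```

The repeated database pass inside the loop is the cost discussed above: the number of scans grows with the size of the largest frequent itemset.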

2.2.2. Collaborative Filtering Algorithm

The item-based collaborative filtering algorithm is often used in e-commerce recommendation systems. The algorithm calculates the similarity between items and generates the recommendation list from users' purchase records [16]. An important step in the algorithm is to find other items with high similarity to a given item, which requires an appropriate method for measuring the similarity between items. The column vectors of the evaluation matrix are usually used to calculate this similarity. The common similarity measurement methods include vector cosine, Pearson correlation, and corrected vector cosine.

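Set the threshold $\theta$; when the similarity between an item $j$ and item $i$ exceeds the threshold, item $j$ is put into the nearest neighbor set $N(i)$ of item $i$, as shown in the following formula:

$$N(i) = \{\, j \mid \text{sim}(i, j) > \theta,\ j \neq i \,\}.$$

Once the nearest neighbor set is confirmed, the weighted sum of user $u$'s scores on these nearest neighbor items can be calculated to obtain the predicted score $P_{u,i}$ of user $u$ on item $i$, as shown in the following formula:

$$P_{u,i} = \frac{\sum_{j \in N(i)} \text{sim}(i, j)\, R_{u,j}}{\sum_{j \in N(i)} |\text{sim}(i, j)|},$$

where $R_{u,j}$ is the score of user $u$ on item $j$.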

The item-based collaborative filtering algorithm can handle the situation, common on e-commerce websites, in which the number of users is greater than the number of items, and it shows good recommendation quality. When the number of items is small, the item similarities can be computed offline to reduce the online workload [17]. However, the collaborative filtering algorithm also has shortcomings. One is the cold start problem: it cannot make recommendations for new users. The other is data sparsity, which directly affects the accuracy of nearest neighbor set construction.
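The item-based procedure described in this section (item similarity, thresholded nearest neighbor set, weighted-sum prediction) can be sketched as below. The sketch assumes plain vector cosine over rating columns, treats a score of 0 as "not rated", and normalizes the weighted sum by the sum of absolute similarities; the rating matrix and threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Vector cosine of two equal-length rating columns."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(ratings, user, item, threshold):
    """Predict the score of `user` for `item` from its nearest neighbor items.

    ratings[u][i] is the score of user u for item i; 0 means "not rated".
    """
    def column(i):
        return [row[i] for row in ratings]

    numerator = denominator = 0.0
    for j in range(len(ratings[0])):
        if j == item or ratings[user][j] == 0:
            continue  # skip the target item and items the user has not rated
        s = cosine(column(item), column(j))
        if s > threshold:  # item j belongs to the nearest neighbor set
            numerator += s * ratings[user][j]
            denominator += abs(s)
    return numerator / denominator if denominator else 0.0


# Hypothetical matrix: rows are users, columns are items.
ratings = [
    [5, 5, 0],
    [4, 4, 0],
    [5, 0, 5],  # user 2 has not rated item 1
]
print(predict(ratings, user=2, item=1, threshold=0.5))
```

Here only item 0 passes the threshold, so the prediction for user 2 on item 1 equals that user's score on item 0. With no qualifying neighbor the function returns 0.0, which is the data sparsity problem in miniature.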

2.2.3. Fuzzy Clustering Algorithm

The basic steps of fuzzy clustering include data standardization, establishing the fuzzy similarity matrix, and clustering [18]. Data standardization maps the data to the interval $[0, 1]$ by a variance or standard deviation transformation. Common measures for establishing the fuzzy similarity matrix include Euclidean distance, Manhattan distance, Mahalanobis distance, and angle cosine [19].

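Euclidean distance is derived from the calculation of the distance between two points in geometry. For two samples $x_i = (x_{i1}, \ldots, x_{im})$ and $x_j = (x_{j1}, \ldots, x_{jm})$, its calculation is shown in the following formula:

$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}.$$

Manhattan distance is not the straight-line distance but the broken line distance measured along the coordinate axes. Its calculation is shown in the following formula:

$$d(x_i, x_j) = \sum_{k=1}^{m} |x_{ik} - x_{jk}|.$$

The Mahalanobis distance does not depend on the measurement units of the variables, and it eliminates the interference of correlation between variables. The calculation is shown in the following formula:

$$d(x_i, x_j) = \sqrt{(x_i - x_j)^{T} S^{-1} (x_i - x_j)},$$

where $S$ is the covariance matrix of the samples.

The calculation of the included angle cosine is shown in the following formula:

$$\cos(x_i, x_j) = \frac{\sum_{k=1}^{m} x_{ik}\, x_{jk}}{\sqrt{\sum_{k=1}^{m} x_{ik}^2}\ \sqrt{\sum_{k=1}^{m} x_{jk}^2}}.$$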
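The four measures can be written compactly in Python using their standard definitions. In this sketch the Mahalanobis function takes the inverse covariance matrix as an argument (an assumption made to keep the example free of matrix inversion); with the identity matrix it reduces to the Euclidean distance.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def mahalanobis(x, y, inv_cov):
    """inv_cov is the inverse covariance matrix, given as a list of rows."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n)))


def angle_cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


x, y = [0.0, 0.0], [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(euclidean(x, y))                       # 5.0
print(manhattan(x, y))                       # 7.0
print(mahalanobis(x, y, identity))           # 5.0
print(angle_cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0
```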

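In the whole universe $X = \{x_1, x_2, \ldots, x_n\}$, samples are divided into $c$ disjoint subsets $A_1, A_2, \ldots, A_c$, and the subsets satisfy the following formula:

$$A_1 \cup A_2 \cup \cdots \cup A_c = X, \qquad A_i \cap A_j = \varnothing,$$

where $1 \le i \neq j \le c$. The membership relationship of any sample $x_k$ to any subset $A_i$ is shown in the following formula:

$$\mu_{A_i}(x_k) = \begin{cases} 1, & x_k \in A_i, \\ 0, & x_k \notin A_i. \end{cases}$$

Fuzzy clustering divides the sample set into $c$ fuzzy subsets and extends the membership of samples from the binary form of 0 or 1 to the interval $[0, 1]$. For such a sample $x_k$, its membership must satisfy the following formula:

$$\mu_{A_i}(x_k) \in [0, 1], \qquad \sum_{i=1}^{c} \mu_{A_i}(x_k) = 1.$$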

This kind of clustering is called fuzzy clustering.
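The text does not state how the membership degrees are computed. As an illustration only, the following sketch uses the membership update of the standard fuzzy c-means algorithm, which is an assumption and not necessarily the variant used in this paper. It works on one-dimensional data; the fuzzifier `m` and the cluster centers are hypothetical.

```python
def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one 1-D point in each cluster."""
    distances = [abs(point - c) for c in centers]
    if 0.0 in distances:  # the point coincides with a cluster center
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)); closer centers get larger values.
    return [
        1.0 / sum((d_i / d_j) ** power for d_j in distances)
        for d_i in distances
    ]


u = fcm_memberships(2.0, centers=[0.0, 6.0])
print(u, sum(u))
```

Each membership lies in the interval from 0 to 1 and the memberships of one sample sum to 1, which is exactly the constraint that distinguishes fuzzy clustering from a hard partition.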

2.3. Personalized Data Mining
2.3.1. Data Mining Function

The essence of data mining is to mine knowledge from large-scale data, including generalized knowledge, association knowledge, classification and clustering knowledge, predictive knowledge, and deviation knowledge [20]. The main functions of data mining include concept description, association analysis, classification analysis, clustering analysis, outlier analysis, and time series analysis [21].

Concept description summarizes and compares the data and gives an overall description. It commonly relies on statistics of the business data in the database, such as the mean and variance. Association analysis is carried out on massive data to find the association relationships behind the data, on which more advanced prediction can then be based. Classification analysis classifies and models the whole data, while cluster analysis groups similar things together.

Outlier analysis is used to analyze outlier data that differ from conventional data. Although the number of outliers is small, the information they carry is very important and cannot be ignored. The data of time series analysis include fixed interval values and dynamic interval values [22, 23]. Its main functions include similarity search, pattern mining, and trend analysis.

2.3.2. Data Mining Process

Data mining is an iterative process of human-computer interaction, which is mainly divided into four parts: problem definition, data collation, data mining implementation, and interpretation and evaluation of mining results [24]. The purpose of problem definition is to understand and define the mining target clearly.

Data collation includes data selection, preprocessing, and reduction. Data selection chooses samples according to the requirements of the defined problem in order to determine the target data. Data preprocessing includes checking the integrity and consistency of the data and eliminating data noise; redundant data should be removed and missing data filled in. Data reduction reduces the amount of data through projection or other operations and filters out the task-related datasets.
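The preprocessing and reduction steps can be illustrated with a small Python sketch. It assumes records stored as tuples, removes duplicate rows, fills missing ratings with the mean of the known ratings (one of several possible filling strategies), and reduces the data by projecting away a field that is not needed for the task. The field layout and the sample rows are hypothetical.

```python
def prepare(rows):
    """rows: (user, item, rating, timestamp) tuples; rating may be None."""
    # Remove redundant (duplicate) rows while keeping the original order.
    unique = list(dict.fromkeys(rows))
    # Fill missing ratings with the mean of the known ratings.
    known = [rating for _, _, rating, _ in unique if rating is not None]
    mean = sum(known) / len(known) if known else 0.0
    # Reduce by projection: keep only the task-related fields.
    return [
        (user, item, mean if rating is None else rating)
        for user, item, rating, _ in unique
    ]


rows = [
    ("u1", "i1", 4.0, "2020-01-01"),
    ("u1", "i1", 4.0, "2020-01-01"),  # duplicate record
    ("u2", "i1", None, "2020-01-02"),  # missing rating
    ("u3", "i2", 2.0, "2020-01-03"),
]
print(prepare(rows))
```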

The implementation of data mining requires the use of data mining technology and algorithms, mining in the dataset, and finding out useful related information and expressing it. Finally, it is necessary to explain the rationality and evaluate the value of the information. If the information is redundant or less relevant, it should be eliminated.

2.3.3. Technology of Data Mining

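The common methods of classification mining include Bayesian classification, decision tree, and support vector machine [25]. Bayesian network is an important technology in data mining, which can conveniently display the causal relationships among events in graphical form and can also be used for predictive analysis [26]. The conditional probability, joint probability, and total probability formulas are used with Bayesian networks, as shown in the following formulas:

$$P(A \mid B) = \frac{P(AB)}{P(B)},$$

$$P(AB) = P(A \mid B)\, P(B),$$

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i),$$

where $A$, $B$, and $B_1, \ldots, B_n$ are all events, $P(B) > 0$, and $B_1, \ldots, B_n$ are mutually exclusive events with $P(B_i) > 0$ whose union is the whole sample space. According to the above three formulas, Bayesian formula can be deduced, as shown in the following formula:

$$P(B_i \mid A) = \frac{P(A \mid B_i)\, P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\, P(B_j)}.$$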

Bayesian formula is the basis of Bayesian network learning and prediction.
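A short numerical illustration of Bayesian formula is given below. The function applies the total probability formula in the denominator and returns the posterior probability of each event in a partition. The prior and conditional probabilities are hypothetical numbers chosen for the example.

```python
def posterior(priors, likelihoods):
    """Bayes' formula: P(B_i | A) for every event B_i of a partition.

    priors[i] = P(B_i), likelihoods[i] = P(A | B_i).
    """
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # total probability P(A)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]


# Hypothetical: 30% of users are frequent buyers (B1) and 70% are not (B2);
# a recommended product is clicked with probability 0.8 by B1 and 0.2 by B2.
result = posterior([0.3, 0.7], [0.8, 0.2])
print(result)
```

With these numbers P(A) = 0.3 × 0.8 + 0.7 × 0.2 = 0.38, so a click raises the probability that the user is a frequent buyer from 0.30 to 0.24 / 0.38, about 0.63.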

Clustering technology includes traditional pattern recognition methods and mathematical taxonomy, while clustering analysis of data mining includes system clustering, decomposition, addition, and fuzzy clustering [27]. Even for the same record set, different clustering methods will produce different clustering results.

3. Experiments on Construction and Application of Mathematical Modeling Intelligent Recommendation System

3.1. Modeling of the Intelligent Recommendation System Based on Association Rules
3.1.1. Demand Analysis

In practical application, an intelligent recommendation system must be able to provide users with real-time, dynamic, and accurate services. Real-time service requires the recommendation algorithm to be fast at data mining. Especially with large datasets, online recommendation generation places high demands on the running memory of the system. Dynamic service ensures that the recommendation set reflects the latest needs of users; however, it must be balanced against real-time service, because if the interval between recommendation updates is too short, the speed of online recommendation declines. Accurate service requires that the system can accurately predict the needs of users. In practice, accuracy is sometimes sacrificed for real-time service, so improving accuracy depends on improving the algorithm. This paper takes the intelligent recommendation system of e-commerce as an example to analyze the data mining.

3.1.2. Technology Choice

Using Java 2 Platform, Enterprise Edition (J2EE) technology, the system is divided into the client layer, presentation layer, business logic layer, and data layer. The core of the design is the business logic layer, which is implemented by Enterprise JavaBeans (EJB) components. JavaServer Pages (JSP) technology provides powerful built-in components that simplify program design, and its database access also ensures the portability of the program. Therefore, the J2EE architecture has the advantages of high execution efficiency, an easy-to-use scripting language, the ability to handle a large number of concurrent users, management of complex transaction logic, easy division of development projects, simple component deployment, and easy maintenance of client applications.

3.1.3. System Design

Combined with J2EE technology, the structure of e-commerce intelligent recommendation system can be divided into three parts: presentation layer, business logic layer, and data layer. The recommendation module of the system is divided into three substages: data representation, model learning, and recommendation engine. The module framework is shown in Figure 1.

3.2. System Improvement Based on Fuzzy Clustering Algorithm
3.2.1. Improved Clustering Algorithm

The membership degree of the traditional fuzzy clustering algorithm is improved. For the element $x_j$, let $p$ be its nearest cluster, in which it has the largest membership degree $u_{pj}$, and let $q$ be its second nearest cluster, with membership degree $u_{qj}$. Then, the membership degrees of element $x_j$ to clusters $p$ and $q$ are calculated as shown in the following formulas:

Among them, $\alpha$ is an attractive inhibitory factor.

In the improved algorithm, we can adjust the size of $\alpha$ to control the degree of inhibition and thereby the convergence speed of the calculation.

3.2.2. Establishing Fuzzy Clustering

The improved fuzzy clustering algorithm is applied to all the data, and fuzzy clusters of users and of items are established, respectively. Once the clusters are established, the nearest neighbor user set of a target user can be constructed from the other users in the cluster to which the target user belongs. The higher a user's membership degree in that cluster, the more similar the user is to the target user.
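One way to read this neighbor construction is sketched below, under the assumption that each user is assigned to the cluster in which the user's membership degree is largest and that neighbors are ranked by their membership in the target user's cluster. The function names and the membership table are hypothetical.

```python
def best_cluster(degrees):
    """Index of the cluster with the largest membership degree."""
    return max(range(len(degrees)), key=lambda c: degrees[c])


def nearest_neighbors(memberships, target, k):
    """Return up to k users from the target's cluster, most similar first.

    memberships[user] is a list of membership degrees, one per cluster.
    """
    cluster = best_cluster(memberships[target])
    candidates = [
        (user, degrees[cluster])
        for user, degrees in memberships.items()
        if user != target and best_cluster(degrees) == cluster
    ]
    # A higher membership degree is taken to mean a more similar user.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [user for user, _ in candidates[:k]]


memberships = {
    "target": [0.90, 0.10],
    "a": [0.70, 0.30],
    "b": [0.20, 0.80],
    "c": [0.95, 0.05],
}
print(nearest_neighbors(memberships, "target", k=2))  # ['c', 'a']
```

Restricting the search to one cluster is what keeps neighbor construction cheap compared with scanning all users.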

3.3. System Test and Operation
3.3.1. Selection of Test Data

This paper selects the historical data of an e-commerce platform from the past two years and, after effective screening and filtering, takes them as test data. The dataset contains the basic information of 500 users, 2000 shopping records, 800 e-commerce stores, and 3000 pieces of shopping evaluation information.

The dataset is randomly divided into five subsets of similar size, and the cross validation method is used to compare the intelligent recommendation system constructed in this paper with a traditional content-based recommendation system, a traditional association rule recommendation system, and a collaborative filtering recommendation system. The performance of each system is then analyzed.
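The random five-way split with cross validation can be sketched as follows. The sketch assumes a plain shuffle followed by an even split, with each subset serving once as the test set; the fixed seed and the use of integer record ids are hypothetical choices for reproducibility.

```python
import random


def five_fold_splits(records, seed=0):
    """Yield (name, train, test) for five random subsets of similar size."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]
    for i, test in enumerate(folds):
        # All other folds together form the training data for this round.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield f"Q{i + 1}", train, test


# For example, the 2000 shopping records identified by hypothetical ids 0..1999.
splits = list(five_fold_splits(range(2000)))
for name, train, test in splits:
    print(name, len(train), len(test))
```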

3.3.2. Test and Evaluation Index

Firstly, prediction accuracy is selected as an evaluation index of the test. It is measured by the mean absolute error (MAE), the average absolute difference between the score that the system predicts for the target user on a commodity and the real score in the test dataset. The smaller the MAE, the more accurate the prediction of the system.
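The MAE can be computed with the standard definition, the mean of the absolute differences between predicted and real scores. The scores in this sketch are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Mean of |predicted score - real score| over all test ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predicted and real scores for four test ratings.
print(mean_absolute_error([4.0, 3.0, 5.0, 2.0], [5.0, 3.0, 4.0, 4.0]))  # 1.0
```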

The average coverage of different recommendation systems is compared, and the coverage is calculated by formula (2) in Section 2. The last test parameter is the response time of the system, that is, the running time required for the recommendation system to generate the recommended result set.
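Response time in this sense can be measured by timing the call that generates the result set. The helper below is a minimal sketch; `timed` is a hypothetical name, and the built-in `sorted` merely stands in for a recommendation function.

```python
import time


def timed(recommend, *args):
    """Run `recommend(*args)` and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = recommend(*args)
    return result, time.perf_counter() - start


# `sorted` stands in for a real recommendation function here.
result, elapsed = timed(sorted, [3, 1, 2])
print(result, f"{elapsed:.6f} s")
```

Timing each subset several times and averaging would reduce the influence of background load on the measurement.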

3.3.3. Actual Operation

The intelligent recommendation system is tested in an e-commerce shopping platform, and its application effect in personalized data mining is analyzed. The final result is the change of click-through rate and purchase rate of recommended products. At the same time, an online questionnaire survey was launched to attract users of the platform to participate with shopping vouchers as gifts to ensure the participation rate. The content of the questionnaire survey mainly includes users' satisfaction with the products recommended by the improved platform.

4. Discussion on Application Effect in Personalized Data Mining

4.1. Evaluation Results of the Recommendation System

The test dataset is randomly divided into five subsets, named Q1–Q5. Then, the intelligent recommendation system and the traditional content-based recommendation system, traditional association rule recommendation system, and collaborative filtering recommendation system are used to run the five subsets, and the performance test is carried out. In order to facilitate the recording and sorting of data, the above four systems are named A, B, C, and D, respectively.

4.1.1. Prediction Accuracy of Different Recommendation Systems

The prediction accuracies of the intelligent recommendation system (A), the traditional content-based recommendation system (B), the traditional association rule recommendation system (C), and the collaborative filtering recommendation system (D) on the Q1–Q5 datasets are compared, and the differences in mean absolute error (MAE) are analyzed.

As shown in Table 1, the same recommendation system achieves different accuracy on different subsets, and different recommendation systems achieve different accuracy on the same subset. The accuracy of each recommendation system on the different subsets is analyzed first.

As shown in Figure 2, the most accurate is the intelligent recommendation system created in this paper, and its MAE value always remains between 0.72 and 0.74. Next, the prediction accuracies of the different systems on the same subset are compared.

As shown in Figure 3, the highest MAE is that of the traditional content-based recommendation system, which is maintained at about 0.84, so the accuracy is lower. The lowest MAE is that of the intelligent recommendation system, which is maintained at about 0.73, with the highest accuracy.

4.1.2. Coverage of Different Recommendation Systems

The average coverage of four recommendation systems in five data subsets was calculated, and their changes under different thresholds (0.1, 0.2, 0.5, 0.7, and 0.8) were compared.

As shown in Table 2, with the increase of the recommendation threshold, the coverage rate of each recommendation system decreases. In particular, when the recommendation threshold is 0.8, the coverage rate of the intelligent recommendation system falls to its lowest value, 0.26. The change trend is shown graphically and analyzed below.

As shown in Figure 4, before the recommendation threshold exceeds 0.5, the average coverage of the intelligent recommendation system created in this paper is always the highest, at 0.75, 0.71, and 0.68, respectively. After the recommendation threshold exceeds 0.5, however, its average coverage decreases significantly. By contrast, the traditional content-based recommendation system, whose coverage is otherwise low, has the highest coverage rate, 0.35, when the threshold value is 0.8.

4.1.3. Response Time of Different Recommendation Systems

The running time that each recommendation system needs to generate the recommended result set in each subset is recorded as its response time.

As shown in Table 3, the same recommendation system has different response times in different data subsets. In the same data subset, the response times of different recommendation systems differ even more. The response time of each recommendation system in the different subsets is analyzed below.

As shown in Figure 5, in Q1-Q5 subsets, the response time of the intelligent recommendation system created in this paper is always the shortest, which is 0.24 s, 0.21 s, 0.24 s, 0.2 s, and 0.23 s, respectively. The longest response time of the traditional content-based recommendation system in the Q3 subset is 0.45 s. This shows that the algorithm of the intelligent recommendation system established in this paper has fast convergence speed and short response time.

4.2. Operation Results

After testing, the intelligent recommendation system is put into trial operation on an e-commerce shopping platform, and its application effect in personalized data mining is analyzed. This paper analyzes the click-through rate and purchase rate of the recommended products in the week before and the week after the system went into operation.

As shown in Table 4, before the system runs, the average click-through rate is 78.87%, and the average purchase rate is 13.91%. In the week after the system went into operation, the average click-through rate was 89.91%, and the average purchase rate was 23.26%. The commodity click-through rates before and after system operation are compared in detail below.
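The reported increases are differences in percentage points, which can be checked with a few lines of Python. The relative increase is an additional figure derived here from the same Table 4 averages and is not reported in the paper.

```python
def point_gain(before, after):
    """Increase in percentage points, rounded to two decimals."""
    return round(after - before, 2)


def relative_gain(before, after):
    """Increase relative to the starting rate, in percent."""
    return round((after - before) / before * 100, 1)


# Average click-through rate: 78.87% before, 89.91% after.
print(point_gain(78.87, 89.91), relative_gain(78.87, 89.91))  # 11.04 14.0
# Average purchase rate: 13.91% before, 23.26% after.
print(point_gain(13.91, 23.26), relative_gain(13.91, 23.26))  # 9.35 67.2
```

Although the purchase rate rose by fewer percentage points than the click-through rate, it started from a much lower base, so its relative increase is considerably larger.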

As shown in Figure 6, in the week before the intelligent recommendation system runs, the click-through rate of commodities fluctuates within a relatively stable range. The highest click-through rate was 81.24%, and the lowest was 75.51%. In the week after the system was running, the click-through rate first showed an increasing trend and gradually stabilized after the fifth day, and the highest was 93.94% on the seventh day. This shows that the intelligent recommendation system can create a higher click-through rate of products, so that more users can see the products. Figure 6 makes a detailed comparison of the commodity click-through rate before and after the system operation.

As shown in Figure 7, the purchase rate of goods also fluctuated within a relatively stable range in the week before the system was running, with the highest purchase rate on the 6th day, 14.85%, and the lowest on the 7th day, 13.17%. In the week after the system went into operation, the purchase rate first showed an increasing trend and gradually stabilized after the fifth day, with the highest of 25.81% on the 7th day. This shows that the intelligent recommendation system can recommend products that meet the needs and interests of users and promote a higher purchase rate.

4.3. Survey of User Satisfaction

After one week of operation, an online questionnaire survey was launched for randomly selected users of the platform, which mainly investigated users' satisfaction with the products recommended by the improved platform. The questionnaire ran for five days, and a total of 1216 users took part. Users are grouped by age, and satisfaction is rated on a scale of 1–5; the larger the number, the higher the satisfaction.

As shown in Table 5, among the users who participated in the survey, 411 users were between 20 and 25 years of age, and 39 users were over 45 years of age. The satisfaction of different age groups was analyzed.

As shown in Figure 8, the number of users below 35 years of age with satisfaction of 4 is the largest, accounting for 94.9% of all users with satisfaction of 4. Among the users over 35 years of age, the number with satisfaction of 5 is the most, accounting for 41.9% of all users with satisfaction of 5. Leaving aside the condition of age, this paper analyzes the satisfaction distribution of all users.

As shown in Figure 9, of the 1216 users, 43% gave a satisfaction rating of 4, 19% each gave ratings of 5 and 3, and 9% and 10% gave ratings of 1 and 2, respectively. This shows that although the user satisfaction of the system is high, it can be further improved.
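The distribution in Figure 9 can be condensed into two summary figures, computed here from the rounded percentages. The mean rating and the combined share of ratings 4 and 5 are derived values and are not reported in the paper.

```python
# Share of the 1216 respondents giving each satisfaction rating (Figure 9).
shares = {1: 0.09, 2: 0.10, 3: 0.19, 4: 0.43, 5: 0.19}

mean_rating = sum(rating * share for rating, share in shares.items())
satisfied = shares[4] + shares[5]  # respondents who gave a rating of 4 or 5

print(round(mean_rating, 2), round(satisfied, 2))  # 3.53 0.62
```

A mean of about 3.5 out of 5, with roughly one respondent in five giving a rating of 1 or 2, is consistent with the conclusion that satisfaction is fairly high but leaves room for improvement.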

5. Conclusions

There are three kinds of recommendation systems: demographic-based, content-based, and collaborative filtering-based. The intelligent recommendation system consists of three modules: input, recommendation, and output. Common recommendation algorithms include association rules, collaborative filtering, and fuzzy clustering. Data mining is an iterative human-computer interaction process, mainly including problem definition, data collation, data mining implementation, and interpretation and evaluation of mining results.

This paper constructs an intelligent recommendation system based on J2EE technology and association rules algorithm. The fuzzy clustering algorithm is used to optimize the recommendation system and improve the confidence degree of the fuzzy clustering algorithm. Then the fuzzy clustering of items and users is established, respectively. The prediction accuracy, coverage, and response time of the system are tested. The click-through rate and purchase conversion rate are compared before and after the system running. The results show that the system algorithm has fast convergence speed and high accuracy and coverage. It can recommend products that meet the needs and interests of users and promote higher click-through rate and purchase rate.

The experimental results show that the user satisfaction of this system is still insufficient, which can be further improved. Therefore, in the following research work, we should take improving user satisfaction as the main goal. In addition, during the trial operation, the user survey should be carried out many times to make the data more representative. We also need to analyze the reasons for the lack of customer satisfaction in detail and get specific feedback from users.

Data Availability

The data underlying the results presented in the study are available within the manuscript.

Conflicts of Interest

The author declares that there are no conflicts of interest.