Abstract

With the rapid development of information technology and the globalization of the economy, financial data are being generated and collected at an unprecedented rate. Consequently, there is a pressing need for automated methods that make effective use of large volumes of financial data to support investment planning and decision-making. Data mining methods have been employed to discover hidden patterns and estimate future tendencies in financial markets. In this article, an improved macroeconomic growth prediction algorithm based on data mining and fuzzy correlation analysis is presented. This study analyzes the sequence of economic characteristics, reorganizes the spatial structure of economic characteristics, and integrates the statistical information of economic data. Using the optimized Apriori algorithm, the association rules between macroeconomic data are generated. Distinct features are extracted according to association rules using the joint distribution characteristic quantity of macroeconomic time series. Moreover, the Doppler parameter of macroeconomic time series growth prediction is calculated, and the residual analysis method of the regression model is used to predict the growth of macroeconomic data. Experimental results show that the proposed algorithm has better adaptability, shorter computation time, and higher prediction accuracy in economic data mining than the compared algorithms.

1. Introduction

With the continuous improvement of China’s market economic system, the prediction of social and economic development has become a widely studied issue. To develop effective policies and plans, relevant research institutions must accurately forecast future macroeconomic trends and decide on that basis whether the economy needs to be stimulated or restrained [1]. However, future macroeconomic trends are influenced by many internal and external factors and behave as nonlinear, complex systems that change in real time. To ensure scientific and precise financial management and optimize the allocation of social resources, it is necessary to make effective and accurate predictions of financial revenue [2].

Many financiers use various sources of information to forecast target values and devise strategies to gain a competitive edge. Earlier studies were based on fundamental and technical analysis, such as assessing the value of a company using its performance or estimating future target values based on historical data. Recently, owing to advances in computing, different data mining models have been proposed. Since financial markets produce big data in real time, a competitive advantage lies in the capability to use different modes of data in real time and to describe the financial market in the form of data-driven guidelines that can be delivered to business decision-makers [3]. Therefore, it is essential to develop a standard process capable of deriving data-driven insights that are directly related to performance in the financial markets.

Data mining techniques can uncover hidden patterns and predict future trends and behaviors in financial markets. They create opportunities for companies to make proactive and knowledge-driven decisions to gain a competitive advantage. Data mining has been applied to many financial applications, including the development of trading models, investment selection, loan assessment, and portfolio optimization. The competitive advantages achieved by data mining include better revenue, reduced cost, and improved marketplace awareness.

Recently, several data mining frameworks have been proposed for economic growth prediction. Jiao et al. [4] proposed a macroeconomic forecasting algorithm based on an optimized wavelet neural network. Unlike the existing backpropagation (BP) neural network prediction model, the wavelet neural network model was used for macroeconomic prediction. In addition, the wolf colony algorithm was applied to optimize the weights of the wavelet neural network model. Normalized economic data were used to train and test the proposed model. Changsheng et al. [5] combined the improved whale optimization algorithm (IWOA) with adaptive noise-based complete set empirical mode decomposition (CEEMD) and Elman neural network prediction models to depict the stock fluctuation trend. Initially, the Internet public opinion on 180 shares in the SSE was mined and quantified by text mining techniques, and the Boruta algorithm was used to screen important attributes and reduce the complexity of the attribute set. Next, the CEEMD algorithm was employed to add a certain amount of white noise with specific variance to the attribute set to realize the decomposition and noise reduction of the attribute sequence. At the same time, an adaptive weight was used to improve the whale optimization algorithm (WOA) and enhance its global search and local mining ability. Finally, the improved WOA was used to optimize the initial weights and thresholds of the Elman neural network. The study in [6] estimated stock market crises using logistic regression (LR) and extreme gradient boosting. In some studies, linear time series models, such as the autoregressive integrated moving average (ARIMA) and the autoregressive moving average (ARMA), are combined with data mining methods to predict the stock market. Lin and Pai [7] combined a support vector machine (SVM) with ARIMA to forecast stock prices. The combined model enhanced the prediction accuracy by exploiting the distinct features of linear and nonlinear structures. Likewise, Pan et al. [8] predicted three well-known time series datasets using a hybrid of ARIMA and an artificial neural network (ANN) and extracted prominent patterns from the time series data. Mostafa [9] combined a generalized regression neural network (GRNN) and a multilayer perceptron (MLP), which avoids the need to determine many hidden units, to forecast stock prices in Kuwait; the model produced accurate forecasts and gave a better understanding of the inner dynamics of that market. A heterogeneous model based on a BP neural network and ARIMA was applied to the Growth Enterprise Market of the Chinese stock market [10], and its strength and generalization were verified. Similarly, Preis and colleagues [11] recognized online precursors of stock prices from search volume data collected by Google Trends to categorize optimal trading strategies. Likewise, Moat et al. [12] studied the relationship between changes in Wikipedia page views and overall stock market movements.

However, traditional prediction algorithms ignore the acquisition of financial redundancy and equity capital time series and suffer from multicollinearity and error series correlation, which leads to unsatisfactory prediction accuracy. To solve this problem, this paper designs a new macroeconomic growth prediction algorithm based on data mining. We analyze the sequence of economic characteristics, reorganize the spatial structure of economic characteristics, and integrate the statistical information of economic data. Using the optimized Apriori algorithm, the association rules between macroeconomic data are generated. The algorithm has high accuracy and can provide a reliable basis for analyzing macroeconomic change trends.

The remainder of this paper is organized as follows. Section 2 presents the proposed economic growth prediction algorithm. The results and corresponding discussion are presented in Section 3. Finally, Section 4 summarizes the proposed work.

2. Designing Prediction Modeling Algorithm

In this section, we propose an improved data mining algorithm for macroeconomic growth prediction. The proposed algorithm is described in the following subsections.

2.1. Statistical Information Analysis and Fusion of Economic Data

To realize the design of the macroeconomic data growth prediction algorithm, combined with the fuzzy correlation analysis method [13], we analyzed the economic characteristic series and presented the constraint variable factor of macroeconomic time series. Using the analysis methods of explanatory variables and statistical variables, the sample set of fuzzy characteristic sampling information of the characteristic point of macroeconomic time series distribution at time t was formulated and is expressed as , where is the steady-state parameter of the characteristic distribution of macroeconomic data. Under the control of the stable growth trend model, the statistical characteristic quantity of macroeconomic data feature parameter fusion was combined with the template feature matching method, and the weighted coefficient for the macroeconomic data growth prediction was computed. Based on the analysis of supplier concentration, the fuzzy similarity feature set of macroeconomic data growth was calculated. The standard quantitative distribution coefficient of macroeconomic data is calculated using the following equation:where is the fuzzy characteristic clustering function of macroeconomic time series at time . Applying semantic detection, the joint characteristic distribution value of the macroeconomic time series is obtained as follows:where and are the correlation coefficients of the negative impact of concentration. Using the fuzzy clustering method, the growth of macroeconomic data is predicted as

Based on the theory of information asymmetry and earnings growth prediction method, the joint detection statistical characteristic quantity of macroeconomic time series is obtained as follows:where is the time interval distribution difference function, is the decentralized control parameter of economic data, and is the weighting coefficient of macroeconomic data growth prediction. Based on the joint autocorrelation feature analysis method, the mean membership degree of each class of macroeconomic time series samples iswhere is the total number of macroeconomic time series samples of class , the joint feature distribution function is extracted, and the statistical fusion center block function is . The first cluster center is the distribution value of joint statistical parameters of macroeconomic time series to realize the statistical information analysis of macroeconomic data.

Combined with the analysis method of feature detection of accumulated financial redundant resources [14], the equity capital feature matching of macroeconomic data is realized. Applying the separation degree of macroeconomic data , the fuzzy autocorrelation feature matching set of macroeconomic time series and the joint feature distribution set of macroeconomic time series are computed as follows:where is the reliability characteristic quantity of macroeconomic time series and is the joint distribution parameter of macroeconomic time series in S.

Using the method of multiple growth prediction, the macroeconomic dataset X can be represented aswhere is the number of datasets and each element is represented as the -dimensional vector of economic fluctuation risk in the same year. X is comprised of categories, and the economic fluctuation parameter of enterprises in the first category is . Using joint distributed detection, the similarity characteristic variable of macroeconomic time series at the ith moment is computed as , and the corresponding correlation distribution type of macroeconomic time series is the value 1 or −1 of , where represents normal and represents abnormal growth.

Based on the results of information fusion, the big data distribution characteristics of macroeconomic time series are extracted to improve the accuracy of macroeconomic growth prediction.

2.2. Association Rules Mining Based on Improved Apriori Algorithm

The traditional Apriori algorithm [15] not only needs to scan the macroeconomic database repeatedly but also produces a large number of candidate itemsets. By combining the Map-Reduce parallel technique with matrix multiplication, the parallel processing ability of the Apriori algorithm was strengthened and its performance was improved. Map-Reduce is a programming model for processing large datasets in parallel. The steps are as follows.

(i) Parallelization. The macroeconomic database was divided into several blocks, each represented in the form of a matrix, and the matrix data were allocated to the corresponding itemsets for the Map and Reduce computations.

(ii) Matrix. The transaction database was scanned, and the transformed information was stored in a matrix. Each row in the matrix represents an object, and each column is an item in the transaction. The numbers “0” and “1” indicate whether the item exists in the current transaction: “1” if it exists and “0” if it does not.

(iii) Calculate Support. The items were sorted according to the itemset. For two frequent itemsets A and B, if their first k items were the same, then they can connect to generate a candidate k itemset C. If the first k items of C come from their first k items, then two candidate sets and will be generated. The complexity of the matrix was obtained by logic calculation.

This method reduced the number of scans on the database. After modification, the database needs to be scanned only twice, and the frequent itemsets and association rules are obtained quickly.
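The matrix and support steps above can be illustrated with a minimal sketch. It assumes each transaction is a set of hypothetical indicator events (for example `"gdp_up"`), builds the 0/1 matrix in one pass, and computes support with a logical AND over columns. The function names and the join on a shared prefix are illustrative choices following the standard Apriori join, not the authors' implementation.

```python
from itertools import combinations


def build_matrix(transactions):
    """Return (items, matrix): one 0/1 row per transaction, one column per item."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


def support(itemset, items, matrix):
    """Fraction of rows in which every column of `itemset` is 1 (logical AND)."""
    cols = [items.index(i) for i in itemset]
    hits = sum(1 for row in matrix if all(row[c] for c in cols))
    return hits / len(matrix)


def frequent_itemsets(transactions, min_support):
    """Level-wise search that reads the raw data once, then works on the matrix."""
    items, matrix = build_matrix(transactions)
    frequent = {}
    current = [(i,) for i in items]
    while current:
        kept = []
        for cand in current:
            s = support(cand, items, matrix)
            if s >= min_support:
                frequent[cand] = s
                kept.append(cand)
        # Join step: merge two sorted itemsets that share all but their last item.
        joined = set()
        for a, b in combinations(sorted(kept), 2):
            if a[:-1] == b[:-1]:
                joined.add(a + (b[-1],))
        current = sorted(joined)
    return frequent


if __name__ == "__main__":
    # Hypothetical yearly observations, used only for illustration.
    data = [
        {"gdp_up", "export_up", "invest_up"},
        {"gdp_up", "export_up"},
        {"cpi_up", "invest_up"},
        {"gdp_up", "export_up", "invest_up"},
    ]
    for itemset, s in sorted(frequent_itemsets(data, 0.5).items()):
        print(itemset, s)
```

Because support is computed from the in-memory matrix, the raw transaction list is read only when the matrix is built, which is the property the text attributes to the matrix representation.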

Mining with the improved Apriori algorithm makes the mined rules more accurate [16]. This method not only meets the needs of users but also reduces the scale of the frequent itemset, saves data mining time, and improves efficiency. Let the mining target item be , and the mining process is as follows.

(i) Get a frequent itemset . If contains two items , continue; otherwise, end.

(ii) Calculate the support degree of the candidate itemset; stop mining if it is less than the minimum support degree; otherwise, continue mining.

(iii) Obtain the candidate item h by connecting the target item and the itemset of a frequent itemset that does not belong to and . At this time, all itemsets in are -type to obtain support.

(iv) Select the frequent itemset which is greater than the minimum support; then, each itemset of contains the target item . Then, the candidate itemset connecting the frequent itemset will contain 0 items, and finally, the mining rules obtained will also contain items.
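The idea of restricting the search to itemsets that contain a target item can be sketched as follows. This is an illustrative reading of the process above, not the authors' code: the function name `mine_with_target`, the item labels, and the support threshold are assumptions made for the example.

```python
def mine_with_target(transactions, target, min_support):
    """Level-wise mining restricted to itemsets that contain `target`."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Stop early when the target item itself is not frequent.
    if supp(frozenset([target])) < min_support:
        return {}

    others = sorted({i for t in transactions for i in t} - {target})
    # First level: connect the target item with every other item.
    level = [frozenset([target, o]) for o in others]
    frequent = {}
    while level:
        scored = {c: supp(c) for c in level}
        kept = [c for c, s in scored.items() if s >= min_support]
        for c in kept:
            frequent[c] = scored[c]
        # Next level: unions of two kept itemsets that grow by exactly one item.
        level = list({a | b for a in kept for b in kept if len(a | b) == len(a) + 1})
    return frequent


if __name__ == "__main__":
    # Hypothetical data, used only for illustration.
    data = [
        {"gdp_up", "export_up", "invest_up"},
        {"gdp_up", "export_up"},
        {"cpi_up", "invest_up"},
        {"gdp_up", "export_up", "invest_up"},
    ]
    for itemset, s in mine_with_target(data, "gdp_up", 0.5).items():
        print(sorted(itemset), s)
```

Every candidate is built from the target item, so the candidate set is smaller than in an unconstrained search and every resulting rule involves the target.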

Mining data with the modified Apriori algorithm in this way improves the accuracy of mining. The specific process of mining association rules using the improved Apriori algorithm is as follows.

(i) Scan the database to get frequent itemsets for parallel and matrix processing. Use this matrix to distinguish whether the macroeconomic data is in the current target item to analyze whether the data need to be deleted or divided into the corresponding itemset.

(ii) Connect the frequent itemset with all the data that do not belong to the target item in the target itemset, and each itemset in the obtained itemset contains the target item, where is the number of target items, to achieve the purpose of the algorithm.

(iii) Use the improved Apriori algorithm to mine frequent itemsets, and use the logical operation of itemsets to obtain the local support of target items.

(iv) Use the Reduce operation to integrate the local support of each itemset and candidate set, and finally obtain the global frequent itemset.
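Steps (iii) and (iv) follow the usual Map-Reduce pattern of counting locally and merging globally. The sketch below assumes the database has already been split into blocks and runs the Map step sequentially for clarity; in a real deployment each block would be processed by a separate worker. The helper names and the `max_size` limit are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_block(block, max_size=2):
    """Map step: count every itemset of up to `max_size` items inside one block."""
    counts = Counter()
    for transaction in block:
        items = sorted(transaction)
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def reduce_counts(left, right):
    """Reduce step: add the local counts of two blocks."""
    return left + right


def global_frequent(blocks, min_support, max_size=2):
    """Merge local counts and keep itemsets whose global support is high enough."""
    total = sum(len(block) for block in blocks)
    local = [map_block(block, max_size) for block in blocks]
    merged = reduce(reduce_counts, local, Counter())
    return {
        itemset: count / total
        for itemset, count in merged.items()
        if count / total >= min_support
    }


if __name__ == "__main__":
    # Two hypothetical blocks of the transaction database.
    blocks = [
        [{"a", "b"}, {"a"}],
        [{"a", "b"}, {"b", "c"}],
    ]
    print(global_frequent(blocks, 0.5))
```

Local counts are merged before the support threshold is applied, because an itemset that is infrequent in one block can still be frequent globally.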

The improved Apriori algorithm only needs to scan the macroeconomic database twice. The first scan generates frequent itemsets, and the second scan transforms other data into a matrix for a logical operation to generate association rules. Compared with the traditional Apriori algorithm, it reduces the number of database scans, prunes the candidate set by Map-Reduce, and connects the target item with the items other than the target item, so that each itemset of the frequent itemset contains the target item, which makes the improved Apriori algorithm more targeted. Moreover, it reduces the candidate set and speeds up data mining.

2.3. Macroeconomic Feature Extraction

Before using a large amount of data in prediction models, we use the mined association rules and apply the kernel principal component analysis (KPCA) method to extract the features of economic data [17]. KPCA is an extension of principal component analysis (PCA) using kernels. Using a kernel, the originally linear operations of PCA are performed in a reproducing kernel Hilbert space. Suppose is training data and is input space. Let the corresponding mapping be ; we combine the mapping with kernel function to realize the high-dimensional spatial mapping of features, so that the macroeconomic data can meet the centralization conditions needed in the feature space.

Suppose there are macroeconomic data samples with features. Using KPCA, we extract the features of macroeconomic data. The steps are as follows:

(i) The feature obtained from macroeconomic data with the sample is represented in the matrix and the -dimensional data matrix ‘A’ is obtained as represents macroeconomic data.

(ii) The matrix K is calculated using the following equation: is the mapping between economic data.

(iii) The function parameter and Gaussian radial basis function in the matrix are obtained by modifying the kernel matrix as follows:

(iv) Using the modified matrix, the eigenvectors and eigenvalues of the matrix are calculated. The eigenvalues are sorted in descending order, and the corresponding eigenvectors are reordered accordingly. Moreover, Schmidt orthogonalization is used to process the eigenvectors.

(v) The cumulative contribution rate and the specified information extraction rate are calculated according to the eigenvector. If and , the previous feature vector is extracted.

(vi) The projection on the eigenvector is obtained by calculating the corrected kernel matrix.
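The KPCA steps above can be sketched with the standard textbook formulation: a Gaussian radial basis kernel, centering of the kernel matrix, eigen-decomposition, selection by cumulative contribution rate, and projection. The sketch uses only the standard library, so it substitutes power iteration with deflation for a library eigen-solver. The kernel width `sigma`, the 0.85 contribution threshold, and all function names are assumptions for illustration, since the source values are not recoverable here.

```python
import math


def rbf_kernel_matrix(X, sigma):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K


def center_kernel(K):
    """Centre K so that the mapped samples have zero mean in feature space."""
    n = len(K)
    row = [sum(r) / n for r in K]  # K is symmetric: row means equal column means
    total = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]


def eigenpairs(M, k, iters=500):
    """Leading k eigenpairs of a symmetric matrix via power iteration and deflation."""
    n = len(M)
    M = [r[:] for r in M]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]  # deflate the component just found
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs


def kpca(X, sigma=1.0, rate=0.85):
    """Project samples on the leading components whose cumulative share reaches `rate`."""
    n = len(X)
    Kc = center_kernel(rbf_kernel_matrix(X, sigma))
    pairs = [(lam, v) for lam, v in eigenpairs(Kc, n) if lam > 1e-9]
    total = sum(lam for lam, _ in pairs)
    kept, acc = [], 0.0
    for lam, v in pairs:
        kept.append((lam, v))
        acc += lam
        if acc / total >= rate:
            break
    # Projection of sample i on a component: sum_j Kc[i][j] * v[j] / sqrt(lam).
    return [
        [sum(Kc[i][j] * v[j] for j in range(n)) / math.sqrt(lam) for lam, v in kept]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical two-feature samples forming two groups.
    samples = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
    for row in kpca(samples):
        print(row)
```

For larger datasets a numerical library eigen-solver would be the practical choice; the power iteration here only keeps the example dependency-free.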

Based on the principle of function replacement, KPCA is used to reduce the dimension of economic data to obtain the feature vector of economic data. The data mapping result is described using the following equation:where is the spatial projection of the eigenvector and is the input space.

Based on the above, through the warrant analysis method of equity capital time series detection, it is assumed that the initial frequency mean of the associated data in the macroeconomic time series is , and the standard deviation is . The training samples update the iterative abnormal state space using the following equation:

According to the optimal grouping detection results of macroeconomic time series, the center point of the cluster is updated, and the correlation characteristic component of the associated data in the macroeconomic time series is obtained. Through mutual information parameter identification of macroeconomic time series [18], the joint characteristic parameter distribution is obtained as follows:where is the characteristic parameter associated with data sample and is the mutual information identification parameter of economic time series [19]. Using the above steps, the information characteristic model of the macroeconomic time series is constructed. Combined with quantitative regression analysis and subsection test method, the joint probability density distribution of macroeconomic time series is obtained as follows:where is a complex Gaussian process with zero mean and one variance. The distribution of associated data in macroeconomic time series is computed aswhere is the Doppler parameter of macroeconomic time series distribution and is the joint characteristic distribution parameter, which is obtained by calculating the information focus parameter of associated data flow in macroeconomic time series. By associating the training set with the big data of macroeconomic time series, the gain correlation characteristic quantity of the associated data in macroeconomic time series is calculated using the following equation:where is the big data information control function of economic time series.

2.4. Macroeconomic Growth Prediction Algorithm

With the recent development of data mining techniques, increasingly complex models have shown good performance. We obtained the joint distribution characteristic quantity of the macroeconomic time series, the gain correlation characteristic quantity, and the joint parameter fusion of the associated data [20]. The joint autocorrelation characteristic analysis was performed to obtain the Doppler parameter of the macroeconomic time series. We jointly estimated the macroeconomic time series and obtained the maximum likelihood estimates and using the residual analysis method of the regression model . The logarithm of the natural distribution of the macroeconomic time series is given as

We set the measurement index parameter of environmental uncertainty and obtained the category number of macroeconomic time series samples (e.g., the ) and the distribution coefficient of economic time series samples [20], using the method of autocorrelation feature distributed estimation, to calculate the difference between the maximum membership degrees of each type of economic time series samples and get the detection statistical distribution coefficient value of macroeconomic time series as . Using the joint parameter analysis method, the macroeconomic time series prediction results are computed as follows:where n is the number of training samples of economic time series big data distribution and is the standard deviation of economic time series distribution.
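As a generic illustration of residual analysis for a regression model, the sketch below fits a linear trend by ordinary least squares, inspects the residuals (their standard error and lag-1 autocorrelation, which relates to the error series correlation mentioned in the introduction), and extrapolates one step. It is a simplified stand-in under stated assumptions, not the joint parameter estimator of this section; the linear trend model, the function names, and the sample values are hypothetical.

```python
import math


def fit_trend(years, values):
    """Ordinary least squares fit of values = a + b * year."""
    n = len(years)
    mx = sum(years) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def residual_analysis(years, values):
    """Return the fit, its residuals, the residual standard error, and lag-1 autocorrelation."""
    a, b = fit_trend(years, values)
    residuals = [y - (a + b * x) for x, y in zip(years, values)]
    n = len(values)
    sse = sum(r * r for r in residuals)
    sigma = math.sqrt(sse / (n - 2))  # two parameters were estimated
    lagged = sum(residuals[i] * residuals[i - 1] for i in range(1, n))
    rho = lagged / sse if sse > 0 else 0.0
    return a, b, residuals, sigma, rho


def predict(years, values, target_year):
    """Point forecast for `target_year` together with the residual standard error."""
    a, b, _, sigma, _ = residual_analysis(years, values)
    return a + b * target_year, sigma


if __name__ == "__main__":
    # Hypothetical growth index by year, used only for illustration.
    years = [2008, 2009, 2010, 2011, 2012]
    index = [16.2, 17.9, 20.1, 21.8, 24.0]
    forecast, sigma = predict(years, index, 2013)
    print(round(forecast, 3), round(sigma, 3))
```

A lag-1 autocorrelation far from zero would suggest that the residuals are correlated and that the trend model is missing structure.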

3. Experimental Design and Result Analysis

To verify the application performance of the proposed algorithm in the prediction of macroeconomic growth, a simulation experiment was designed on the MATLAB R2021a platform. Combined with SPSS 17.0 statistical analysis software, the data sample detection model of macroeconomic growth prediction was developed. We used 1024 data samples of the economic data series, with 214 training data samples and 120 data samples as the statistical characteristic distribution set. The autocorrelation distribution coefficient was 0.212. For comparison, we employed the prediction algorithms based on the optimized wavelet neural network [4] and the improved whale optimization algorithm [5]. According to the above parameters, the experimental process was designed as follows.

3.1. Efficiency Test of Macroeconomic Growth Prediction Algorithm

We constructed the descriptive statistical analysis model of macroeconomic growth prediction for the years 2008–2020. The experimental results are shown in Table 1.

According to the descriptive statistical analysis of the macroeconomic time series given in Table 1, the input dispersion diagram of macroeconomic data distribution is obtained, as shown in Figure 1.

According to the distribution of macroeconomic input, the growth of the macroeconomy is predicted, and the predicted output is shown in Figure 2.

Figure 2 shows the year-wise return on net assets. It is evident that the proposed algorithm can effectively predict the macroeconomic growth of economic time series.

3.2. Computation Time of Macroeconomic Data Mining Algorithm

We compared the computation time of the proposed prediction model with the computation time of different algorithms to mine the economic data. The experimental results of the computation time for different data mining algorithms are shown in Figure 3.

We compared the computation time of the proposed data mining algorithm with that of the optimized wavelet neural network algorithm given in [4] and the method based on the improved whale optimization algorithm given in [5]. Figure 3 shows that the proposed algorithm requires less data mining time than the other two algorithms. This is because the proposed algorithm improves the traditional Apriori algorithm by matrix multiplication so that it needs to scan the macroeconomic database only twice. The association rules between the data are then obtained, and macroeconomic data are mined based on these rules, which reduces mining time and improves mining efficiency.

3.3. Comparative Test of Macroeconomic Data Mining Rate

Data mining extracts useful data from a large set of raw data. Figure 4 shows the comparative results for the data mining rate of the proposed algorithm with optimized wavelet neural network and improved whale optimization algorithms.

The mining rate of the proposed algorithm is higher than that of the optimized wavelet neural network and improved whale optimization algorithms. In addition, the mining rate of the proposed algorithm decreases more smoothly as the number of detections increases. The main reason is that the proposed algorithm uses the improved Apriori algorithm to mine association rules quickly and uses the Map-Reduce method to prune the candidate set during mining. Pruning improves the precision of the association rules, and the high-precision rules are then applied to extract the distinct features of macroeconomic data, which improves the mining rate of the proposed algorithm.

3.4. Comparative Analysis of False-Positive Rate of Different Economic Data Mining Algorithms

Each data mining technique has its intrinsic characteristics and underlying assumptions that make it a better choice for macroeconomic growth prediction. We also computed the false-positive rate of the proposed algorithm, optimized wavelet neural network, and improved whale optimization algorithms. The results are shown in Figure 5.

The false-positive rate of the proposed algorithm is lower than that of the optimized wavelet neural network and improved whale optimization algorithms, and the proposed algorithm is also more stable than the other two methods. In the proposed method, the improved Apriori algorithm is applied in the process of feature extraction, and the kernel matrix is modified to obtain a new feature vector. This makes the feature extraction more targeted and reduces the false-positive rate in the process of macroeconomic data mining.
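For reference, the false-positive rate is conventionally computed as FP / (FP + TN). The sketch below assumes the labeling from Section 2.1, where 1 marks normal and -1 marks abnormal growth, and treats the abnormal class as the positive class. That choice and the sample labels are assumptions for illustration, since the evaluation protocol is not spelled out in the text.

```python
def false_positive_rate(actual, predicted, positive=-1):
    """FP / (FP + TN), where `positive` is the label treated as a detection."""
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


if __name__ == "__main__":
    # Hypothetical labels: 1 = normal growth, -1 = abnormal growth.
    actual = [1, 1, 1, 1, -1, -1]
    predicted = [1, -1, 1, 1, -1, 1]
    print(false_positive_rate(actual, predicted))
```

In this example one of the four normal samples is flagged as abnormal, so the rate is 1 / 4.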

3.5. Comparison of Economic Growth Prediction Accuracy of Different Algorithms

We investigated the accuracy of different algorithms used for economic growth prediction. Figure 6 shows the prediction accuracy achieved by different macroeconomic growth prediction algorithms. It can be seen that the economic time series growth prediction accuracy of the proposed algorithm is higher than that of the other two algorithms.

The proposed algorithm constructs the financial redundancy and equity capital time series detection model of macroeconomic data growth prediction. Through the warrant analysis of equity capital time series detection, the accuracy of the growth forecast is effectively improved, and the application effect of the algorithm is optimized.

4. Conclusion

To solve the problem of large deviation of traditional economic growth prediction algorithms, a new macroeconomic growth prediction algorithm based on data mining is developed. This study analyzes the sequence of economic characteristics, reorganizes the spatial structure of economic characteristics, and integrates the statistical information of economic data. Using the optimized Apriori algorithm, the association rules between macroeconomic data are generated. The residual analysis method of the regression model is used to predict the growth of macroeconomic data. To verify the effectiveness of the algorithm, a simulation experiment is designed. The experimental results show that the proposed algorithm has ideal application adaptability, and the economic growth prediction results are in line with the actual situation, which shows that the algorithm has high accuracy and can provide a reliable basis for macroeconomic change trends.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.