Abstract

In the past two years, equestrian sports have become increasingly popular with the public. With the comprehensive development of equestrian preparations in China for the 2020 Olympic Games, the equestrian sports industry in China enjoys an unprecedentedly favorable development environment. This article aims to study the construction of an intelligent processing platform for equestrian event information based on data fusion and data mining. It introduces the relevant theoretical knowledge of data mining and data fusion, including the concept of data mining, common data mining analysis methods and algorithms, the basic concepts of data fusion, and the functional structure of data fusion. It discusses the main algorithms of cluster analysis, focusing on distance measures and similarity coefficients. In the experimental part, in order to intelligently process and acquire information, an intelligent information processing platform is constructed based on data fusion and data mining technology. The experimental results show that the precision rate, recall rate, and F-score of the platform under the closed test are much higher than those under the open test, with the precision rate higher by about 7.26%.

1. Introduction

China is one of the countries in the world with the longest history of horse breeding and one of the countries that developed horse culture. Horse culture has long been an important part of Chinese national culture, so it also represents a kind of "cultural trust." Equestrian sport, which embodies this "cultural trust," is a sport in which athletes and horses work together. One of its most obvious differences from other sports is that the object the athlete handles is not lifeless equipment but a living horse. After the reform and opening up, the rapid development of China's tertiary industry greatly promoted the development of sports in China. Although equestrianism is a very special sport, as a high-level sporting event in China, it has become more and more popular in recent years, and equestrian sports have begun to show a trend of vigorous development. Rapid economic growth and the improvement of people's spiritual and cultural needs have made equestrian sports possible. With the increasing number of equestrian events held, it is urgent to build an intelligent information processing platform.

National policies have promoted the rapid development of equestrian sports. Holding equestrian events requires a logical and scientific platform to intelligently process event data. The data in an intelligent processing platform take many forms and require big data storage capacity. Because the requirements for data processing speed and accuracy keep increasing, conventional data processing methods can no longer meet the platform's needs. Meanwhile, with the rapid development of computer and data processing technology, a large amount of information can be processed in real time; in this context, data fusion technology has emerged. In addition, information overload is a problem that people must deal with, and to obtain useful data from a large amount of information, data mining technology is further introduced. Data mining is a broad interdisciplinary field that gathers researchers from many areas, especially researchers and engineers in databases, artificial intelligence, and data processing. At the same time, the concept of data mining points out a new research direction for developing intelligent data processing with new technologies and methods.

The Internet of Things technology has been used as the core technology of converged services that require intelligent information processing, and its importance is gradually emerging. Am-suk proposed a beacon device using acceleration sensors and Hall sensors. The beacon device can control the target under specific conditions by sensing the moving target, and it is expected to apply to various types of factory environments, supporting detachable installation and optimal management using sensors. Through the internal network construction of Internet of Things interactive devices, it can be effectively connected with Internet of Things devices, and diversified services can be provided through connection with an open platform. However, beyond the basic functions of beacon technology, there are still many requirements under specific environments and conditions [1]. With the rapid development of big data technology, a new way of thinking about intelligent transportation has become a necessity. Mendili SE created a big data modeling method for intelligent transportation systems (ITS) to process transmitted data. The method pays special attention to the creation of multiple layers, among which is the management and processing layer, which contains three sublayers: processing, analysis, and storage. The disadvantage is that the different types of traffic data are numerous and huge [2]. Quan's research on intelligent data processing aims to establish theories, algorithms, systems, and technical methods for processing data, complex systems, and uncertainty; it researched theories and methods of intelligent information processing based on the mechanisms of perception, established usable computational models, and developed applications. It has a wide range of applications in complex system modeling, system analysis, decision-making, control, optimization, and design. However, the effectiveness of this research on the modeling method of intelligent information processing has yet to be verified [3].

The innovations of this paper are as follows: (1) the introduction and improvement of the clustering algorithm, which improves the efficiency of the network monitoring system and paves the way for the intelligent information processing technology studied in the experimental part; (2) the combination of URL processing, content processing, and port processing, applying this mixed processing method to the platform so that automatic classification reduces the burden of managing large-scale data processing systems and improves the efficiency of data processing.

2. Construction Method of Intelligent Processing Platform for Equestrian Event Information Based on Data Fusion and Data Mining

2.1. Data Mining
2.1.1. The Concept of Data Mining

Data mining is the extraction of knowledge that people are interested in from a database [4, 5]. This knowledge is implicit, previously unknown, and potentially useful, and it can be expressed in the form of concepts, rules, laws, standards, etc. Generally speaking, data mining is a decision support process, a means of finding and collecting facts or observing results. The object of data mining is not only a database but also a file system or other data collection [6, 7].

The task of data mining is to extract knowledge from data. Divided by function, this knowledge falls into two categories: predictive and descriptive [8]. Predictive knowledge determines a specific result based on the values of data items; descriptive knowledge describes the rules contained in the data or groups data by similarity. According to differences in the knowledge found, data mining tasks can be classified into the following categories: (1) characteristic rules: extract characteristic formulas from a set of data related to the learning task to express the overall characteristics of the data set [9]; (2) classification: search for the concept of a category, which represents the overall information of that type of data [10]; (3) clustering: group the data into clusters such that the differences between groups are as large as possible and the differences within groups are as small as possible; unlike classification, clustering does not depend on predefined classes and does not require a training set [11, 12]; (4) association analysis: when the values of two or more data items co-occur with high probability, there is a specific relationship between them, and the relationship rules of these data items can be determined [13] (a toy sketch follows this paragraph); (5) prediction: learn the laws of change from past data, create a model, and use this model to predict the type and characteristics of future data [14]; (6) deviation detection (anomaly analysis): detect significant changes and deviations between the current state of the data, historical records, and standards; deviations include a large category of potentially useful knowledge [15].
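To make task (4) concrete, the following is a minimal sketch, not taken from the paper, of how co-occurrence probability can surface an association rule; the transaction contents and the support/confidence measures are illustrative assumptions.

```python
# Toy association analysis: items that co-occur with high probability
# suggest a relationship (here, the rule saddle -> helmet).
transactions = [
    {"saddle", "helmet"},
    {"saddle", "helmet", "boots"},
    {"boots"},
    {"saddle", "helmet"},
]

def support(items):
    """Fraction of transactions containing all the given items."""
    return sum(items <= t for t in transactions) / len(transactions)

print(support({"saddle", "helmet"}))                        # 0.75
print(support({"saddle", "helmet"}) / support({"saddle"}))  # confidence = 1.0
```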

2.1.2. Commonly Used Analysis Methods and Algorithms for Data Mining

Data mining technology is widely used in many fields, and its techniques are becoming more and more mature. The key factors determining which method to use are the type of problem and the type and scale of the data. Data mining analysis methods can be divided into the following four types:

(1) Correlation analysis: the purpose of correlation analysis is to dig out the interrelationships hidden in the data [16].
(2) Sequence analysis: the focus of sequence pattern analysis is to analyze the temporal or causal relationships between data [17].
(3) Classification analysis: the input set of the classification analysis method is a set of records and several kinds of labels. First, a label is assigned to each record, that is, records are classified by label; then, these labeled records are examined and their characteristics are described [18].
(4) Cluster analysis: the input set of the cluster analysis method is a set of unlabeled records, which means the input records have not yet been classified [19, 20]. Its purpose is to reasonably divide the record set according to certain rules and to describe the different categories explicitly or implicitly. These rules are defined by the cluster analysis tool.

In the process of using any technology, the conditions and scope of its use are limited, and data mining technology is no exception. Therefore, an important step is to select effective data mining models and analysis algorithms for the specific field of application.

The main data mining methods based on database technology are (1) statistical methods; (2) association rules: mining association rules means searching for correlations in the data set [21]; (3) genetic algorithms; (4) rough set methods: discover knowledge from imprecise, fuzzy, and uncertain information; as an extension of classical set theory, rough sets can be used to classify imprecise information and find internal links within noisy information [22]; (5) fuzzy methods: fuzzy logic systems are widely used in the field of classification; (6) neural network methods: neural networks are used to obtain classification models [23].

2.2. Data Fusion

Data fusion is the multilevel processing and aggregation of multiple sensor data sets obtained from the same target to generate important new data [24]. The sensors here refer to various data acquisition systems and related databases. Data fusion is a method of dealing with multiple data sources; in short, it is a complete multisource data processing algorithm [25, 26]. The purpose of this processing is to reason about and identify the information received and to evaluate and judge it accordingly. Combining data from multiple sensors increases confidence, reduces ambiguity, and improves system reliability.

Data fusion is a comprehensive process through which computers process, control, and make decisions about information from different sources [27]. The functions of most data fusion systems include correlation, detection, perception, and estimation, and the functional mode is shown in Figure 1. The fusion system is divided into two levels: low-level processing and high-level processing [28, 29]. Low-level processing includes data extraction, data association, target state estimation, and characterization; it is numerical processing that produces numerical results. High-level processing mainly includes behavior prediction evaluation and situation assessment; it is symbolic processing and can lead to more abstract results.

As shown in Figure 2, data detection in a data fusion system means continuously scanning and observing targets with multiple sensors, with detection performed at the signal level [1]. The function of the data association unit is to determine whether data from different times and spaces come from the same target. State estimation is based on approximate values of the target parameters observed by the sensors and uses these estimates to predict the state of the target at the next observation. Target perception generates $n$-dimensional feature vectors from the target features measured by different sensors, with each dimension representing an independent characteristic of the target. If it is known in advance that there are $M$ target types, the measurement vector can be compared with the known category features to determine the target category. According to location, attributes, type, and other information, the overall goals and general situation can be understood. In distributed processing, each sensor makes an independent decision and then sends the result to the fusion center, which makes the final decision. The advantage of this structure is that it requires low channel capacity and is easy to implement in engineering; however, because each sensor decides on its own, the fusion process can be unstable. The hybrid structure has two advantages: low data transmission channel requirements and availability of the main measurement data.
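As a minimal illustration of the distributed structure just described, the following sketch (hypothetical sensor decisions and function names, not from the paper) shows a fusion center combining independent sensor decisions by confidence-weighted voting:

```python
from collections import defaultdict

def fuse_decisions(local_decisions):
    """local_decisions: (target_class, confidence) pairs, one per sensor.
    The fusion center accumulates weighted votes and picks the best class."""
    scores = defaultdict(float)
    for target_class, confidence in local_decisions:
        scores[target_class] += confidence
    return max(scores, key=scores.get)

# Three sensors observe the same target and decide independently.
print(fuse_decisions([("horse", 0.9), ("vehicle", 0.4), ("horse", 0.7)]))
# -> "horse"
```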

2.3. Cluster Analysis

Clustering refers to the process of dividing a collection into multiple groups in a certain way, such that objects in the same group have a high degree of similarity to one another but differ greatly from objects in other groups. Cluster analysis is used as a tool to classify information, observe the characteristics of each category, and discover hidden information. At present, cluster analysis is used in various fields, such as graphics processing, information retrieval, and statistics.

2.3.1. Definition of Cluster Analysis

In the field of machine learning, unlike classification, clustering is an unsupervised learning process, the result of which is not known in advance. Here is the mathematical description of cluster analysis:

For the data set $X = \{x_1, x_2, \ldots, x_n\}$, according to the similarity between the data objects $x_i$, it is divided into $k$ groups $C_1, C_2, \ldots, C_k$, satisfying $C_1 \cup C_2 \cup \cdots \cup C_k = X$ and $C_i \cap C_j = \varnothing$ for $i \neq j$, where each $C_i$ is called a cluster.

Assuming that the data to be clustered contain $n$ objects, each object has $p$ attributes, and $x_{ij}$ represents the $j$th attribute of the $i$th object, the attributes of the $n$ objects form an $n \times p$ matrix:

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}.$$

Another representation is called the dissimilarity matrix, which can be used to store the dissimilarity between objects.

The dissimilarity matrix is an $n \times n$ matrix in which $d(i, j)$ represents the dissimilarity between object $i$ and object $j$. Its value is a nonnegative number: the larger the value, the greater the difference between the two objects; the smaller the value, the more similar they are. Obviously $d(i, j) = d(j, i)$ and $d(i, i) = 0$, so the dissimilarity matrix can be expressed as a lower triangular matrix.
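As a concrete sketch of these two representations (toy data, with Euclidean distance assumed as the dissimilarity), the following builds an $n \times p$ data matrix and fills the lower triangle of the corresponding $n \times n$ dissimilarity matrix:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [1.5, 1.8],
              [8.0, 8.0]])   # n = 3 objects, p = 2 attributes
n = X.shape[0]
D = np.zeros((n, n))
for i in range(n):
    for j in range(i):       # d(i, j) = d(j, i) and d(i, i) = 0,
        D[i, j] = np.linalg.norm(X[i] - X[j])  # so the lower triangle suffices
print(D)
```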

2.3.2. Distance Measurement in Cluster Analysis

Most clustering algorithms divide data objects into clusters such that the similarity between clusters is very low while the similarity between objects within a cluster is high. Generally, the distance between objects is used to measure their similarity. Dissimilarity and similarity are closely related and can be converted into each other, for example by $s(i, j) = 1 - d(i, j)$ or $s(i, j) = 1/(1 + d(i, j))$.

For objects $x = (x_1, x_2, \ldots, x_p)$ and $y = (y_1, y_2, \ldots, y_p)$ with $p$ attributes, the common distance formulas are:

(1) Euclidean distance: $d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$
(2) Manhattan distance: $d(x, y) = \sum_{i=1}^{p} |x_i - y_i|$
(3) Chebyshev distance: $d(x, y) = \max_{1 \le i \le p} |x_i - y_i|$
(4) Minkowski distance: $d(x, y) = \left(\sum_{i=1}^{p} |x_i - y_i|^q\right)^{1/q}$

For the Minkowski distance, when $q = 1$, it is the Manhattan distance; when $q = 2$, it is the Euclidean distance; and when $q \to \infty$, it is the Chebyshev distance.
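A short sketch of this distance family (toy vectors) shows how the Minkowski distance specializes to the other three:

```python
import numpy as np

def minkowski(x, y, q):
    """Minkowski distance with parameter q."""
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
print(minkowski(x, y, 1))      # q = 1: Manhattan distance, 5.0
print(minkowski(x, y, 2))      # q = 2: Euclidean distance, ~3.606
print(np.max(np.abs(x - y)))   # q -> infinity: Chebyshev distance, 3.0
```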

2.3.3. Mahalanobis Distance

P. C. Mahalanobis proposed the Mahalanobis distance, an effective method of calculating the similarity between two unknown sample sets:

$$d_M(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)},$$

where $\Sigma$ is the covariance matrix.
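The following is a minimal sketch of the Mahalanobis distance (a randomly generated toy sample; the function name is illustrative), measuring the distance from a point to the mean of a sample, scaled by the sample's inverse covariance:

```python
import numpy as np

def mahalanobis(x, sample):
    """Mahalanobis distance from x to the mean of sample."""
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    diff = x - mu
    return np.sqrt(diff @ cov_inv @ diff)

sample = np.random.default_rng(0).normal(size=(100, 3))  # toy sample
print(mahalanobis(np.array([1.0, 0.0, -1.0]), sample))
```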

In addition to distance, similarity coefficients can also be used as a metric: the closer the nature of two objects, the closer their similarity coefficient is to 1, while the similarity coefficient of unrelated objects is closer to 0. Common similarity coefficients are as follows.

(1) Cosine of the included angle:

$$\cos\theta = \frac{\sum_{i=1}^{p} x_i y_i}{\sqrt{\sum_{i=1}^{p} x_i^2}\,\sqrt{\sum_{i=1}^{p} y_i^2}}.$$

The angle cosine measures similarity by the angle between the two attribute vectors. When $\cos\theta = 1$, objects $x$ and $y$ are completely similar; when $\cos\theta = 0$, objects $x$ and $y$ are unrelated.

(2) Correlation coefficient:

$$r_{xy} = \frac{\sum_{i=1}^{p} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{p} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{p} (y_i - \bar{y})^2}}.$$

The correlation coefficient takes values in $[-1, 1]$ and is an indicator used to measure the degree of linear relationship between variables. It can be positive or negative; the larger its absolute value, the stronger the linear relationship between the variables.
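Both similarity coefficients can be sketched in a few lines (toy vectors; numpy's corrcoef is used for the Pearson correlation):

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the included angle between x and y."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def correlation(x, y):
    """Pearson correlation coefficient, in [-1, 1]."""
    return np.corrcoef(x, y)[0, 1]

x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.1])
print(cosine_similarity(x, y))  # close to 1: nearly the same direction
print(correlation(x, y))        # close to 1: strong linear relationship
```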

3. Experiments on the Construction of an Intelligent Processing Platform for Equestrian Event Information Based on Data Fusion and Data Mining

3.1. Overall Design

For this network information processing system, in order to grasp the flow of data packets through the system, different processing strategies and strengths are implemented for data at different protocol layers. For example, in the TCP/IP protocol stack, the data processed at the network layer, transport layer, and application layer are, respectively, IP packets, TCP or UDP packets, and complete application protocol data; for an application layer protocol such as HTTP, data of different natures, such as domain names and specific page content, must be considered separately in the information processing process. Moreover, in a secure network information processing system, many users will have their own personalized processing strategies, so these users must be divided into different grouping levels, and different processing strategies must be implemented for different types of content at different levels.

To design an intelligent data processing system based on address processing and web content processing with data mining technology, we must first create a sample library, maintain a URL list through intelligent web content analysis, and then dynamically update the URL list. Before connecting to the remote target network, the system checks whether the URL matches the address list and whether the page matches the defined keywords, then determines whether to process the URL and block bad web page information, adding the URLs of pages with bad information to the blacklist. This intelligent information processing is dynamic: it constantly updates the address list and accurately analyzes web page content. The intelligent information processing system is designed mainly from three aspects: the user's personalized processing rule settings; the matching and processing algorithms; and the optimization and sharing of the rule library and sample library. Considering the requirements of each of these key steps, the basic structure of the information processing system is as follows.

The main management modules of the system are responsible for user personalized management, rule library management, sample library management, cluster analysis, and processing level management:

(1) User personalized management: create a user management system based on personalized processing strategies.
(2) Rule library management: create the URL address library list and the keyword management library.
(3) Sample library management: administrators or ordinary users add to, clear, refresh, and change the harmful-sample libraries of the system, as well as the subject ownership of the categories in the sample library.
(4) Cluster analysis: perform periodic cluster analysis of the sample library, or let the user perform it manually; it is used in conjunction with the sample library management module to make reasonable use of the sample library.
(5) Processing level management: the processing levels are set by the system administrator according to different processing strategies.
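A minimal sketch of the rule-matching step implied by modules (2) and (3) follows; the rule representation (a URL blacklist plus a keyword list) and all names are illustrative assumptions, not the paper's implementation:

```python
BLACKLIST = {"bad.example.com"}
KEYWORDS = ["forbidden-topic", "banned-word"]

def check_request(url, page_text):
    """Block blacklisted hosts and keyword-matching pages; dynamically
    add hosts serving bad content to the blacklist."""
    host = url.split("/")[2] if "//" in url else url
    if host in BLACKLIST:
        return "block"
    if any(kw in page_text for kw in KEYWORDS):
        BLACKLIST.add(host)   # dynamic update of the URL list
        return "block"
    return "allow"

print(check_request("http://bad.example.com/a", ""))         # block
print(check_request("http://ok.example.com/b", "harmless"))  # allow
```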

3.2. Design Goals

The network information processing monitoring system is mainly used in an Ethernet environment. Based on the current state of network applications and the current level of network monitoring technology, the system gives full play to the monitoring role of software to effectively monitor and manage all computers in the system. The design goals of the network monitoring system for information processing are as follows:

(1) Comprehensive monitoring: the system must know not only the usage of the internal LAN but also the specific network information of the applications currently running on a given computer, in order to detect hacker software or other illegal network applications.
(2) Strong operability: so that network managers can use the system simply, quickly, and conveniently, the graphical interface must be designed to be simple and easy to use.
(3) High security: the monitoring station program must use password authentication at login to prevent illegal users from using the monitoring system.
(4) High efficiency: the network monitoring system usually analyzes thousands of pieces of data in a very short time; the amount of data processed is quite large, and the analysis results must be very accurate. The system must therefore operate with extremely high efficiency, and the detection algorithm must have low time complexity.

3.3. Processing Flow

First, the user sends a request to the network. The transport layer processing module checks the user's request against the URL blacklist and the keyword rule settings and judges whether the information is allowed or blocked according to the matching algorithm. Next, the data passed by the transport layer inspection is obtained and the web page text is preprocessed. After querying the categories in the sample library, the text is assigned to the most similar category to determine its content level. If it is bad information, the system issues a warning or clears the content, logs the event, puts the address on the blacklist, and gives the user prompt information. Periodic cluster analysis and other operations are performed on the sample library or on other training sample libraries for various topics.
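Putting the flow together, here is a sketch under the assumption that content classification assigns a page to the sample-library category with the greatest keyword overlap; the category profiles and all names are invented for illustration:

```python
SAMPLE_LIBRARY = {
    "sports":  {"horse", "race", "event", "rider"},
    "illegal": {"banned-word", "forbidden-topic"},
}

def classify(page_words):
    """Assign the page to the category with the highest word overlap."""
    page = set(page_words)
    return max(SAMPLE_LIBRARY,
               key=lambda cat: len(page & SAMPLE_LIBRARY[cat]) / len(SAMPLE_LIBRARY[cat]))

def process(url, page_words, blacklist):
    if url in blacklist:                    # transport layer: URL rule check
        return "blocked at transport layer"
    if classify(page_words) == "illegal":   # content layer: similarity check
        blacklist.add(url)                  # warn, log, and blacklist
        return "blocked at content layer"
    return "passed"

bl = set()
print(process("http://x.example/e", ["banned-word", "forbidden-topic"], bl))
print(process("http://x.example/e", ["horse"], bl))  # now caught by the blacklist
```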

3.4. Development Environment

The monitoring station module of the monitoring system mainly operates on the database. It uses Microsoft's .NET platform, which is visual and easy to operate. The development tool is Visual Studio 2008, which provides a well-supported integrated development environment for .NET and web application development [30, 31]. The programming language is C#, and ADO.NET is used for database access in Visual Studio 2008 because it can easily and efficiently implement operations such as database connection, access, and editing. To meet the real-time data change frequency of the network monitoring system and the database requirements of a high-traffic system, the system adopts the Oracle9i database, because Oracle offers efficient data storage, powerful data maintenance functions, and a good database structure design. The detailed development environment configuration of the network monitoring system based on information processing is shown in Table 1.

4. Experimental Results and Analysis

4.1. Matrix Analysis

In actual use, users tend toward certain categories. For example, when using the Internet, a user may only be interested in documents on a certain topic. Generally speaking, the category the user is interested in is called the positive category, and the other categories are called negative categories. The recall rate and precision rate reflect the completeness and accuracy of positive-case classification. The confusion matrix used to analyze these two evaluation criteria is shown in Table 2.

In practical applications, high precision usually comes at the expense of recall, and the relative importance of the evaluation criteria depends on the actual application. If a single criterion is needed to compare the performance of two different classifiers, the F-score is usually used. The F-score is the harmonic mean of recall and precision, $F = 2PR/(P + R)$, where $P$ is precision and $R$ is recall. The relationship between the three is shown in Figure 3.
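These three criteria can be computed directly from the confusion matrix counts; a minimal sketch (toy counts) follows:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F-score), computed from
    true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(precision_recall_f1(tp=70, fp=5, fn=10))
# -> (~0.933, 0.875, ~0.903)
```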

4.2. Sample Analysis

In order to build the information intelligent processing platform based on data fusion and data mining, 1,800 legal and illegal web pages were collected from the Internet to form a sample database, of which 1,200 were legal web pages and 600 were illegal web pages. This ratio is close to the general ratio of legal to illegal web pages usually encountered. During testing, training samples can be added at any time, and the administrator analyzes the test results regularly. The 600 illegal web pages are divided into 8 groups of 75 pages each, and 600 pages are randomly selected from the 1,200 legitimate web pages and likewise divided into 8 groups. These 16 groups are combined by pairing each illegal group with a legal group; together with one additional group for the open test, composed of 75 normal pages and 75 illegal pages, this gives 9 groups of 150 pages each.

Then, each time, 2 of these 9 groups are drawn: one is added to the training sample library as a closed test example, and the other is kept out of the training sample library as an open test example; this continues in a loop according to the above combination rules. The test is run 9 times, and the results are recorded. The results of the closed test are shown in Figure 4.
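The drawing procedure can be sketched as follows (group labels are placeholders; the actual groups each contain 150 pages):

```python
import random

random.seed(0)
groups = [f"group_{i}" for i in range(1, 10)]  # the 9 combined groups

for trial in range(9):
    closed, open_ = random.sample(groups, 2)
    # 'closed' joins the training sample library; 'open_' stays unseen
    print(f"trial {trial + 1}: closed test on {closed}, open test on {open_}")
```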

The test result of the open test is shown in Figure 5.

It can be seen from Figures 4 and 5 that, across the two test environments, the precision, recall, and F-score under the closed test are higher than those under the open test. In particular, the precision rate under the open test is much lower than that under the closed test, reduced by about 7.26%. The overall calculation results are shown in Table 3.

5. Conclusions

Due to the popularity of the Internet, information technology and equestrian competitions have formed an inseparable relationship across various equestrian events. Intelligent data processing is a pioneering interdisciplinary branch of computer science and a comprehensive applied subject; its purpose is to process large and complex data and to study new and advanced theories and technologies. Research on intelligent data processing includes multilevel basic research, applied basic research, basic research on technology, and applied research. This article introduces the relevant theoretical knowledge of data mining and data fusion technology and, through analysis and discussion, builds an information intelligent processing platform based on data fusion and data mining. The article focuses on the design of the sample database, the functions of the platform interface, and the interception effect of the data mining information processing system. Through experiments, network monitoring and information processing are realized: the platform can control illegal networks or bad data information and effectively process and monitor information. The platform constructed in this article truly embodies the principle of "letting the data speak," and when the data sample is particularly large, it also demonstrates the advantages of fast and accurate automated computer processing. Although some progress has been made in the construction of a sample database based on web addresses and web page text content, other factors of the whole network have not been considered in this research, and there is still much work to be continued.

Data Availability

No data were used to support this study.

Disclosure

We confirm that the content of the manuscript has not been published or submitted for publication elsewhere.

Conflicts of Interest

There are no potential competing interests in our paper.

Authors’ Contributions

All authors have seen and approved the manuscript.

Acknowledgments

This work was supported by the Wuhan University of Business Doctoral Fund Projects (2021KB002), the Application Foundation Frontier Project (2020010601012294), and the School-Level Academic Team Project of Wuhan Business University (2018TD011).