Abstract

An optimized neural network classification method based on kernel holistic learning and division (KHLD) is presented. The proposed method takes the learned radial basis function (RBF) kernel as its research object. The kernel proposed here can be considered a subspace region consisting of samples of the same pattern category in the training sample space. By extending the region of the sample space represented by the original instances, relevant information between instances can be obtained from the subspace, and the classifier's boundary can be kept far from the original instances; thus, the robustness and generalization performance of the classifier are enhanced. In the concrete implementation, new pattern vectors are generated within each RBF kernel according to an instance optimization and screening method to characterize KHLD. Experiments on artificial datasets and several UCI benchmark datasets show the effectiveness of our method.

1. Introduction

In the field of pattern recognition, set classification [1–3] is a common classification task. It is widely applied in text classification, speech recognition, image recognition, and many other fields. Taking classification based on image sets as an example, each image set is composed of a class of image frames sharing a certain number of similar features. Because relevant information from adjacent frames is used, image variations can be handled effectively under practical conditions. The main challenge is how to effectively integrate the information from all existing images to reach a reliable decision. A typical approach is to establish an optimized representation of different subsets of images and to achieve effective measurements between different subsets.

Different from the set classification methods mentioned above, almost all current neural network [4–7] optimization algorithms and models are based on the training and classification of individual instances rather than on learning and partitioning the subspace regions containing those instances. Because the classification surface of a network classifier is essentially determined by the probability distribution of the training samples, if the training sample set is too small or the dimension of the classified dataset is too high, the error in the final classification will be relatively large, which reduces the generalization performance of the neural network classifier.

To address this problem effectively, inspired by the idea of set classification, this paper introduces that idea into the neural network and presents an optimized neural network classification method based on kernel holistic learning and division (KHLD), which can improve the performance of a neural network classifier under a given sample set. Different from set classification, KHLD is based on the effective coverage of local regions of the sample space, so the kernel proposed here can be considered a subspace region consisting of samples of the same pattern category in the training sample space. Although the true spatial distribution cannot be obtained directly, relevant information between instances can be obtained from the subspace, mainly because instances of the same pattern category are relatively close to each other in the spatial distribution and can be considered to have some similarity. Compared with single-pattern-vector classification, KHLD considers the similarity information of local regions of the sample space. Because the region represented by the original pattern vectors in the sample space is expanded, the situation in which the sample set is too small or the dimension of the sample space is too high can be improved to a certain extent. On the other hand, KHLD can push the classifier's boundary farther away from the original samples, which further strengthens the robustness and generalization ability of the classifier.

The primary task in achieving KHLD is the establishment and representation of the kernel. In this paper, considering the local characteristics of the regions covered in the sample space, we take Gaussian distribution functions with different parameters as the representative form to establish the corresponding subspace sets. Moreover, to integrate subkernels with different parameters and their mapping effects into the original sample space, we first construct the corresponding RBF kernels by learning the original sample space to realize the local mapping of different regions of the sample space. Then, the RBF kernels with different parameters are further learned and divided. Thus, the KHLD presented in this work has two meanings: the establishment of RBF kernels with different parameters and the holistic division of the covered regions.

Typical optimization algorithms for establishing RBF kernels with different parameters include K-means clustering [8], fuzzy clustering [9, 10], orthogonal forward selection [11], evolutionary algorithms [12], particle swarm optimization [13], and other algorithms [14–16]. It is worth noting that although the above methods for optimizing the RBF kernel parameters effectively combine the holistic information of the training sample space, the number of hidden nodes in the RBF network cannot be determined automatically, which may lead to poor adaptability for different sample sets. To automatically estimate the number of RBF kernels and their parameters, several sequential learning algorithms can be used, including the minimum resource allocation network (MRAN) [17], the sequential learning algorithm for growing and pruning the RBF (GAP-RBF) [18], and other incremental designs of radial basis function networks [19–21]. However, the holistic information of the sample space is not taken into account in these methods, and the classification performance is affected to some extent.

To generate the optimal number and parameters of the RBF kernels, in our previous work, an incremental learning algorithm for the hybrid RBF-BP network (ILRBF-BP) [22] and a hybrid structure adaptive RBF-ELM network (HSARBF-ELM) [23] were presented. In ILRBF-BP, a potential density clustering method is used to generate RBF kernels automatically, which utilizes the global distribution information of the sample space. However, the local adaptability of each RBF kernel parameter in ILRBF-BP is not fully considered; this disadvantage is overcome by HSARBF-ELM. By combining potential density clustering with a center-oriented heterogeneous sample repulsive force, the density information of different regions of the sample space and the neighborhood information of the regions covered by the initial RBF hidden nodes can be used effectively, and the optimal number and parameters of the RBF kernels can be generated adaptively according to the distribution of the sample space. However, when the training sample set is too small or the dimension of the sample set is too high, the distribution of the sample set will be very sparse, which causes the optimization algorithm to fail to some extent and reduces the generalization performance of the neural network classifier. To solve this problem, an optimized neural network based on KHLD is proposed. The premise of KHLD is the establishment of optimized kernel parameters; the geometry of these kernels is a regular hypersphere, and the optimization of the number and parameters of the RBF kernels in HSARBF-ELM is exactly in line with this requirement. Thus, the RBF kernels established in HSARBF-ELM are the research object of this study.

When the number and parameters of the optimized RBF kernels are established, the subsequent task is to realize KHLD. In practice, the training of network classifier weights is carried out on single instances. When all the RBF kernels are established, according to the probability density distribution of the pattern vectors in each subkernel, we consider generating new pattern vectors within each RBF kernel, which is equivalent to extending the existing pattern vector subset in the current RBF kernel, to characterize KHLD. Intuitively, when enough samples are generated in the region covered by a kernel, the covered region can be approximated. In this way, KHLD is transformed into the training and division of a larger set of pattern vectors. On the basis of generating a sample set of suitable size, the existing network classifier is used for training and classification; thus, the final classification surface can be modified to improve the generalization performance of the network classifier.

To achieve the effective expansion of the pattern vector in the region covered by the RBF kernel, a suitable sample probability distribution model is first needed to generate new pattern vectors. For this problem, we consider that the effective region covered by the RBF subkernel contains a certain number of original pattern vectors. In the region near the center, the probability density is relatively dense, and the probability density near the boundary is relatively sparse; thus, it can be considered that these pattern vectors similarly obey the multivariate Gaussian distribution with the current RBF kernel as the parameter. Moreover, the new pattern vectors should be constrained by the region covered by the current RBF kernel, and the initial filling of the RBF kernel can be accomplished in this way. Second, we need to measure the density of the region of the original pattern vectors in each RBF kernel. In the dense region of the sample space, the number of generated pattern instances is relatively large; conversely, in the sparse region of the sample space, the number of generated pattern instances is relatively small. When the generated instances are in the mixed region covered by different pattern classes, the probability of preserving the sample is further reduced. In this way, by combining the density and location information of the region, the optimal selection of the generated pattern instances can be completed without changing the probability density distribution of the original sample space.

According to the above methods, we take the idea of KHLD as the prototype and approximate the idea of KHLD by filling and screening the pattern vector of each kernel. On the other hand, the KHLD of each RBF kernel is converted to learning and division of more pattern vectors, which can improve the sparse sample spatial distribution caused by a sample size that is too small or sample space dimension that is too high, and the classification accuracy of the classifier can be enhanced. Note that, due to the inhomogeneity of the sample distribution inside the kernel, the approximation of the idea of KHLD by filling and screening the pattern vector of each kernel can be considered a soft partition; that is, the final classified surface can pass through the kernel to improve the overlap of different pattern subclasses effectively. Thus, it is more conducive to the adjustment of actual classification surface parameters.

In summary, the main contributions of this work are as follows:
(1) The idea of KHLD is introduced into the neural network classifier, and its characteristics are analyzed.
(2) An internal sample generation and optimization screening mechanism for the RBF kernel is designed to achieve the approximation of KHLD.
(3) KHLD is combined with existing classification algorithms and compared against these algorithms on two artificial datasets and several benchmark datasets; the experimental results show the superiority of the proposed method.

2. Methods

2.1. The Establishment of KHLD

Considering that the method of KHLD is based on the RBF kernels of HSARBF-ELM, we first give the optimization of RBF kernels in HSARBF-ELM, which prepares the ground for the kernel holistic learning and division method.

For the input sample $x$, when it passes through the $k$-th RBF kernel function, its output can be expressed as
$$\varphi_k(x) = \exp\left(-\frac{\|x - c_k\|^2}{2\sigma_k^2}\right),$$
where $c_k$ and $\sigma_k$ are the center and width of the $k$-th RBF kernel.
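For concreteness, a minimal NumPy sketch of this kernel mapping is given below; the array names `centers` and `widths` (holding $c_k$ and $\sigma_k$) are illustrative and not part of the original implementation.

```python
import numpy as np

def rbf_outputs(x, centers, widths):
    """Gaussian RBF activations for one input vector x.

    centers: (K, d) array of kernel centers c_k (assumed layout)
    widths:  (K,) array of kernel widths sigma_k
    Returns a (K,) vector of exp(-||x - c_k||^2 / (2 * sigma_k^2)).
    """
    d2 = np.sum((centers - x) ** 2, axis=1)        # squared distance to each center
    return np.exp(-d2 / (2.0 * widths ** 2))
```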

In HSARBF-ELM, by combining density clustering with a potential function and a center-oriented unidirectional repulsive force, the number and parameters of the RBF kernels can be generated effectively. The main steps are as follows.

Given a training set $X = \{X_1, X_2, \ldots, X_h\}$, where $X_i = \{x_1^i, x_2^i, \ldots, x_{N_i}^i\}$ is the $i$-th pattern category set, $h$ is the number of pattern categories, and $N_i$ is the number of samples in the $i$-th pattern category, for each pattern category set $X_i$:
(1) Compute the potential value of each sample $x_j^i$ according to
$$P(x_j^i) = \sum_{l=1}^{N_i} \exp\bigl(-\eta\, d(x_j^i, x_l^i)\bigr),$$
where $\eta$ is the distance weighting factor and $d(x_j^i, x_l^i)$ is the distance measure between $x_j^i$ and $x_l^i$.
(2) Determine the sample with the maximum potential as the center of the generated RBF hidden node; that is, set
$$c_k = \arg\max_{x_j^i} P(x_j^i).$$
(3) Adjust the center under the action of the center-oriented heterogeneous sample repulsive force, which pushes the center $c_k$ away from any heterogeneous sample covered by the current RBF hidden node. The adjustment is governed by the width covering factor, the repulsive force control factor, and the iteration step, and it depends on the numbers of samples covered by the current RBF hidden node before and after updating.
(4) Adjust the width $\sigma_k$ subject to a width constraint factor, a constrained minimum width parameter, and the initial width. This adjustment ensures the relative diversity of each generated RBF hidden node, which can achieve a balance between the coverage effect and the generalization performance.
(5) Counteract the potential of each sample in the region covered by the current RBF hidden node, and find the sample with the maximum updated potential to generate the next RBF hidden node.
(6) Set the iteration termination condition as follows: if the termination condition is not satisfied, go to Steps 2–4; otherwise, the learning of the current pattern category is complete, and the procedure moves on to the remaining pattern categories.
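The following Python sketch illustrates the greedy potential-based generation of kernel centers (Steps 1, 2, and 5 only); the Gaussian potential form, the stopping rule, and the parameter names `eta`, `sigma0`, and `stop_ratio` are simplifying assumptions rather than the exact HSARBF-ELM formulation, and the repulsive-force center and width adjustments of Steps 3 and 4 are omitted.

```python
import numpy as np

def generate_kernels(X_c, eta=1.0, sigma0=0.5, stop_ratio=0.01):
    """Greedy kernel generation for one pattern category X_c of shape (N, d).

    Assumed simplification: Gaussian potential, fixed width sigma0, and a
    stop rule based on the remaining maximum potential.
    """
    d2 = np.sum((X_c[:, None, :] - X_c[None, :, :]) ** 2, axis=2)  # pairwise squared distances
    potential = np.exp(-eta * d2).sum(axis=1)                      # Step 1: potential of each sample
    stop_level = stop_ratio * potential.max()
    centers, widths = [], []
    while potential.max() > stop_level:
        k = int(np.argmax(potential))                              # Step 2: maximum-potential sample
        centers.append(X_c[k].copy())
        widths.append(sigma0)
        covered = d2[k] <= sigma0 ** 2                             # samples covered by the new kernel
        potential[covered] = 0.0                                   # Step 5: counteract covered potentials
    return np.array(centers), np.array(widths)
```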

According to the above steps, the number of RBF hidden nodes and their centers and widths, denoted as $\{(c_k, \sigma_k)\}_{k=1}^{K}$, can be generated optimally. For HSARBF-ELM, once the optimized RBF hidden nodes are generated, their outputs serve as the inputs of the subsequent ELM network. The update of the ELM network weights is based on the existing ELM [24] learning algorithm.
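As a rough sketch of how the fixed RBF hidden layer can feed an ELM-style output layer, the output weights can be obtained by a single least-squares solve; the small ridge term `reg` below is an added numerical-stability assumption, not part of the original algorithm.

```python
import numpy as np

def rbf_hidden_matrix(X, centers, widths):
    """Hidden-layer output matrix H (N x K) for inputs X (N x d)."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))

def elm_output_weights(H, T, reg=1e-6):
    """Least-squares output weights beta so that H @ beta approximates targets T."""
    K = H.shape[1]
    return np.linalg.solve(H.T @ H + reg * np.eye(K), H.T @ T)
```

Predictions for new inputs are then obtained as `rbf_hidden_matrix(X_test, centers, widths) @ beta`.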

2.2. The Method of KHLD
2.2.1. Main Idea

To explain the characteristics and advantages of the method based on KHLD, Figure 1 compares KHLD with direct pattern vector classification. The method of KHLD is transformed into the training and division of a larger set of pattern vectors.

To realize KHLD, it is necessary to establish suitable RBF kernels to complete the effective coverage of the different regions of the original sample space. Then, to ensure the validity of the generated samples, the newly generated samples in each kernel should be approximately consistent with the original pattern vector distribution, and the number of newly generated samples should be proportional to the distribution density of the original sample region. In addition, when the kernels of different pattern categories overlap, it is necessary to further screen the generated pattern vectors in the overlapping regions. To this end, the following steps need to be completed:
Step 1: the optimal coverage of the original sample space is completed by potential function density clustering and the center-oriented heterogeneous sample repulsive force; the appropriate RBF kernel parameters, including the number, centers, and widths of the RBF kernels, can be determined adaptively according to the distribution of the sample space.
Step 2: with the center and the width of each RBF kernel as constraints, a probability distribution similar to that of the original samples is set up to generate new pattern vectors in the effective region covered by each RBF kernel.
Step 3: each newly generated pattern vector is judged to determine whether it is retained, and finally, a new pattern vector subset is formed.
Step 4: a new set of samples is formed by combining the original samples with all the screened pattern vectors that are eventually retained, and it is used to train the weights of the output classifier.
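To make the four steps concrete, a schematic driver is sketched below; the helper names `generate_kernels`, `sample_in_kernel`, and `screen_samples` are hypothetical placeholders for the components described in this section rather than functions from the original implementation.

```python
import numpy as np

def khld_augment(X, y, generate_kernels, sample_in_kernel, screen_samples):
    """Return the original training set augmented with screened generated samples.

    X: (N, d) training inputs, y: (N,) integer category labels.
    The three callables stand in for Steps 1-3 described above.
    """
    new_X, new_y = [], []
    for c in np.unique(y):
        centers, widths = generate_kernels(X[y == c])          # Step 1: cover category c
        for ck, sk in zip(centers, widths):
            cand = sample_in_kernel(ck, sk)                    # Step 2: generate candidates
            kept = screen_samples(cand, ck, sk, X, y, c)       # Step 3: density / overlap screening
            new_X.append(kept)
            new_y.append(np.full(len(kept), c))
    X_aug = np.vstack([X] + new_X)                             # Step 4: combine with originals
    y_aug = np.concatenate([y] + new_y)
    return X_aug, y_aug
```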

The difficulty of realizing the above steps lies in Step 3: establishing an appropriate criterion to relate each newly generated pattern vector to the original sample density, and determining whether kernels of different pattern categories overlap, so as to complete the optimized screening of the newly generated pattern vectors.

2.2.2. The Implementation

In this section, we first give the definitions of KHLD, overlapping region samples, and nonoverlapping region samples to prepare for the description and implementation of subsequent algorithms.

(1) Definitions.
KHLD. Training and partitioning labeled RBF kernels after covering the original sample space.
Overlapping Region Samples. Samples in an overlapping region covered by RBF kernels of different pattern categories.
Nonoverlapping Region Samples. Samples outside every overlapping region.

According to the definitions, Figure 2 gives schematic diagrams of the overlapping region samples and the nonoverlapping region samples, where the two circles represent the valid regions covered by two different RBF kernels. In Figure 2(b), the intersection of the two regions is the overlapping region: sample 1 and sample 2 are overlapping region samples, and the other samples are nonoverlapping region samples.

To realize the classification method based on kernel holistic division and the selection of generated samples, it is necessary to take each RBF kernel as the research object, randomly generate pattern category samples within each kernel, and then optimize and screen them. To this end, two factors need to be considered:
(1) To facilitate the optimization of the subsequently generated pattern samples, the probability distribution of the initially generated pattern samples should be approximately the same as that of the original samples.
(2) In the process of sample screening, the probability of a generated sample being retained should be proportional to the density of the original sample region. It is also necessary to consider whether the sample is an overlapping region sample and, if so, to further reduce the probability that the sample is retained.

For case (1), the establishment of each RBF kernel parameter is based on potential function density clustering: overall, the probability density of the region near the center of the original samples is relatively large, and the probability density of the region near the boundary is relatively small. It can therefore be considered that the probability density of the pattern vectors in these kernels approximately obeys a multivariate Gaussian distribution with the current RBF kernel as the parameter, and this distribution can be taken as the probability model for generating new pattern vectors.

For case (2), the key is to establish an appropriate measure to determine the density of the region where each generated sample is located and determine whether the generated sample is retained. If the generated sample is retained, it is necessary to determine whether the generated samples are in the overlapping region and further complete secondary optimization.

According to the above description, given a dataset $\{(x_i, y_i)\}_{i=1}^{N}$, where $N$ is the number of training samples, $y_i \in \{1, 2, \ldots, h\}$ are the category labels, and $x_i \in \mathbb{R}^d$, let $X_j = \{x_i \mid y_i = j\}$ be the training sample set of the $j$-th pattern category, $j = 1, 2, \ldots, h$; here, $\sum_{j=1}^{h} |X_j| = N$. For each training sample category, the number and parameters of the RBF kernels are optimized by the potential function density and the repulsive force between heterogeneous samples, expressed as $\{(c_k, \sigma_k, l_k)\}_{k=1}^{K}$, where $c_k$ and $\sigma_k$ are the center and width of the $k$-th RBF kernel, respectively, $l_k$ is the pattern category label of the RBF kernel, $K_j$ is the number of RBF kernels generated under the $j$-th pattern category, and $K = \sum_{j=1}^{h} K_j$ is the total number of RBF kernels.

When all the RBF kernels are built, the effective coverage of the different regions of the original sample space is completed. To achieve sample filling for each RBF kernel, it is necessary to establish a suitable sample probability distribution model to generate new pattern vectors. For the current ($k$-th) RBF kernel, the probability distribution for generating an arbitrary pattern vector $x'$ obeys the Gaussian distribution with $c_k$ as the mean and $\sigma_k^2 I$ as the variance matrix; that is, $x' \sim \mathcal{N}(c_k, \sigma_k^2 I)$. Moreover, the newly generated pattern vectors should lie in the effective region covered by the RBF kernel, which is given by
$$\|x' - c_k\| \le \sigma_k. \quad (8)$$
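A minimal sketch of this constrained sampling step is given below, assuming an isotropic Gaussian $\mathcal{N}(c_k, \sigma_k^2 I)$ and the ball of radius $\sigma_k$ as the covered region (per the reconstruction of (8) above); the cap `max_draws` is an added practical assumption, since simple rejection becomes inefficient in high dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_in_kernel(center, width, n_samples, max_draws=100000):
    """Draw candidates from N(center, width^2 * I) and keep only those inside
    the ball of radius `width` around `center` (the covered region assumed
    from (8)); max_draws bounds the number of rejection-sampling attempts."""
    draws = rng.normal(loc=center, scale=width, size=(max_draws, center.size))
    inside = np.linalg.norm(draws - center, axis=1) <= width
    return draws[inside][:n_samples]
```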

According to the above method, for the $k$-th RBF kernel in the current pattern category, let $S_k' = \{x_1', x_2', \ldots, x_{M_k}'\}$ be the generated initial vector set in the kernel; here, $M_k$ is the number of generated samples in the $k$-th kernel. After the initial pattern vectors are generated, they need to be optimized and screened. During the screening process, in dense regions of the sample space, the number of retained pattern instances should be relatively large; conversely, in sparse regions of the sample space, the number of retained pattern instances should be relatively small. In this way, the probability distribution of the sample space can be combined with the density of the region where each pattern vector is generated, and the validity of the resulting pattern vectors can be enhanced.

Let $S_k = \{x_1, x_2, \ldots, x_{N_k}\}$ be the initial (original) sample set covered by the $k$-th RBF kernel in the current pattern category, and let $N_k$ be the number of samples in $S_k$. For each initial pattern vector $x_j$, when $\|x_j - c_k\| \le \sigma_k$ and $x_j$ belongs to the current pattern category, then $x_j \in S_k$. For each generated pattern vector $x'$ in $S_k'$, the probability density of the new pattern vector can be estimated as
$$\hat{p}_k(x') = \frac{1}{N_k} \sum_{j=1}^{N_k} \exp\left(-\frac{\|x' - x_j\|^2}{2 h_k^2}\right), \quad (9)$$
where $h_k$ is the width of the corresponding Parzen window in the $k$-th RBF kernel.

To apply this measure while preserving the randomness of sample generation, we generate a uniformly distributed random number $\mu$ between 0 and 1 and compare it with the probability density of each newly generated pattern vector. If $\mu < \hat{p}_k(x')$, $x'$ is retained; otherwise, $x'$ is eliminated. Therefore, in regions where the original samples are relatively densely distributed, the probability that newly generated samples are retained is relatively high.
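A sketch of this density-based retention rule, using the Gaussian Parzen estimate reconstructed in (9), is given below; the names `originals` and `parzen_width` (the per-kernel window width $h_k$) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def density_screen(candidates, originals, parzen_width):
    """Keep each generated vector with probability equal to its (bounded)
    Parzen density estimated from the original samples in the same kernel."""
    kept = []
    for x in candidates:
        d2 = np.sum((originals - x) ** 2, axis=1)
        p = np.mean(np.exp(-d2 / (2.0 * parzen_width ** 2)))   # density estimate, in (0, 1]
        if rng.uniform(0.0, 1.0) < p:                          # retain if the draw falls below p
            kept.append(x)
    return np.array(kept).reshape(-1, candidates.shape[1])
```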

Due to the complexity of different sample sets, heterogeneous samples are often mixed into the generated RBF kernels. Thus, it is necessary to further refine the sample screening in the overlapping regions. When a generated sample is in an overlapping region, two factors need to be considered:
(1) The probability of the sample being retained should be reduced.
(2) The sample spatial distribution density under the current pattern category and under the other pattern categories must be considered at the same time. According to the principle of suppressing the probability density of heterogeneous samples, when the spatial distribution density of the sample in the current pattern category is higher than that in the other pattern categories, the probability of the sample being retained is relatively large.

Combining the above two factors, for a sample $x'$ generated in the $k$-th RBF kernel, we have $\|x' - c_k\| \le \sigma_k$. Moreover, when $\|x' - c_n\| \le \sigma_n$ with $l_n \ne l_k$ is also satisfied, $x'$ can be considered a sample in the overlapping region between the $k$-th and the $n$-th RBF kernels.

When the samples in the overlapping region are determined, it is necessary to further screen them. Let $S_n = \{x_1, x_2, \ldots, x_{N_n}\}$ be the initial sample set of the $n$-th kernel, which belongs to the heterogeneous ($j$-th) pattern category, and let $N_n$ be the number of samples in $S_n$. For an arbitrary pattern vector $x_j$, when $\|x_j - c_n\| \le \sigma_n$ and $x_j$ belongs to that pattern category, then $x_j \in S_n$. For a sample $x'$ in the overlapping region between the $k$-th and the $n$-th RBF kernels, the probability density estimate over the heterogeneous sample region is expressed as
$$\hat{p}_n(x') = \frac{1}{N_n} \sum_{j=1}^{N_n} \exp\left(-\frac{\|x' - x_j\|^2}{2 h_n^2}\right), \quad (10)$$
where $h_n$ is the width of the corresponding Parzen window in the $n$-th RBF kernel.

According to the above method, for a randomly generated number $\mu$ between 0 and 1, when $\mu < \hat{p}_k(x')$ and $\hat{p}_k(x') > \hat{p}_n(x')$, the sample $x'$ in the overlapping region is retained; otherwise, $x'$ is removed. Here, $\hat{p}_k(x')$ and $\hat{p}_n(x')$ are the density estimates given by (9) and (10), respectively.
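The secondary screening of overlapping-region samples can be sketched as follows; it retains a candidate only when its Parzen density under its own ($k$-th) kernel exceeds that under the heterogeneous ($n$-th) kernel, in line with the rule above (again with illustrative helper names).

```python
import numpy as np

def overlap_screen(candidates, own_originals, other_originals, h_k, h_n):
    """Secondary screening of overlapping-region candidates: retain x only if
    its Parzen density under its own kernel exceeds that under the
    heterogeneous kernel."""
    kept = []
    for x in candidates:
        p_own = np.mean(np.exp(-np.sum((own_originals - x) ** 2, axis=1) / (2.0 * h_k ** 2)))
        p_other = np.mean(np.exp(-np.sum((other_originals - x) ** 2, axis=1) / (2.0 * h_n ** 2)))
        if p_own > p_other:                     # suppress samples denser in the heterogeneous class
            kept.append(x)
    return np.array(kept).reshape(-1, candidates.shape[1])
```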

Combined with the above description, Algorithm 1 gives the concrete implementation of the classification method based on kernel holistic learning and kernel interior sample generation.

Initialization;
for i = 1 to h do % h is the number of pattern categories
 for k = 1 to K_i do % K_i is the number of RBF kernels of the i-th category
  Count the number N_k of initial samples of the i-th pattern category covered by the k-th RBF hidden node;
  Use (8) to generate a sample set S'_k and count the number of generated samples M_k;
  for j = 1 to M_k do % screening of generated samples according to the density
   Use (9) to estimate the probability density p_k(x'_j) of x'_j belonging to the current pattern category;
   mu = rand(0, 1);
   if mu >= p_k(x'_j) then
    S'_k = S'_k \ {x'_j};
   end if
  end for
  Update S'_k and M_k;
 end for
 for k = 1 to K_i do % further screening of the overlapping region samples
  for j = 1 to M_k do
   if x'_j is an overlapping region sample (between the k-th kernel and a heterogeneous n-th kernel) then
    Use (10) to estimate the probability density p_n(x'_j) of x'_j under the heterogeneous pattern category;
    if p_k(x'_j) <= p_n(x'_j) then
     S'_k = S'_k \ {x'_j};
    end if
   end if
  end for
 end for
end for
2.2.3. The Computational Complexity Analysis of KHLD

In this study, a method combining potential density clustering and the center-oriented heterogeneous sample repulsive force is used to generate optimized kernel parameters. Then, a method of optimized sample filling and screening realizes the effective approximation of KHLD. Assume that the number of samples in the initial training set is $N$ and that the initial training set contains two pattern categories whose numbers of samples are $N_1$ and $N_2$, respectively; here, $N = N_1 + N_2$. The computational complexity of the proposed method is analyzed as follows:
(1) The optimal kernel parameters are generated by the combination of potential density clustering and the heterogeneous sample repulsive force. In the process of quantifying the sample potential values by potential function density clustering, the label information of each sample category is considered. The calculation of a sample's potential value needs to traverse all other samples in the current pattern category, and Gaussian kernels with different parameters are then needed to cover the sample subspace and to update the sample potentials; the computational complexity is $O(N_1^2 + N_2^2)$. Let the number of kernels be $K$; in the process of optimizing the kernel parameters, the distance between all samples and each center must be considered, so the computational complexity is $O(KN)$. After merging, the computational complexity of this part is $O(N_1^2 + N_2^2 + KN)$.
(2) The process of sample generation and screening also takes a certain amount of time. Let the total number of samples generated in all kernels be $M$, where the number of samples generated in the $k$-th kernel is $M_k$; thus, $M = \sum_k M_k$. In calculating the density measure of the generated samples, the distance between each generated sample and the center of the current kernel must be considered, so the computational complexity for the samples generated in the $k$-th kernel is $O(M_k)$; combining the generated samples of all kernels, this can be expressed as $O(M)$. Then, in the process of sample screening, we further need to check whether the samples generated in the current kernel are overlapping region samples, which requires comparing the distance between these samples and all other centers; the corresponding computational complexity is $O(M_k K)$, and combining the screening over all kernels gives $O(MK)$. Thus, the computational complexity of sample generation and screening over all established kernels is $O(M + MK)$, which can be simplified as $O(MK)$.

Combining parts (1) and (2), the computational complexity of the proposed KHLD is $O(N_1^2 + N_2^2 + KN + KM)$. Then, the generated training samples and the original training samples are combined to complete the training of the existing algorithms.
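As a purely illustrative numerical check (the counts below are hypothetical and not taken from the experiments), suppose $N_1 = N_2 = 500$, $K = 20$, and $M = 2000$; then
$$N_1^2 + N_2^2 = 5\times 10^5, \qquad KN = 2\times 10^4, \qquad KM = 4\times 10^4,$$
so the potential-value computation dominates, and the additional sample generation and screening overhead $O(KM)$ remains roughly an order of magnitude smaller.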

3. Results and Discussion

In this section, the performance of KHLD is evaluated with two artificial datasets, Double Moon (DM) [25] and Concrete Circle (CC); 8 UCI benchmark datasets [26]: Blood, Climate, Heart Disease (HD), Sonar, SPECT Heart (SH), Image Segmentation (IS), Forest, and Wilt; and 1 LIBSVM benchmark dataset (Svmguide1) [27]. Figure 3 shows a graphical display of the two artificial datasets. Except for the DM, CC, and IS datasets, all benchmark datasets are imbalanced. In each dataset, the inputs to all the classifiers are scaled appropriately to [−1, 1]; the classification performance of each network is measured by the overall and average per-category classification accuracies [23]. Table 1 gives the description of the classification datasets.
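The scaling to [−1, 1] can be performed with a per-feature min–max transform fitted on the training split; a minimal sketch is given below (the per-feature convention is an assumption, as the paper does not specify it).

```python
import numpy as np

def fit_minmax(X_train):
    """Per-feature minimum and range, computed on the training split only."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)          # avoid division by zero for constant features
    return lo, span

def to_minus1_plus1(X, lo, span):
    """Map features to [-1, 1] using the training-split statistics."""
    return 2.0 * (X - lo) / span - 1.0
```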

The performance of KHLD is evaluated by combining it with existing classification algorithms and comparing against those algorithms, including SVM [27], ELM [24], HSARBF-ELM, a constrained optimization method based on the BP neural network (CO-BP) [28], and an optimized RBF network based on fractional order gradient descent with momentum (FOGDM-RBF) [29]. For SVM, the simulations are implemented with LIBSVM [27]. All simulations are conducted in MATLAB R2013b running on a PC with a 3.2 GHz CPU and 4 GB of RAM. Each algorithm is run for 20 trials.

3.1. Artificial Datasets: DM and CC

In this section, two artificial datasets are used to illustrate the characteristics of KHLD graphically and intuitively. In the classification performance comparison, KHLD is combined with HSARBF-ELM and compared against HSARBF-ELM alone. Figures 4(a)–4(d) compare the learning and classification effects based on the original training set and on KHLD under the DM dataset. It can be seen that the RBF kernels generated in HSARBF-ELM can effectively cover the sample space, and the combination of KHLD and HSARBF-ELM can fill the training sample space and effectively improve the classification performance of HSARBF-ELM.

Figures 5 and 6 show the optimization effect of the kernels and of the samples generated in each kernel after adjusting the initial kernel width and the sample generation parameter, respectively, which shows that the adjustment of these two parameters has good adaptability to the sample space on the DM dataset.

Figure 7 compares the number of generated samples and the classification accuracy under different initial training sets. It can be seen that, when the number of training samples is small and the initial kernel width is too small, the established kernels cannot effectively cover the sample space, which leads to a decline in network generalization performance; when the kernel width is large and the number of training samples is sufficient, the performance of the proposed method also declines to a certain degree. This shows that the method of KHLD places certain restrictions on the number of training samples and on the selection of the kernel width parameter.

Figure 8 shows the learning and classification comparison of the HSARBF-ELM network classifier based on the original training set and on KHLD under the CC dataset. It can be seen that even when the generated kernels of different categories overlap each other severely, the proposed method can still generate new samples in the different kernels and improve the classification performance of the original HSARBF-ELM network classifier, which shows the effectiveness of the KHLD method for complex classification problems.

Figure 9 shows the learning effect of the proposed method on the training set as the initial kernel width varies. By changing the kernel width parameter, the method of KHLD can optimize the selection of samples in each kernel. When the kernel width increases, the generated kernels may cover heterogeneous samples, resulting in increased overlap between samples of different pattern categories within the kernels.

Figure 10 further shows the number of generated training samples and compares the classification accuracy under different initial training sets. When the width parameters of the RBF kernels are within a certain range, the method of KHLD yields a good classification effect. Similar to Figure 7(b), when the initial kernel width is too small, the testing accuracy of the proposed method is greatly reduced, which means that the failure of the initial RBF kernels may invalidate kernel holistic learning and further deteriorate the final classification performance. Thus, it is necessary to avoid such a situation; this is also a restrictive condition for KHLD in this study.

Figures 11 and 12 show that the combination of KHLD and HSARBF-ELM increases the training time. However, the proposed method improves the network classification performance of HSARBF-ELM, especially when the number of training samples is small. When the number of training samples is sufficient, the proposed method reduces the performance of HSARBF-ELM to a certain extent, which shows that the KHLD method is suitable for situations with fewer training samples or a sparse spatial distribution of samples.

3.2. UCI Benchmark Datasets

Tables 2 and 3 give the comparisons of the classification performance of the proposed method and other learning algorithms on the benchmark datasets. It can be seen that, on high-dimensional small-sample datasets, combining KHLD with other classification algorithms increases the training time. Although the testing results of the different classification algorithms differ across datasets, combining KHLD with these algorithms improves their testing accuracy to varying degrees. As an auxiliary method, KHLD is effective when the spatial distribution of samples is sparse, which further verifies the effectiveness of the proposed method. However, for the benchmark large-sample datasets, combining KHLD with the existing algorithms reduces their test performance, which further shows that the KHLD method in this study is not suitable for learning and classification on large sample sets.

3.3. Discussion of KHLD

In this study, under the premise of a given initial kernel width, the parameters of KHLD are generated automatically according to the distribution of the sample space, following the optimization method of the RBF kernel parameters in HSARBF-ELM; the initial kernel width is chosen from a prescribed range. When the kernel parameters are established, the main parameter affecting KHLD is the one that determines the number of samples generated in each kernel, which is also chosen from a prescribed range. Thus, we mainly discuss the influence of these two parameters on KHLD. Figure 13 shows the stress test when KHLD is combined with HSARBF-ELM on the high-dimensional Climate dataset. In general, when both parameters are within a certain range, the combination of KHLD and HSARBF-ELM can improve the network performance of HSARBF-ELM. When the initial kernel width is too small, for example, set to 0.1 or 0.2, the classification performance of KHLD combined with HSARBF-ELM is poor. The main reason is that the generated kernels cannot effectively cover the sample space, so the validity of the kernels cannot be guaranteed, which degrades the performance of the proposed method. When the kernel width is too large or the sample generation parameter is too small, the probability of overlapping samples in the generated kernels increases, which also degrades the performance of the proposed method.

Experiments on multiple datasets show that the method of KHLD alleviates the degradation of network generalization performance when the sample size is too small or the sample space distribution is too sparse.

However, when the number of training samples is sufficient or the spatial distribution of training samples is dense, the network performance of the proposed method shows a certain degree of decline compared with the direct training of the classifier. This situation shows that when the constructed kernel can be effectively represented by the existing training samples, the generated samples in the kernels are equivalent to increasing the noise samples, which leads to the redundancy of network training and is not conducive to the improvement of the boundary partition surface. Thus, the proposed method is not suitable for classification problems with sufficient number of training samples or dense spatial distribution of samples. In the selection of parameters, the kernel width should be chosen so that it is not too small or too large. If the kernel width is too small, the validity of the established kernel may not be guaranteed, which makes the method of KHLD ineffective to a certain extent. If the kernel width is too large, the overlapping degree between the samples generated in the kernel and the heterogeneous samples increases, which also leads to the performance degradation of the proposed method.

4. Conclusion

An optimized neural network classifier based on KHLD is presented. The established kernels in KHLD are based on the generated RBF kernel parameters in the HSARBF-ELM algorithm. An optimized sample filling and screening method can realize the effective approximation of KHLD in different classification problems. Combining KHLD with other algorithms can effectively improve the network performance of these algorithms, especially when the sample space distribution is sparse. Experiments on artificial datasets and benchmark datasets further verify the effectiveness of our method.

One of the main shortcomings of this work is the representation of the kernels. In this study, for convenience of problem description, each kernel is represented as a regular hypersphere. The proposed method is mainly suitable for the case of a sparse spatial distribution of samples and is not suitable for large sample set learning and classification. The establishment and representation of the kernels are worthy of further study; exploring more optimized kernel representations and combining them with KHLD is our future work.

Data Availability

The artificial dataset Double Moon comes from "S. Haykin, Neural Networks and Learning Machines (Third Edition). Beijing: China Machine Press, China, 2009, pp. 61-63." The artificial dataset Concrete Circle is generated by the authors. The data are available upon request by email [email protected]. The benchmark datasets Blood, Climate, Heart Disease (HD), Sonar, SPECT Heart (SH), Image Segmentation (IS), Forest, and Wilt come from the UCI repository of machine learning databases, available at http://archive.ics.uci.edu/ml. The Svmguide1 dataset comes from the LIBSVM databases, available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Natural Science Foundation of China (No. 61741111); Natural Science Foundation of Fujian (Nos. 2019J01815 and 2019J01816); Natural Science Foundation of Jiangxi (No. 20181BAB202011); Department of Education of Fujian Province (Nos. JT180486 and FJJKCG20-101); Putian Science and Technology Bureau Project (Nos. 2018RP4004 and 2018ZP10); and Introduction of Talents to Start Scientific Research Projects in Putian University (No. 2018088).