A Parallel Data Mining Approach Based on Segmentation and Pruning Optimization

Automatic Control and Computer Sciences

Abstract

Parallel optimization is currently an important research topic in data mining. Taking the parallelization of CART (classification and regression trees) as an example, this paper proposes a parallel data mining algorithm based on segmentation and pruning optimization, named SSP-OGini-PCCP. To address the problem of choosing the best CART split (segmentation) point, an S-SP model without data association is designed; to compute the Gini index efficiently, a parallel OGini calculation method is designed; and to improve the efficiency of pruning, a synchronous PCCP pruning strategy is proposed. These three components (optimal split computation, Gini index calculation, and pruning) are central to parallel data mining and are studied in depth. The SSP-OGini-PCCP-based data mining method is evaluated on a simulated distributed cluster built on Spark. The experimental results show that the method significantly improves the efficiency of data classification and decision making, meeting the demands of present-day large-scale data processing.
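The parallel split search described above can be pictured in map-reduce form. The following is a minimal Python sketch, not the authors' S-SP/OGini implementation and not the Spark API. It assumes each worker holds one partition of (feature value, class label) rows for a single numeric feature; a thread pool stands in for cluster workers. The function names `local_histogram`, `merge_histograms`, and `best_split` are hypothetical. Each partition emits only per-value class counts, the counts are summed, and candidate thresholds are scanned using the Gini impurity G = 1 - sum_k p_k^2, weighted by the size of each side of the split.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Row = Tuple[float, str]  # (feature value, class label); assumed row layout


def gini(counts: Counter) -> float:
    """Gini impurity 1 - sum(p_k^2) of a class-count histogram."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def local_histogram(partition: Sequence[Row]) -> Dict[float, Counter]:
    """Hypothetical map step: class counts per distinct value in one partition."""
    hist: Dict[float, Counter] = {}
    for value, label in partition:
        hist.setdefault(value, Counter())[label] += 1
    return hist


def merge_histograms(hists: Iterable[Dict[float, Counter]]) -> Dict[float, Counter]:
    """Hypothetical reduce step: sum partial histograms (only counts are exchanged)."""
    merged: Dict[float, Counter] = {}
    for h in hists:
        for value, counts in h.items():
            merged.setdefault(value, Counter()).update(counts)
    return merged


def best_split(partitions: List[List[Row]]) -> Tuple[Optional[float], float]:
    """Return (threshold t, weighted Gini) of the best binary split x <= t.

    Returns (None, parent impurity) if no split lowers the impurity.
    """
    # Threads stand in for distributed workers in this sketch.
    with ThreadPoolExecutor() as pool:
        merged = merge_histograms(pool.map(local_histogram, partitions))

    values = sorted(merged)
    total: Counter = Counter()
    for v in values:
        total.update(merged[v])
    n = sum(total.values())

    left: Counter = Counter()
    best_t, best_g = None, gini(total)
    # Scan thresholds in sorted order, moving counts from right to left incrementally.
    for t in values[:-1]:
        left.update(merged[t])
        right = total - left
        n_left = sum(left.values())
        g = (n_left / n) * gini(left) + ((n - n_left) / n) * gini(right)
        if g < best_g:
            best_t, best_g = t, g
    return best_t, best_g
```

For example, `best_split([[(1.0, "a"), (2.0, "a")], [(3.0, "b"), (4.0, "b")]])` returns `(2.0, 0.0)`, and regrouping the same rows into different partitions gives the same answer, because only the merged counts matter. Sending compact count histograms instead of raw rows is one common way to keep communication small in parallel tree induction; whether it matches the paper's OGini design is an assumption.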

Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.

Funding

This work was supported by the National Natural Science Foundation of China (no. 61702059), the Research Fund of the Guangxi Key Lab of Multi-source Information Mining and Security (MIMS18-03), and the Fundamental Research Funds for the Central Universities (2018CDGFCH0020).

Author information

Correspondence to Yunfei Yin.

Ethics declarations

The authors declare no conflict of interest.

About this article

Cite this article

Wang, J., Yin, Y. & Deng, X. A Parallel Data Mining Approach Based on Segmentation and Pruning Optimization. Aut. Control Comp. Sci. 54, 483–492 (2020). https://doi.org/10.3103/S0146411620060097

  • DOI: https://doi.org/10.3103/S0146411620060097