Weighted naïve Bayes text classification algorithm based on improved distance correlation coefficient

  • S.I: Cognitive-inspired Computing and Applications
Neural Computing and Applications

Abstract

This paper proposes a method for improving attribute weighting in naïve Bayes text classifiers using an improved distance correlation coefficient. The resulting model is called improved distance correlation coefficient attribute weighted multinomial naïve Bayes, denoted IDCWMNB. Unlike traditional statistical correlation measures, which consider the cumulative distribution function of random vectors, the improved distance correlation coefficient tests the joint correlation of random vectors by describing the distance between the joint characteristic function and the product of the marginal characteristic functions. Specifically, an inverse document frequency measure is proposed that takes into account the distribution information of document concentration and scattering. This measure is then combined with the distance correlation coefficient between attributes and categories to quantify the importance of attributes to categories, so that different terms receive different weights. The learned attribute weights are incorporated into the posterior probability estimates of the multinomial naïve Bayes model, an approach known as deep attribute weighting. The distance-correlation-based measure is more effective than traditional statistical measures when the relationship between two random vectors is nonlinear. Experimental results on benchmark and real-world data indicate that the new attribute weighting method achieves an effective balance between classification accuracy and execution time.
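To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard sample distance correlation of Székely, Rizzo and Bakirov rather than the paper's improved variant, omits the proposed inverse document frequency measure entirely, and assumes one common formulation of deep attribute weighting in which the weights scale term frequencies both when estimating conditional probabilities and when scoring a document. The function names `distance_correlation`, `fit_weighted_mnb`, and `predict_weighted_mnb` and the toy corpus are hypothetical.

```python
import numpy as np

def _double_centered_distances(v):
    # Pairwise Euclidean distances, double-centered (row, column, grand mean).
    v = np.asarray(v, dtype=float)
    v = v.reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    # Standard sample distance correlation (assumption: not the improved variant).
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))

def fit_weighted_mnb(counts, labels, weights):
    # Assumed deep weighting: scale term frequencies by per-term weights
    # before estimating Laplace-smoothed conditional probabilities.
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    weighted = counts * weights
    class_sizes = np.array([(labels == c).sum() for c in classes])
    log_prior = np.log((class_sizes + 1) / (len(labels) + len(classes)))
    term_totals = np.array([weighted[labels == c].sum(axis=0) for c in classes])
    log_cond = np.log(
        (term_totals + 1) / (term_totals.sum(axis=1, keepdims=True) + counts.shape[1])
    )
    return classes, log_prior, log_cond

def predict_weighted_mnb(model, counts, weights):
    # Score each class with weighted term frequencies times log-likelihoods.
    classes, log_prior, log_cond = model
    scores = log_prior + (np.asarray(counts, dtype=float) * weights) @ log_cond.T
    return classes[np.argmax(scores, axis=1)]

# Toy corpus (hypothetical): 4 documents, 3 terms, 2 categories.
train_counts = np.array([[3, 0, 1], [2, 0, 1], [0, 3, 1], [0, 2, 1]])
train_labels = np.array([0, 0, 1, 1])
one_hot = (train_labels[:, None] == np.unique(train_labels)[None, :]).astype(float)

# One weight per term: dependence between its counts and the category indicator.
weights = np.array(
    [distance_correlation(train_counts[:, j], one_hot) for j in range(train_counts.shape[1])]
)
model = fit_weighted_mnb(train_counts, train_labels, weights)
pred = predict_weighted_mnb(model, np.array([[1, 0, 1], [0, 1, 1]]), weights)
```

In this toy example the third term occurs once in every document, so its distance correlation with the category indicator is zero and it contributes nothing to the classification, while the two discriminative terms receive positive weights. The same `distance_correlation` function also picks up nonlinear dependence that Pearson correlation misses, for instance between a symmetric sample `x` and `x**2`.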

Acknowledgements

The authors would like to thank the anonymous reviewers and the editors for their valuable comments and suggestions. The authors also thank Chaoguo Tang, chief engineer at China Railway Erju Group, for providing the real engineering geological survey text data.

Funding

This work is supported by the National Key R&D Program of China (2018YFC1503705) and the Science and Technology Research Project of Hubei Provincial Department of Education (B2017597).

Author information

Corresponding authors

Correspondence to Shufen Ruan or Hongwei Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ruan, S., Chen, B., Song, K. et al. Weighted naïve Bayes text classification algorithm based on improved distance correlation coefficient. Neural Comput & Applic 34, 2729–2738 (2022). https://doi.org/10.1007/s00521-021-05989-6

  • DOI: https://doi.org/10.1007/s00521-021-05989-6
