
An efficient generic approach for automatic taxonomy generation using HMMs

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

Taxonomies are essential tools for fast information retrieval and knowledge classification. Many existing techniques for automatic taxonomy generation depend strongly on the specific properties of a particular domain and are consequently hard to apply to other domains. Some attempts have been made to design taxonomies for multiple domains, but they induce high hierarchical classification error rates on some datasets. The automatic design of a taxonomy requires the ability to measure the similarity between classes. More precisely, two classes being close intuitively implies that some elements of one class are scattered in the neighborhood of some elements of the other class. This paper uses that observation to propose a new generic technique for automatic taxonomy generation. A topological analysis of the neighborhood of each instance is first performed. The results of this analysis are used to initialize and train a hidden Markov model (HMM) for each class. The model of a given class c captures the frequencies of the classes found in the neighborhood of the instances of c, from the most dominant class to the least dominant. The similarities between these models are finally used to derive a taxonomy. Hierarchical classification experiments conducted on 20 datasets from various domains showed an average accuracy of \(97.22\%\) with a standard deviation of \(4.11\%\). Comparative results revealed that the proposed approach outperforms existing work, with accuracy gains reaching \(38.62\%\) for one dataset.
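The neighborhood idea in the abstract can be illustrated with a small sketch. The abstract specifies neither how a neighborhood is defined nor how the HMMs are built and compared, so everything below is an assumption made for illustration: neighborhoods are taken to be the k nearest neighbours under Euclidean distance, and histogram intersection stands in for the HMM-based similarity. No HMM is trained here, and the names `knn_class_profiles` and `profile_similarity` are hypothetical.

```python
from collections import Counter
import math


def knn_class_profiles(points, labels, k=3):
    """For each class c, count the labels found among the k nearest
    neighbours of every instance of c, then normalise to frequencies.
    (Assumption: k-NN with Euclidean distance defines the neighbourhood.)"""
    counts = {c: Counter() for c in set(labels)}
    for i, p in enumerate(points):
        # Rank all other instances by distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            counts[labels[i]][labels[j]] += 1
    profiles = {}
    for c, ctr in counts.items():
        total = sum(ctr.values())
        # Order classes from most dominant to least dominant around c.
        profiles[c] = [(d, n / total) for d, n in ctr.most_common()]
    return profiles


def profile_similarity(profiles, a, b):
    """Histogram intersection of two neighbourhood profiles.
    (Assumption: a simple stand-in for the paper's HMM similarity.)"""
    fa, fb = dict(profiles[a]), dict(profiles[b])
    return sum(min(fa.get(c, 0.0), fb.get(c, 0.0)) for c in set(fa) | set(fb))


# Toy data: classes A and B are interleaved, class C lies far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (1, 2),
          (10, 10), (10, 11), (11, 10)]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

profiles = knn_class_profiles(points, labels, k=3)
sim_ab = profile_similarity(profiles, "A", "B")
sim_ac = profile_similarity(profiles, "A", "C")
print(profiles["A"], sim_ab, sim_ac)
```

On this toy data the profile of A is dominated by A itself, and A scores as more similar to the interleaved class B than to the distant class C. A taxonomy could then be derived by grouping the most similar classes first, for instance with agglomerative clustering.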

Notes

  1. https://archive.ics.uci.edu/ml/.

  2. http://marsyas.info/downloads/datasets.html.

  3. https://perso-etis.ensea.fr/sylvain.iloga/GTZAN+/.

  4. http://web.cs.ucla.edu/~yzsun/classes/2017Spring_CS249/Slides/09Evaluation_Clustering.pdf.

  5. https://sourceforge.net/projects/meka/.

Correspondence to Sylvain Iloga.

Cite this article

Iloga, S., Romain, O. & Tchuenté, M. An efficient generic approach for automatic taxonomy generation using HMMs. Pattern Anal Applic 24, 243–262 (2021). https://doi.org/10.1007/s10044-020-00918-0

  • DOI: https://doi.org/10.1007/s10044-020-00918-0
