
An Ensemble Feature Ranking Algorithm for Clustering Analysis

Journal of Classification

Abstract

Feature ranking is a widely used feature selection method: features are evaluated with importance scores, and those with high scores are selected. Conventional unsupervised feature ranking methods do not take information about cluster structures into account; therefore, they may fail to select the features relevant to clustering analysis. To address this limitation, we propose a feature ranking algorithm based on silhouette decomposition. The proposed algorithm computes ensemble importance scores by decomposing the average silhouette widths of random subspaces, which represents a feature's contribution to forming cluster structures more clearly. Experiments on several benchmark data sets examined the properties of the proposed algorithm and compared it with existing ensemble-based feature ranking methods, demonstrating that the proposed algorithm outperformed its counterparts.
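For intuition, the idea can be illustrated with a short Python sketch. This is not the authors' exact FRSD implementation but a simplified analogue under stated assumptions: in each randomly drawn feature subspace, a k-means partition is built, each feature's contribution is approximated by the drop in average silhouette width when that feature's values are permuted, and the scores are averaged over the subspaces in which the feature appears. The function name and parameter values are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def ensemble_silhouette_ranking(X, n_subspaces=50, subspace_size=5, k=3, rng=None):
    """Illustrative ensemble feature ranking via silhouette decomposition.

    For each random subspace, a k-means partition is formed and every feature in
    the subspace is scored by how much the average silhouette width drops when
    its values are randomly permuted.  Scores are averaged over the subspaces in
    which a feature appears; higher scores indicate more relevant features.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    scores = np.zeros(p)
    counts = np.zeros(p)
    for _ in range(n_subspaces):
        feats = rng.choice(p, size=min(subspace_size, p), replace=False)
        Xs = X[:, feats]
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(Xs)
        base_asw = silhouette_score(Xs, labels)       # average silhouette width
        for j, f in enumerate(feats):
            Xp = Xs.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])      # shuffle one feature
            scores[f] += base_asw - silhouette_score(Xp, labels)
            counts[f] += 1
    counts[counts == 0] = 1                           # features never sampled
    return scores / counts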




Notes

  1. http://genome-www.stanford.edu/hcc/

  2. https://github.com/jwmi/BayesianMixtures.jl/tree/master/examples/datasets/gene-deSouto

  3. ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html

References

  • Andrews, J. L., & McNicholas, P. D. (2014). Variable selection for clustering and classification. Journal of Classification, 31(2), 136–153.

  • Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J. M., & Perona, I. (2013). An extensive comparative study of cluster validity indices. Pattern Recognition, 46(1), 243–256.

  • Arthur, D., & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 1027–1035).

  • Ayad, H. G., & Kamel, M. S. (2008). Cumulative voting consensus method for partitions with variable number of clusters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(1), 160–173.

  • Boutsidis, C., Drineas, P., & Mahoney, M. W. (2009). Unsupervised feature selection for the k-means clustering problem. In Advances in Neural Information Processing Systems (pp. 153–161).

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

  • Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics – Theory and Methods, 3(1), 1–27.

  • Chiang, M. M. T., & Mirkin, B. (2010). Intelligent choice of the number of clusters in k-means clustering: An experimental study with different cluster spreads. Journal of Classification, 27(1), 3–40.

  • de Amorim, R. C. (2016). A survey on feature weighting based k-means algorithms. Journal of Classification, 33(2), 210–242.

  • de Amorim, R. C., & Mirkin, B. (2012). Minkowski metric, feature weighting and anomalous cluster initializing in k-means clustering. Pattern Recognition, 45(3), 1061–1075.

  • de Amorim, R. C., Makarenkov, V., & Mirkin, B. (2016). A-Ward: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialization. Information Sciences, 370, 343–354.

  • de Amorim, R. C., Shestakov, A., Mirkin, B., & Makarenkov, V. (2017). The Minkowski central partition as a pointer to a suitable distance exponent and consensus partitioning. Pattern Recognition, 67, 62–72.

  • Dy, J. G., & Brodley, C. E. (2004). Feature selection for unsupervised learning. Journal of Machine Learning Research, 5, 845–889.

  • Elghazel, H., & Aussem, A. (2015). Unsupervised feature selection with ensemble learning. Machine Learning, 98(1–2), 157–180.

  • Fred, A. L., & Jain, A. K. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 835–850.

  • Guerra, L., Robles, V., Bielza, C., & Larrañaga, P. (2012). A comparison of clustering quality indices using outliers and noise. Intelligent Data Analysis, 16(4), 703–715.

  • Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.

  • Handl, J., & Knowles, J. (2006). Feature subset selection in unsupervised learning via multiobjective optimization. International Journal of Computational Intelligence Research, 2(3), 217–238.

  • Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), 28(1), 100–108.

  • He, X., Cai, D., & Niyogi, P. (2005). Laplacian score for feature selection. In Advances in Neural Information Processing Systems (pp. 507–514).

  • Herrero, J., Díaz-Uriarte, R., & Dopazo, J. (2003). Gene expression data preprocessing. Bioinformatics, 19(5), 655–656.

  • Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.

  • Hong, Y., Kwong, S., Chang, Y., & Ren, Q. (2008a). Consensus unsupervised feature ranking from multiple views. Pattern Recognition Letters, 29(5), 595–602.

  • Hong, Y., Kwong, S., Chang, Y., & Ren, Q. (2008b). Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognition, 41(9), 2742–2756.

  • Huang, J. Z., Ng, M. K., Rong, H., & Li, Z. (2005). Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 657–668.

  • Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.

  • Iam-on, N., Boongoen, T., Garrett, S., & Price, C. (2011). A link-based approach to the cluster ensemble problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2396–2409.

  • Ketchen, D. J., Jr., & Shook, C. L. (1996). The application of cluster analysis in strategic management research: An analysis and critique. Strategic Management Journal, 17, 441–458.

  • Kim, S. B., & Rattakorn, P. (2011). Unsupervised feature selection using weighted principal components. Expert Systems with Applications, 38(5), 5704–5710.

  • Kim, E. Y., Kim, S. Y., Ashlock, D., & Nam, D. (2009). MULTI-K: Accurate classification of microarray subtypes using ensemble k-means clustering. BMC Bioinformatics, 10(1), 260.

  • Kuncheva, L. I., & Vetrov, D. P. (2006). Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1798–1808.

  • Lai, C., Reinders, M. J., & Wessels, L. (2006). Random subspace method for multivariate feature selection. Pattern Recognition Letters, 27(10), 1067–1076.

  • Li, F., Zhang, Z., & Jin, C. (2016). Feature selection with partition differentiation entropy for large-scale data sets. Information Sciences, 329, 690–700.

  • Liu, Y., Li, Z., Xiong, H., Gao, X., & Wu, J. (2010). Understanding of internal clustering validation measures. In Proceedings of the IEEE 10th International Conference on Data Mining (ICDM) (pp. 911–916).

  • MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297).

  • Makarenkov, V., & Legendre, P. (2001). Optimal variable weighting for ultrametric and additive trees and K-means partitioning: Methods and software. Journal of Classification, 18(2), 245–271.

  • Mitra, P., Murthy, C. A., & Pal, S. K. (2002). Unsupervised feature selection using feature similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 301–312.

  • Oehmen, C., & Nieplocha, J. (2006). ScalaBLAST: A scalable implementation of BLAST for high-performance data-intensive bioinformatics analysis. IEEE Transactions on Parallel and Distributed Systems, 17(8), 740–749.

  • Panday, D., de Amorim, R. C., & Lane, P. (2018). Feature weighting as a tool for unsupervised feature selection. Information Processing Letters, 129, 44–52.

  • Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.

  • Steinley, D., & Brusco, M. J. (2007). Initializing k-means batch clustering: A critical evaluation of several techniques. Journal of Classification, 24(1), 99–121.

  • Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. Boston: Addison-Wesley.

  • Vendramin, L., Campello, R. J., & Hruschka, E. R. (2010). Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining: The ASA Data Science Journal, 3(4), 209–235.

  • Xu, R. F., & Lee, S. J. (2015). Dimensionality reduction by feature clustering for regression problems. Information Sciences, 299, 42–57.

  • Yang, C., Wan, B., & Gao, X. (2006). Effectivity of internal validation techniques for gene clustering. In Proceedings of the International Symposium on Biological and Medical Data Analysis (pp. 49–59).

  • Yu, J., & Kim, S. B. (2016). A density-based noisy graph partitioning algorithm. Neurocomputing, 175, 473–491.

  • Yu, Z., Wang, D., You, J., Wong, H. S., Wu, S., Zhang, J., & Han, G. (2016). Progressive subspace ensemble learning. Pattern Recognition, 60, 692–705.

  • Zhang, S., Wong, H. S., Shen, Y., & Xie, D. (2012). A new unsupervised feature ranking method for gene expression data based on consensus affinity. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4), 1257–1263.

  • Zhong, C., Yue, X., Zhang, Z., & Lei, J. (2015). A clustering ensemble: Two-level refined co-association matrix with path-based transformation. Pattern Recognition, 48(8), 2699–2709.


Author information

Corresponding author

Correspondence to Seoung Bum Kim.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1. Effect of the Random Shuffling Schemes on the Contribution Scores

In this study, we examined the effect of three random shuffling schemes on the computation of the contribution score. The three schemes were: (1) random permutation of values within a feature, (2) normal distribution–based random number generation, and (3) uniform distribution–based random number generation. Figure 12 shows the processes used to calculate the contribution scores with the different random shuffling schemes, using the same example data as in Fig. 3.

Fig. 12 The processes for contribution score calculation using different random shuffling schemes: a random permutation, b normal distribution–based random number generation, and c uniform distribution–based random number generation

In Fig. 12, x1 is an important feature for clustering analysis, whereas x2 is irrelevant. As the figure shows, both the random permutation and the normal distribution–based random number generation can be used successfully to calculate contribution scores. Conversely, when x2 is replaced with random numbers generated from the uniform distribution, the cluster structures become sparse compared with the original ones (Fig. 12c). Hence, the contribution score of x2 under uniform shuffling is much larger than under the other random shuffling schemes. These results indicate that contribution scores obtained with the uniform distribution–based scheme may lead to an inappropriate evaluation of feature importance for clustering analysis.
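The following Python sketch illustrates the three shuffling schemes and an assumed form of the contribution score, namely the drop in average silhouette width after one feature is shuffled. The helper names and the way the normal and uniform generators are fitted to the observed column are illustrative assumptions rather than the exact definitions used in the main text.

import numpy as np
from sklearn.metrics import silhouette_score

def shuffle_feature(x, scheme, rng):
    """Replace a feature column according to one of the three shuffling schemes."""
    if scheme == "permutation":
        return rng.permutation(x)                           # reuse the observed values
    if scheme == "normal":
        return rng.normal(x.mean(), x.std(), size=x.shape)  # fitted normal draws
    if scheme == "uniform":
        return rng.uniform(x.min(), x.max(), size=x.shape)  # uniform over the range
    raise ValueError(f"unknown scheme: {scheme}")

def contribution_score(X, labels, feature, scheme, rng=None):
    """Contribution of one feature: drop in average silhouette width after shuffling."""
    rng = np.random.default_rng(rng)
    X_shuffled = X.copy()
    X_shuffled[:, feature] = shuffle_feature(X[:, feature], scheme, rng)
    return silhouette_score(X, labels) - silhouette_score(X_shuffled, labels)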

Appendix 2. Investigation of the kmax Value Determination Methods

Because the ik-means algorithm tends to identify more clusters than are actually present in a data set (Chiang and Mirkin 2010; de Amorim et al. 2016), it can also be used to determine the kmax value for the FRSD algorithm. The ik-means algorithm iteratively identifies anomalous patterns, each seeded by the observation farthest from the center of the data set. If the number of observations in an anomalous pattern exceeds a predefined threshold, the pattern is considered a cluster; otherwise, its observations are treated as outliers. (In this study, we set the threshold value to one.) Each anomalous pattern is removed from the training data set, and this process is repeated until every observation belongs to a cluster or is determined to be an outlier. The final number of clusters identified can then be used as the kmax value for the FRSD algorithm. Table 6 shows the values of kmax obtained from the ik-means algorithm and from Eq. 12.

Table 6 kmax values obtained from Eq. 12 (\( \min \left(\left\lceil \sqrt{n}\right\rceil, 20\right) \), where n denotes the number of training observations) and from the ik-means algorithm for the FRSD algorithm
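For concreteness, a minimal Python sketch of the two kmax-setting methods compared in Table 6 is given below. The ik-means loop is a simplified, assumed rendering of the anomalous pattern extraction described above (the grand mean stays fixed, the anomalous centroid is updated until convergence, and patterns not exceeding the threshold are discarded as outliers); function names and parameters are illustrative.

import numpy as np

def ik_means_kmax(X, threshold=1, max_iter=100):
    """Sketch: kmax from ik-means-style anomalous pattern extraction."""
    X = np.asarray(X, dtype=float)
    center = X.mean(axis=0)                        # grand mean, kept fixed
    remaining = X.copy()
    n_clusters = 0
    while len(remaining) > 0:
        dist_to_center = np.linalg.norm(remaining - center, axis=1)
        c = remaining[np.argmax(dist_to_center)]   # farthest point seeds the pattern
        in_pattern = np.zeros(len(remaining), dtype=bool)
        for _ in range(max_iter):
            closer = np.linalg.norm(remaining - c, axis=1) < dist_to_center
            if not closer.any() or np.array_equal(closer, in_pattern):
                break
            in_pattern = closer
            c = remaining[in_pattern].mean(axis=0) # update the anomalous centroid only
        if not in_pattern.any():                   # degenerate case: keep the seed point
            in_pattern[np.argmax(dist_to_center)] = True
        if in_pattern.sum() > threshold:           # pattern large enough to be a cluster
            n_clusters += 1                        # otherwise it is treated as outliers
        remaining = remaining[~in_pattern]         # remove the pattern and repeat
    return n_clusters

def eq12_kmax(n):
    """Eq. 12: kmax = min(ceil(sqrt(n)), 20), n = number of training observations."""
    return int(min(np.ceil(np.sqrt(n)), 20))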

Table 6 shows that in all cases the values of kmax obtained from the ik-means algorithm are smaller than those computed by Eq. 12. Moreover, in the Armstrong-2002-v2 and brain tumor cases, all observations are considered outliers because the algorithm is vulnerable to high dimensionality (de Amorim and Mirkin 2012). In this study, we executed 20 runs of the FRSD algorithm with each kmax-setting method (ik-means and Eq. 12) and reported the mean values and standard deviations of the resulting adjusted Rand indices.
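As an illustration of this evaluation protocol, the sketch below clusters the data on the top-ranked features and reports the mean and standard deviation of the adjusted Rand index over repeated k-means runs. The arguments (ranked_features, n_selected, k) are assumed inputs for illustration; the paper's exact protocol may differ in detail.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def ari_over_runs(X, y_true, ranked_features, n_selected, k, n_runs=20, seed=0):
    """Mean and standard deviation of the ARI over repeated k-means runs."""
    Xs = X[:, ranked_features[:n_selected]]        # keep only the top-ranked features
    aris = []
    for r in range(n_runs):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed + r).fit_predict(Xs)
        aris.append(adjusted_rand_score(y_true, labels))
    return float(np.mean(aris)), float(np.std(aris))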

As shown in Fig. 13, in most cases Eq. 12 yields similar or better performance than the ik-means algorithm because, for high-dimensional data, Eq. 12 tends to generate more clusters than the ik-means algorithm. Hence, we suggest using Eq. 12 (\( \min \left(\left\lceil \sqrt{n}\right\rceil, 20\right) \)) to determine the kmax value for the FRSD algorithm.

Fig. 13 The average classification accuracy of the FRSD algorithm using ik-means and Eq. 12 for the a Chen-2002, b Risinger-2003, c Alizadeh-2000-v3, d lymphoma, and e Garber-2001 cases


Cite this article

Yu, J., Zhong, H. & Kim, S.B. An Ensemble Feature Ranking Algorithm for Clustering Analysis. J Classif 37, 462–489 (2020). https://doi.org/10.1007/s00357-019-09330-8

