Abstract
Neural networks are well known for their ability to classify and cluster data sets by passing the information carried by raw data through multiple layers and transforming it along the way. The feature layer projects the raw data into a space spanned by hidden features. To understand data representations in both the original (i.e., image) space and the feature space, the main purpose of this research is to analyze clustering performance under different feature representations. Distance measures naturally have a great impact on clustering performance, so different distances and their combinations were tested on both the original and feature spaces. Each combined distance was obtained by weighting its constituent distances with optimal weights that minimize classification errors, found via a series of optimization models. Clustering quality was evaluated using silhouette scores. In general, the feature space outperforms the image space in terms of clustering, with cosine similarity being the best distance for both the image space and the feature space.
Data availability
Enquiries about data availability should be directed to the authors.
Acknowledgements
This article is based on basic research works supported by AFRL Mathematical Modeling and Optimization Institute.
Funding
The work was supported in part by the U.S. Air Force Research Laboratory (AFRL) award FA8651-16-2-0009.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Disclaimer: This article presents comparisons with, and improvements to, the authors' previous conference article.
Cite this article
Schleider, L., Pasiliao, E.L., Qiang, Z. et al. A study of feature representation via neural network feature extraction and weighted distance for clustering. J Comb Optim 44, 3083–3105 (2022). https://doi.org/10.1007/s10878-022-00849-y