Structured graph learning for clustering and semi-supervised classification
Introduction
As a natural way to represent structure or connections in data, graphs have broad applications including the World Wide Web, social networks, information retrieval, bioinformatics, computer vision, natural language processing, and many others. Several families of graph algorithms, such as graph-based clustering [1], [2], graph embedding [3], graph-based semi-supervised classification [4], and graph signal processing [5], have attracted increasing attention in recent years.
Clustering refers to the task of finding subsets of similar samples and grouping them together, such that samples in the same cluster would share high similarity to each other, whereas samples in different groups are dissimilar [6], [7]. By leveraging a small set of labeled data, semi-supervised classification aims at determining the labels of a large collection of unlabeled samples based on relationships among the samples [8]. In essence, both clustering and semi-supervised classification algorithms are trying to predict labels for samples [9]. As fundamental techniques in machine learning and pattern recognition, they have been facilitating various research fields and have been extensively studied.
Among the numerous clustering and semi-supervised classification methods developed over the past decades, graph-based techniques often deliver impressive performance. In general, these methods consist of two key steps. First, an affinity graph is constructed from all data points to represent the similarity among samples. Second, a spectral clustering algorithm [10] or a label propagation method [11] is applied to obtain the final labels. The initial graph-construction step therefore heavily influences the subsequent step and can lead to suboptimal performance. Since the underlying structure of the data is usually unknown in advance, this poses a major challenge for graph construction, and the final result may be far from optimal. Indeed, constructing a good graph that best captures the essential data structure remains fundamentally challenging [12].
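To make the two-step pipeline concrete, here is a minimal NumPy sketch of the classical approach: step one builds a Gaussian affinity graph, step two embeds the samples via the normalized graph Laplacian, after which any clustering of the embedding rows yields labels. The bandwidth `sigma`, the toy data, and all function names are illustrative choices, not the paper's method.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Step 1: pairwise Gaussian (RBF) affinity matrix W."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)          # no self-loops
    return W

def spectral_embedding(W, c):
    """Step 2: c smallest eigenvectors of the normalized Laplacian."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, :c]              # one embedding row per sample

# toy data: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
F = spectral_embedding(gaussian_affinity(X), c=2)
```

With well-separated blobs, the rows of `F` are nearly identical within a cluster and distinct across clusters, which is exactly why the quality of `W` built in step one dictates the final labels.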
The existing strategies for defining an adjacency graph can be roughly divided into three categories: (a) metric-based approaches, which use a function such as cosine similarity, Euclidean distance, or a Gaussian kernel to measure the similarity between data points [13]; (b) local-structure approaches, which induce similarity by representing each datum as a linear combination of its local neighbors [14] or by learning a probability of two points being neighbors [15]; (c) approaches based on the global self-expressiveness property, which encode each datum as a weighted combination of all other samples, i.e., its direct neighbors and reachable indirect neighbors [16], [17]. The traditional metric-based and local-neighbor-based methods depend on the choice of metric or neighborhood parameter, which heavily influences the final accuracy; hence, they are not reliable in practice [18].
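As a concrete instance of the metric- and neighborhood-based families (a) and (b), the sketch below builds a k-nearest-neighbor affinity graph with Gaussian weights. Note that both the bandwidth `sigma` and the neighborhood size `k` are free parameters that must be tuned, which is precisely the sensitivity criticized above; names and defaults are illustrative.

```python
import numpy as np

def knn_graph(X, k, sigma=1.0):
    """k-nearest-neighbor affinity graph: each sample is connected only
    to its k closest points, weighted by a Gaussian on the distance."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    np.fill_diagonal(d2, np.inf)        # exclude self-matches
    W = np.zeros_like(d2)
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbors
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, nn.ravel()] = np.exp(-d2[rows, nn.ravel()] / (2 * sigma**2))
    return np.maximum(W, W.T)           # symmetrize the directed kNN graph
```

Changing `k` or `sigma` can rewire the graph substantially, and hence change the downstream clustering or label-propagation result.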
On the other hand, adaptive neighbor [15] and self-expressiveness approaches [19], [20] automatically learn the graph from data. As a matter of fact, they share a similar spirit with locality preserving projection (LPP) and locally linear embedding (LLE), respectively. Unlike LPP and LLE, however, they neither specify the neighborhood size nor predefine the similarity graph. In realistic applications, they enjoy several benefits. First, automatically determining the most informative neighbors for each data point avoids the inconsistency of the widely used k-nearest-neighbor and ϵ-nearest-neighbor graph construction techniques, whose performance is unstable with respect to the choice of k or ϵ [21]. Second, they are independent of the distance metric, whereas traditional methods are often data-dependent and sensitive to noise and outliers [22]. Third, they can handle data whose structures vary in scale and density [23]. They are therefore preferred in practice. For example, [24] performs dimension reduction and adaptive-neighbor graph learning in a unified framework.
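The adaptive-neighbor idea of [15] can be sketched as follows: for each point, neighbor probabilities are obtained in closed form from the sorted squared distances, so that each row of the learned graph sums to one and has exactly k nonzero entries. This is a simplified illustration assuming squared Euclidean distances; the exact objective and regularization in [15] differ in their details, and all names are illustrative.

```python
import numpy as np

def adaptive_neighbors(X, k):
    """Closed-form adaptive-neighbor assignment (sketch): row i gets
    weights proportional to (d_{i,(k+1)} - d_{ij})_+ over its k nearest
    neighbors, normalized so the row sums to one."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    np.fill_diagonal(d2, np.inf)    # a point is not its own neighbor
    n = len(X)
    P = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])
        d_sorted = d2[i, idx]
        # (k+1)-th distance acts as an adaptive, per-point threshold
        denom = k * d_sorted[k] - d_sorted[:k].sum() + 1e-12
        P[i, idx[:k]] = (d_sorted[k] - d_sorted[:k]) / denom
    return P
```

Because the threshold is the (k+1)-th distance of each point itself, dense and sparse regions automatically receive different similarity scales, which is the property the text attributes to these methods.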
Nevertheless, these two families emphasize different aspects of the data structure, namely local and global, respectively. As demonstrated in many problems, such as dimension reduction [25], feature selection [26], semi-supervised classification [27], and clustering [14], local and global structure information are both important to algorithm performance, since they provide complementary information and thus enhance each other. In this paper, we combine them into a unified framework for graph learning.
Moreover, most existing graph-based methods perform clustering or semi-supervised classification on a graph learned from the original data matrix, which lacks explicit cluster structure, and thus may fail to achieve optimal performance. For example, the seminal work [20] assumes a low-rank graph structure, whose solution can be suboptimal due to the bias of the nuclear norm [28]. Ideally, the learned graph should have exactly c connected components if there are c clusters or classes; most existing methods fail to exploit this information. In this paper, we impose a rank constraint to meet this requirement. As an extension of our previous work [22], we establish the theoretical connection of our clustering model to kernel k-means and k-means, and we consider the semi-supervised classification setting. As an added bonus, graph learning and label inference are seamlessly integrated into a unified objective function. This is quite different from the traditional approach, in which graph learning and label inference are performed in two separate steps, easily leading to suboptimal results. To overcome the limitations of single-kernel methods, we further extend our model to accommodate multiple kernels.
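The rank constraint has a clean spectral interpretation: the multiplicity of the zero eigenvalue of the graph Laplacian L = D − W equals the number of connected components of the graph, so requiring exactly c clusters amounts to requiring rank(L) = n − c. A minimal NumPy check of this fact (illustrative code, not the paper's optimization):

```python
import numpy as np

def num_components(W, tol=1e-8):
    """Number of connected components of the graph with affinity W,
    counted as the multiplicity of the (numerically) zero eigenvalue
    of the Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    vals = np.linalg.eigvalsh(L)    # L is symmetric PSD
    return int(np.sum(vals < tol))

# two disjoint edges -> a graph with exactly two components
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 1.0
W[2, 3] = W[3, 2] = 1.0
```

Enforcing this count during learning, rather than checking it afterwards, is what distinguishes the structured graph from one learned directly from the raw data matrix.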
There are many other lines of research on graphs. For instance, [29] discusses the transformation issue; [30] introduces a fitness metric to learn the adjacency matrix; [31] focuses on graphs sampled from a graphon. Different from these, this work aims to learn a graph with explicit cluster structure. In particular, the number of clusters/classes is employed as prior knowledge to enhance the quality of the graph, which leads to improved clustering and semi-supervised classification performance. Additionally, graph neural networks (GNNs) have gained increasing popularity recently [32]. The main difference between GNNs and our method is that GNNs process a graph that is already available in the data, while our method is designed to learn a good graph from feature data for further processing. Hence, our method and GNNs target different types of data. In practice, feature data is more common than graph data; from this point of view, our method could be useful for GNN applications when a graph is unavailable or of low quality. As a matter of fact, how to refine the graph used in a GNN is a promising research direction.
To sum up, the main contributions of this paper are:
- 1. The similarity graph and labels are adaptively learned from the data by preserving both global and local structure information. By leveraging the interactions between them, they are mutually reinforced toward an overall optimal solution.
- 2. Theoretical analysis shows the connections of our model to kernel k-means, k-means, and spectral clustering. Our framework is more general than k-means and kernel k-means, and at the same time it solves the graph-construction challenge of spectral clustering.
- 3. Based on our single-kernel method, we further extend our model into an integrated framework that simultaneously learns the similarity graph, the labels, and the optimal combination of multiple kernels. Each subtask is iteratively boosted by the results of the others.
- 4. Extensive experiments on real-world data sets verify the effectiveness and advantages of our framework over other state-of-the-art clustering and semi-supervised classification algorithms.
The rest of the paper is organized as follows. Section 2 introduces the proposed clustering method based on a single kernel. Section 3 presents the theoretical analysis of our model. An extended model with multiple kernel learning ability is provided in Section 4. Clustering and semi-supervised classification experimental results and analyses are presented in Sections 5 and 6, respectively. Section 7 draws conclusions.
Notations. Given a data set X with m features and n instances, its ith sample and (i, j)th element are denoted by x_i and x_ij, respectively. The ℓ2-norm of x_i is ‖x_i‖ = √(x_i^T x_i), where T denotes transpose. The squared Frobenius norm is defined as ‖X‖_F² = Σ_{i,j} x_ij². I represents the identity matrix and 1 denotes the all-ones column vector. Tr(·) is the trace operator. 0 ≤ Z ≤ 1 indicates that the elements of Z lie in the range [0, 1].
Structured graph learning with single kernel
In this section, we first review local and global structure learning, then describe our model and its optimization.
Connection to kernel K-means and K-means clustering
Theorem 2. When α → ∞, the proposed SGSK model is equivalent to a combination of kernel k-means and k-means problems.

Proof. As aforementioned, the constraint in (6) makes Z block diagonal. Suppose Z_i is the similarity graph matrix of the ith component, where n_i is the number of data samples in this component. Then problem (6) can be written for each i: where X_i consists of the points in the ith component. When α → ∞, the above problem becomes:
Structured graph learning with multiple kernel
The only input to our proposed model (9) is the kernel K. It is well known that the performance of a kernel method depends strongly on the choice of kernel, and it is time-consuming and impractical to exhaustively search for the optimal one. Multiple kernel learning [39], which lets the algorithm pick or combine from a set of candidate kernels, is an effective way to tackle this issue. Here we present an approach to identify a suitable kernel or construct a consensus kernel from a
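A common way to build such a consensus kernel, sketched below with NumPy, is a convex combination K = Σ_r w_r K_r of candidate kernels with simplex-constrained weights. Here the candidates are Gaussian kernels at a few bandwidths and the weights default to uniform, whereas in the full model they would be optimized jointly with the graph; all names and parameter choices are illustrative.

```python
import numpy as np

def candidate_kernels(X, sigmas=(0.5, 1.0, 2.0)):
    """Pool of Gaussian kernels at several bandwidths (illustrative choice)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    return [np.exp(-d2 / (2 * s**2)) for s in sigmas]

def consensus_kernel(kernels, weights=None):
    """Convex combination K = sum_r w_r K_r with weights on the simplex;
    uniform weights are a placeholder for the learned ones."""
    r = len(kernels)
    w = np.full(r, 1.0 / r) if weights is None else np.asarray(weights, float)
    w = w / w.sum()                 # project onto the simplex scale
    return sum(wi * Ki for wi, Ki in zip(w, kernels))
```

Since each candidate is symmetric positive semi-definite, any convex combination is again a valid kernel, so the consensus can be plugged into the single-kernel model unchanged.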
Clustering experiments
In this section, we demonstrate the effectiveness of our proposed method on clustering application.
Semi-supervised classification experiments
In this section, we assess the effectiveness of SGMK on semi-supervised learning (SSL) task.
Conclusion
In this paper, we propose a new graph learning framework that iteratively learns the graph matrix and the labels. Specifically, both local and global structure information is incorporated in our model. We also impose a rank constraint on the graph Laplacian to yield an optimal graph for clustering and classification tasks, so the achieved graph is more informative and discriminative. This turns out to be a unified model for graph and label learning, in which both are improved collaboratively.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This paper was in part supported by Grants from the National Key R&D Program of China (No. 2018YFC0807500), the Natural Science Foundation of China (Nos. 61806045, U19A2059), the Sichuan Science and Technology Program under Project 2020YFS0057, the Ministry of Science and Technology of Sichuan Province Program (Nos. 2018GZDZX0048, 20ZDYF0343), and the Fundamental Research Fund for the Central Universities under Project ZYGX2019Z015.
References (45)
- et al., Auto-weighted multi-view clustering via kernelized graph learning, Pattern Recognit. (2019)
- et al., Auto-weighted multi-view clustering via deep matrix decomposition, Pattern Recognit. (2020)
- et al., Partition level multiview subspace clustering, Neural Netw. (2020)
- et al., Graph-optimized locality preserving projections, Pattern Recognit. (2010)
- et al., Stable local dimensionality reduction approaches, Pattern Recognit. (2009)
- et al., Relation-guided representation learning, Neural Netw. (2020)
- et al., Robust structured subspace learning for data representation, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
- et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
- et al., Graph based constrained semi-supervised learning framework via label propagation over adaptive neighborhood, IEEE Trans. Knowl. Data Eng. (2013)
- et al., The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains, IEEE Signal Process. Mag. (2013)
- Compressed k-means for large-scale clustering, Thirty-First AAAI Conference on Artificial Intelligence
- Learning semi-supervised representation towards a unified optimization framework for semi-supervised learning, Proceedings of the IEEE International Conference on Computer Vision
- Robust graph learning from noisy data, IEEE Trans. Cybern.
- On spectral clustering: analysis and an algorithm, Adv. Neural Inf. Process. Syst.
- Robust adaptive embedded label propagation with weight learning for inductive classification, IEEE Trans. Neural Netw. Learn. Syst.
- Spectral rotation for deep one-step clustering, Pattern Recognit.
- Semi-supervised learning using Gaussian fields and harmonic functions, Proceedings of the 20th International Conference on Machine Learning (ICML-03)
- Clustering with local and global regularization, IEEE Trans. Knowl. Data Eng.
- Multi-view clustering and semi-supervised classification with adaptive neighbours, AAAI
- Twin learning for similarity and clustering: a unified kernel approach, AAAI
- Non-negative low rank and sparse graph for semi-supervised learning, Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on
- Influence of graph construction on semi-supervised learning, Joint European Conference on Machine Learning and Knowledge Discovery in Databases
Zhao Kang received his Ph.D. degree in Computer Science from Southern Illinois University Carbondale, USA, in 2017. Currently, he is an assistant professor at the School of Computer Science and Engineering at University of Electronic Science and Technology of China (UESTC). His research interests are machine learning, data mining, pattern recognition, and deep learning. He has published about 50 research papers in top-tier conferences and journals, including AAAI, IJCAI, ICDE, CVPR, SIGKDD, ICDM, CIKM, SDM, ACML, IEEE TCyb, ACM TIST, ACM TKDD, Neural Networks, and Pattern Recognition. He has been a PC member or reviewer for a number of top conferences such as AAAI, IJCAI, CVPR, ICCV, MM, ICDM, CIKM, ECCV, etc. He regularly serves as a reviewer for JMLR, TPAMI, TNNLS, TCyb, TKDE, TMM, NN, etc.
Chong Peng received his Ph.D. degree in Computer Science from Southern Illinois University Carbondale, USA, in 2017. Currently, he is an assistant professor at Qingdao University. His research interests are machine learning, data mining, computer vision.
Qiang Cheng received the B.S. and M.S. degrees in mathematics, applied mathematics, and computer science from Peking University, China, and the Ph.D. degree from the Department of Electrical and Computer Engineering at the University of Illinois, Urbana-Champaign. Currently, he is an associate professor at the University of Kentucky. He previously was an associate professor at Southern Illinois University Carbondale, an AFOSR faculty fellow at the Air Force Research Laboratory, Wright-Patterson, Ohio, and a senior researcher and senior research scientist at Siemens Medical Solutions, Siemens Corporate Research, Siemens Corp., Princeton, New Jersey. His research interests include pattern recognition, machine learning, signal and image processing, and biomedical informatics. He received various awards and privileges. He was on the organizing committee of a number of international conferences and workshops. He has a number of international patents issued or filed with the IBM T.J. Watson Research Laboratory, Yorktown Heights, Siemens Medical, Princeton, and Southern Illinois University, Carbondale.
Xinwang Liu received his Ph.D. degree from National University of Defense Technology (NUDT), China. He is now Associate Researcher of School of Computer Science, NUDT. His current research interests include kernel learning and unsupervised feature learning. Dr. Liu has published 50+ peer-reviewed papers, including those in highly regarded journals and conferences such as IEEE T-PAMI, IEEE T-IP, IEEE T-NNLS, ICCV, AAAI, IJCAI, etc. He served on the Technical Program Committees of IJCAI 2016–2020 and AAAI 2016–2020.
Xi Peng received the Ph.D. degree in computer science from the Sichuan University, Chengdu, China, in 2013. He currently is a National Distinguished Youth Professor with the College of Computer Science, Sichuan University. Dr. Peng has served as an Associate Editor/Guest Editor for six journals, such as the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, and an Area Chair/Session Chair/Program Chair/Tutorial Organization Chair for over 30 international conferences, such as the AAAI Conference on Artificial Intelligence (AAAI) and the European Conference on Computer Vision (ECCV).
Zenglin Xu is currently a full professor in University of Electronic Science and Technology of China. He received the Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong. He has been working at Michigan State University, Cluster of Excellence at Saarland University and Max Planck Institute for Informatics, and later Purdue University. Dr. Xu’s research interests include machine learning and its applications in information retrieval, health informatics, and social network analysis. He currently serves as an associate editor of Neural Networks, Neurocomputing and Big Data Analytics. He is the recipient of the outstanding student paper honorable mention of AAAI 2015, the best student paper runner up of ACML 2016, and the 2016 young researcher award from APNNS.
Ling Tian received the B.S., M.S., and Ph.D. degrees from the School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2003, 2006, and 2010, respectively. She is currently a Professor with UESTC. She won second prize of National Technical Invention Award of China in 2017, first prize of Technical Invention Award of Sichuan Province in 2016, and first prize of Science and Technology Progress Award of Sichuan Province in 2015. She has edited two books and holds more than 20 Chinese patents. She has contributed more than ten technology proposals to the standardizations such as China Audio and Video Standard (AVS) and China Cloud Computing Standard. Her research interests include image/video coding, and artificial intelligence.