
Pattern Recognition

Volume 110, February 2021, 107627

Structured graph learning for clustering and semi-supervised classification

https://doi.org/10.1016/j.patcog.2020.107627

Highlights

  • A graph learning framework, which captures both the global and local structure in data, is proposed.

  • Theoretical analysis builds the connections of our model to k-means, spectral clustering, and kernel k-means.

  • Extensions to semi-supervised classification and multiple kernel learning are presented.

Abstract

Graphs have become increasingly popular in modeling structures and interactions in a wide variety of problems during the last decade. Graph-based clustering and semi-supervised classification techniques have shown impressive performance. This paper proposes a graph learning framework to preserve both the local and global structure of data. Specifically, our method uses the self-expressiveness of samples to capture the global structure and an adaptive-neighbor approach to respect the local structure. Furthermore, most existing graph-based methods conduct clustering and semi-supervised classification on a graph learned from the original data matrix, which does not have an explicit cluster structure, so they might not achieve optimal performance. By imposing a rank constraint, the achieved graph has exactly c connected components if there are c clusters or classes. As a byproduct, graph learning and label inference are jointly and iteratively implemented in a principled way. Theoretically, we show that our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions. Extensive experiments on clustering and semi-supervised classification demonstrate that the proposed method outperforms other state-of-the-art methods.

Introduction

As a natural way to represent structure and connections in data, graphs have broad applications, including the World Wide Web, social networks, information retrieval, bioinformatics, computer vision, natural language processing, and many others. Graph-based algorithms, such as graph-based clustering [1], [2], graph embedding [3], graph-based semi-supervised classification [4], and signal processing [5], have attracted increasing attention in recent years.

Clustering refers to the task of finding subsets of similar samples and grouping them together, such that samples in the same cluster share high similarity with each other, whereas samples in different groups are dissimilar [6], [7]. By leveraging a small set of labeled data, semi-supervised classification aims at determining the labels of a large collection of unlabeled samples based on relationships among the samples [8]. In essence, both clustering and semi-supervised classification algorithms try to predict labels for samples [9]. As fundamental techniques in machine learning and pattern recognition, they have facilitated various research fields and have been extensively studied.

Among the numerous clustering and semi-supervised classification methods developed in the past decades, graph-based techniques often provide impressive performance. In general, these methods consist of two key steps. First, an affinity graph is constructed from all data points to represent the similarity among the samples. Second, a spectral clustering algorithm [10] or a label propagation method [11] is utilized to obtain the final labels. The initial graph-building step therefore heavily impacts the subsequent step and can ultimately lead to suboptimal performance. Since the underlying structure of the data is often unknown in advance, this poses a major challenge for graph construction. Unfortunately, constructing a good graph that best captures the essential data structure is known to be fundamentally challenging [12].
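The two-step pipeline above can be sketched in a few lines. This is a generic illustration (Gaussian-kernel affinity followed by a normalized-Laplacian spectral split), not the authors' method; the toy data, the bandwidth σ, and the two-cluster sign trick are assumptions made for the example:

```python
import numpy as np

# Toy data: two well-separated 2-D blobs, 10 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
               rng.normal(5.0, 0.1, (10, 2))])

# Step 1: build an affinity graph, here W_ij = exp(-||x_i - x_j||^2 / (2*sigma^2)).
sigma = 1.0
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dist / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)

# Step 2: spectral clustering on the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
d = W.sum(axis=1)
L = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))
_, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
labels = (vecs[:, 1] > 0).astype(int)  # sign of the Fiedler vector splits two clusters
```

For more than two clusters, one would keep the first c eigenvectors and run k-means on the rows, as in [10].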

The existing strategies to define an adjacency graph can be roughly divided into three categories: a) metric-based approaches, which use functions such as cosine similarity, Euclidean distance, or the Gaussian function to measure the similarity among data points [13]; b) local structure approaches, which induce the similarity by representing each datum as a linear combination of its local neighbors [14] or by learning a probability for two points to be neighbors [15]; c) global self-expressiveness based approaches, which encode each datum as a weighted combination of all other samples, i.e., its direct neighbors and reachable indirect neighbors [16], [17]. The traditional metric-based approaches and local neighbor based methods depend on the choice of metric or neighborhood parameter, which heavily influences the final accuracy. Hence, they are not reliable in practice [18].

On the other hand, the adaptive neighbor [15] and self-expressiveness [19], [20] approaches automatically learn the graph from data. They share a similar spirit with locality preserving projection (LPP) and locally linear embedding (LLE), respectively. Unlike LPP and LLE, however, they neither specify the neighborhood size nor predefine the similarity graph. In realistic applications, they enjoy several benefits. First, automatically determining the most informative neighbors for each data point avoids the inconsistency of the widely used k-nearest-neighbor and ϵ-nearest-neighbor graph construction techniques, whose performance is unstable with respect to different k or ϵ values [21]. Second, they are independent of the distance metric, while traditional methods are often data-dependent and sensitive to noise and outliers [22]. Third, they can tackle data with structures at different scales of size and density [23]. Therefore, they are preferred in practice. For example, [24] performs dimension reduction and graph learning based on adaptive neighbors in a unified framework.
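As a concrete illustration of the adaptive-neighbor idea [15], the per-point problem min_s Σ_j d_ij s_j + γ s_j^2 (s ≥ 0, Σ_j s_j = 1) admits a closed-form solution when γ is chosen so that exactly k neighbors receive nonzero weight. The sketch below follows that well-known closed form; the function name and demo data are illustrative, not the paper's exact algorithm:

```python
import numpy as np

def adaptive_neighbors(X, k):
    """Row-stochastic neighbor graph: for each x_i, solve
    min_s  sum_j d_ij * s_j + gamma * s_j^2   s.t.  s >= 0, sum_j s_j = 1,
    with gamma set so that exactly k neighbors get nonzero weight."""
    n = len(X)
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared Euclidean distances
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[1:k + 2]                  # k nearest plus one more, self excluded
        d = D[i, idx]
        # Closed form: weights decay linearly with distance, zero beyond the k-th neighbor.
        S[i, idx[:k]] = (d[k] - d[:k]) / (k * d[k] - d[:k].sum() + 1e-12)
    return S

rng = np.random.default_rng(1)
S = adaptive_neighbors(rng.normal(size=(12, 3)), k=3)    # each row sums to 1
```

Note that no bandwidth or metric parameter is tuned by hand; the sparsity level k is the only input.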

Nevertheless, these two approaches emphasize different aspects of the data structure, i.e., local and global, respectively. As demonstrated in many problems, such as dimension reduction [25], feature selection [26], semi-supervised classification [27], and clustering [14], local and global structure information are both important to algorithm performance, since they provide complementary information and thus enhance each other. In this paper, we combine them into a unified framework for the graph learning task.

Moreover, most existing graph-based methods conduct clustering and semi-supervised classification on a graph learned from the original data matrix, which does not have an explicit cluster structure, so they might not achieve optimal performance. For example, the seminal work [20] assumes a low-rank structure of the graph, whose solution might not be optimal due to the bias of the nuclear norm [28]. Ideally, the achieved graph should have exactly c connected components if there are c clusters or classes. Most existing methods fail to take this information into account. In this paper, we impose a rank constraint to meet this requirement. As an extension of our previous work [22], we establish the theoretical connection of our clustering model to kernel k-means and k-means and consider the semi-supervised classification application. As an added bonus, graph learning and label inference are seamlessly integrated into a unified objective function. This is quite different from traditional approaches, where graph learning and label inference are performed in two separate steps, which easily leads to suboptimal results. To overcome the limitation of single-kernel methods, we further extend our model to accommodate multiple kernels.
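The connected-component requirement can be checked numerically: a graph has exactly c connected components if and only if the Laplacian L = D − W has eigenvalue 0 with multiplicity c, which is what the rank constraint rank(L) = n − c encodes. A small sanity check (the block-diagonal W below is a made-up example):

```python
import numpy as np

def num_components(W, tol=1e-8):
    """Connected components of a graph = multiplicity of eigenvalue 0 of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return int((np.linalg.eigvalsh(L) < tol).sum())

# A block-diagonal affinity matrix with two blocks -> two connected components.
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 1.0
W[2, 3] = W[3, 2] = 1.0
```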

There are many other lines of research on graphs. For instance, [29] discusses the transformation issue; [30] introduces a fitness metric to learn the adjacency matrix; [31] focuses on graphs sampled from a graphon. Different from them, this work aims to learn a graph with an explicit cluster structure. In particular, the number of clusters/classes is employed as prior knowledge to enhance the quality of the graph, which leads to improved clustering and semi-supervised classification performance. Additionally, graph neural networks (GNNs) have gained increasing popularity recently [32]. The main difference between GNNs and our method is that a GNN is designed to process a graph that is already available in the data, while our method learns a good graph from feature data for further processing. Hence, our method and GNNs focus on different types of data. In practice, feature data is more common than graph data. From this point of view, our method could be useful for GNN applications when the graph is unavailable or of low quality. In fact, how to refine the graph used in a GNN is a promising research direction.

To sum up, the main contributions of this paper are:

  • 1.

    The similarity graph and labels are adaptively learned from the data by preserving both global and local structure information. By leveraging the interactions among them, they are mutually reinforced towards an overall optimal solution.

  • 2.

    Theoretical analysis shows the connections of our model to kernel k-means, k-means, and spectral clustering methods. Our framework is more general than k-means and kernel k-means. At the same time, it solves the graph construction challenge of spectral clustering.

  • 3.

    Based on our method with a single kernel, we further extend our model into an integrated framework which can simultaneously learn the similarity graph, labels, and the optimal combination of multiple kernels. Each subtask can be iteratively boosted by using the results of the others.

  • 4.

    Extensive experiments on real-world data sets are conducted to verify the effectiveness and advantages of our framework over other state-of-the-art clustering and semi-supervised classification algorithms.

The rest of the paper is organized as follows. Section 2 introduces the proposed clustering method based on a single kernel. In Section 3, we present the theoretical analysis of our model. An extended model with multiple kernel learning ability is provided in Section 4. Clustering and semi-supervised classification experimental results and analyses are presented in Sections 5 and 6, respectively. Section 7 draws conclusions.

Notations. Given a data set X ∈ ℝ^{n×m} with m features and n instances, its ith sample and (i, j)th element are denoted by x_i ∈ ℝ^{m×1} and x_{ij}, respectively. The ℓ2-norm of x_i is denoted as ‖x_i‖ = √(x_i^T · x_i), where T means transpose. The squared Frobenius norm is defined as ‖X‖_F^2 = Σ_{ij} x_{ij}^2. I represents the identity matrix and 1 denotes a column vector with all elements equal to one. Tr(·) is the trace operator. 0 ≤ Z ≤ 1 indicates that the elements of Z are in the range [0, 1].
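The squared Frobenius norm above can be double-checked against the standard trace identity ‖X‖_F^2 = Tr(X^T X), which the paper's objectives rely on implicitly; a two-line numerical confirmation on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))
fro_sq = (X ** 2).sum()          # ||X||_F^2 = sum_ij x_ij^2
trace_form = np.trace(X.T @ X)   # Tr(X^T X), same quantity
```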

Section snippets

Structured graph learning with single kernel

In this section, we first review local and global structure learning, then describe our model and its optimization.

Connection to kernel K-means and K-means clustering

Theorem 2

When α → ∞, the proposed SGSK model is equivalent to a combination of kernel k-means and k-means problems.

Proof

As aforementioned, the constraint rank(L) = n − c in (6) will make Z block diagonal. Suppose Z_i ∈ ℝ^{n_i×n_i} is the similarity graph matrix of the ith component, where n_i is the number of data samples in this component. Then problem (6) can be written for each i as:

min_{Z_i} ‖φ(X_i) − φ(X_i)Z_i‖_F^2 + Tr(Z_i^T D_i^x) + α‖Z_i‖_F^2   s.t. Z_i^T 1 = 1, 0 ≤ Z_i ≤ 1,

where X_i consists of the points in Z_i. When α → ∞, the above problem becomes:

Structured graph learning with multiple kernel

The only input to our proposed model (9) is the kernel K. It is well known that the performance of a kernel method depends strongly on the choice of kernel. It is also time-consuming and impractical to exhaustively search for the optimal kernel. Multiple kernel learning [39], which lets the algorithm pick from or combine a set of candidate kernels, is an effective way to tackle this issue. Here we present an approach to identify a suitable kernel or construct a consensus kernel from a
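A common multiple-kernel construction is a convex combination K = Σ_r w_r K_r with weights on the simplex, which stays symmetric and positive semi-definite. The sketch below uses uniform weights and Gaussian kernels at a few bandwidths purely as illustrative assumptions, not the learned weighting of the paper's model:

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2*sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 3))

# Candidate kernels at several bandwidths, combined with simplex weights.
kernels = [gaussian_kernel(X, s) for s in (0.5, 1.0, 2.0)]
w = np.full(len(kernels), 1.0 / len(kernels))    # uniform weights for this sketch
K = sum(wi * Ki for wi, Ki in zip(w, kernels))   # consensus kernel, still PSD
```

In the full model the weights w would be optimized jointly with the graph rather than fixed in advance.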

Clustering experiments

In this section, we demonstrate the effectiveness of our proposed method on clustering application.

Semi-supervised classification experiments

In this section, we assess the effectiveness of SGMK on semi-supervised learning (SSL) task.

Conclusion

In this paper, we propose a new graph learning framework that iteratively learns the graph matrix and the labels. Specifically, both local and global structure information are incorporated in our model. We also impose a rank constraint on the graph Laplacian to yield an optimal graph for clustering and classification tasks, so the achieved graph is more informative and discriminative. This turns out to be a unified model for both graph and label learning, where both are improved collaboratively. A

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This paper was supported in part by grants from the National Key R&D Program of China (No. 2018YFC0807500), the Natural Science Foundation of China (Nos. 61806045, U19A2059), the Sichuan Science and Technology Program under Project 2020YFS0057, the Ministry of Science and Technology of Sichuan Province Program (Nos. 2018GZDZX0048, 20ZDYF0343), and the Fundamental Research Fund for the Central Universities under Project ZYGX2019Z015.


References (45)

  • X. Shen et al., Compressed k-means for large-scale clustering, Thirty-First AAAI Conference on Artificial Intelligence (2017)
  • C.-G. Li et al., Learning semi-supervised representation towards a unified optimization framework for semi-supervised learning, Proceedings of the IEEE International Conference on Computer Vision (2015)
  • Z. Kang et al., Robust graph learning from noisy data, IEEE Trans. Cybern. (2020)
  • A.Y. Ng et al., On spectral clustering: analysis and an algorithm, Adv. Neural Inf. Process. Syst. (2002)
  • Z. Zhang et al., Robust adaptive embedded label propagation with weight learning for inductive classification, IEEE Trans. Neural Netw. Learn. Syst. (2017)
  • X. Zhu et al., Spectral rotation for deep one-step clustering, Pattern Recognit. (2019)
  • X. Zhu et al., Semi-supervised learning using Gaussian fields and harmonic functions, Proceedings of the 20th International Conference on Machine Learning (ICML-03) (2003)
  • F. Wang et al., Clustering with local and global regularization, IEEE Trans. Knowl. Data Eng. (2009)
  • F. Nie et al., Multi-view clustering and semi-supervised classification with adaptive neighbours, AAAI (2017)
  • Z. Kang et al., Twin learning for similarity and clustering: a unified kernel approach, AAAI (2017)
  • L. Zhuang et al., Non-negative low rank and sparse graph for semi-supervised learning, Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (2012)
  • C.A.R. de Sousa et al., Influence of graph construction on semi-supervised learning, Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2013)

    Zhao Kang received his Ph.D. degree in Computer Science from Southern Illinois University Carbondale, USA, in 2017. Currently, he is an assistant professor at the School of Computer Science and Engineering at University of Electronic Science and Technology of China (UESTC). His research interests are machine learning, data mining, pattern recognition, and deep learning. He has published about 50 research papers in top-tier conferences and journals, including AAAI, IJCAI, ICDE, CVPR, SIGKDD, ICDM, CIKM, SDM, ACML, IEEE TCyb, ACM TIST, ACM TKDD, Neural Networks, and Pattern Recognition. He has been a PC member or reviewer for a number of top conferences such as AAAI, IJCAI, CVPR, ICCV, MM, ICDM, CIKM, ECCV, etc. He regularly serves as a reviewer for JMLR, TPAMI, TNNLS, TCyb, TKDE, TMM, NN, etc.

    Chong Peng received his Ph.D. degree in Computer Science from Southern Illinois University Carbondale, USA, in 2017. Currently, he is an assistant professor at Qingdao University. His research interests are machine learning, data mining, computer vision.

    Qiang Cheng received the B.S. and M.S. degrees in mathematics, applied mathematics, and computer science from Peking University, China, and the Ph.D. degree from the Department of Electrical and Computer Engineering at the University of Illinois, Urbana-Champaign. Currently, he is an associate professor at the University of Kentucky. He previously was an associate professor at Southern Illinois University Carbondale, an AFOSR faculty fellow at the Air Force Research Laboratory, Wright-Patterson, Ohio, and a senior researcher and senior research scientist at Siemens Medical Solutions, Siemens Corporate Research, Siemens Corp., Princeton, New Jersey. His research interests include pattern recognition, machine learning, signal and image processing, and biomedical informatics. He received various awards and privileges. He was on the organizing committee of a number of international conferences and workshops. He has a number of international patents issued or filed with the IBM T.J. Watson Research Laboratory, Yorktown Heights, Siemens Medical, Princeton, and Southern Illinois University, Carbondale.

    Xinwang Liu received his Ph.D. degree from National University of Defense Technology (NUDT), China. He is now Associate Researcher of School of Computer Science, NUDT. His current research interests include kernel learning and unsupervised feature learning. Dr. Liu has published 50+ peer-reviewed papers, including those in highly regarded journals and conferences such as IEEE T-PAMI, IEEE T-IP, IEEE T-NNLS, ICCV, AAAI, IJCAI, etc. He served on the Technical Program Committees of IJCAI 2016–2020 and AAAI 2016–2020.

    Xi Peng received the Ph.D. degree in computer science from the Sichuan University, Chengdu, China, in 2013. He currently is a National Distinguished Youth Professor with the College of Computer Science, Sichuan University. Dr. Peng has served as an Associate Editor/Guest Editor for six journals, such as the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, and an Area Chair/Session Chair/Program Chair/Tutorial Organization Chair for over 30 international conferences, such as the AAAI Conference on Artificial Intelligence (AAAI) and the European Conference on Computer Vision (ECCV).

    Zenglin Xu is currently a full professor in University of Electronic Science and Technology of China. He received the Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong. He has been working at Michigan State University, Cluster of Excellence at Saarland University and Max Planck Institute for Informatics, and later Purdue University. Dr. Xu’s research interests include machine learning and its applications in information retrieval, health informatics, and social network analysis. He currently serves as an associate editor of Neural Networks, Neurocomputing and Big Data Analytics. He is the recipient of the outstanding student paper honorable mention of AAAI 2015, the best student paper runner up of ACML 2016, and the 2016 young researcher award from APNNS.

    Ling Tian received the B.S., M.S., and Ph.D. degrees from the School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2003, 2006, and 2010, respectively. She is currently a Professor with UESTC. She won second prize of National Technical Invention Award of China in 2017, first prize of Technical Invention Award of Sichuan Province in 2016, and first prize of Science and Technology Progress Award of Sichuan Province in 2015. She has edited two books and holds more than 20 Chinese patents. She has contributed more than ten technology proposals to the standardizations such as China Audio and Video Standard (AVS) and China Cloud Computing Standard. Her research interests include image/video coding, and artificial intelligence.
