Feature selection and learning for graphlet kernel, Pattern Recognition Letters


Pattern Recognition Letters ( IF 3.9 ) Pub Date : 2020-05-27 , DOI: 10.1016/j.patrec.2020.05.023
Furqan Aziz , Afan Ullah , Faiza Shah

Graph-based representations have been used with considerable success in bioinformatics. One challenging problem with graph-based representations is estimating the similarity between two input graphs. Graph kernels address this problem by bridging the gap between vectorised and structured representations. However, existing graph kernels suffer from one of two problems: they are either computationally very expensive or have low classification accuracy. In this paper we present a method that improves the accuracy and efficiency of one of the most popular graph kernels, the graphlet kernel. The main idea behind the graphlet kernel is to use a graphlet frequency vector as a feature vector. We propose a framework for selecting a subset of features that can be used to estimate the similarity between graphs. We show that the proposed method increases not only the efficiency of the resulting kernel but also its classification accuracy. We enrich the feature vector by identifying a set of higher-order graphlets that can be efficiently computed. We also show that different datasets from the bioinformatics domain share common graphlets. Therefore, the set of features learned from one bioinformatics dataset can also be used to classify graphs in another bioinformatics dataset.
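To make the graphlet frequency vector idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it only counts induced 3-node graphlets (which are fully determined by their edge count), uses a plain dot product as the kernel, and the function names and the `selected` parameter are hypothetical stand-ins for a feature subset chosen by some selection procedure.

```python
from itertools import combinations


def graphlet_frequency_vector(n_nodes, edges):
    """Normalised counts of induced 3-node subgraphs.

    Index k of the result is the fraction of node triples that induce
    exactly k edges (0 = empty, 1 = single edge, 2 = path, 3 = triangle).
    """
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    counts = [0, 0, 0, 0]
    for a, b, c in combinations(range(n_nodes), 3):
        k = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        counts[k] += 1

    total = sum(counts)
    return [x / total for x in counts] if total else [0.0] * 4


def graphlet_kernel(f1, f2, selected=None):
    """Dot product of two frequency vectors.

    `selected` is a hypothetical list of feature indices; when given,
    only those graphlet types contribute to the similarity.
    """
    idx = range(len(f1)) if selected is None else selected
    return sum(f1[i] * f2[i] for i in idx)


# Two toy graphs on three nodes: a triangle and a path.
triangle = graphlet_frequency_vector(3, [(0, 1), (1, 2), (0, 2)])
path = graphlet_frequency_vector(3, [(0, 1), (1, 2)])

print(graphlet_kernel(triangle, triangle))        # similarity of a graph with itself
print(graphlet_kernel(triangle, path))            # the two graphs share no 3-node graphlet type
print(graphlet_kernel(triangle, triangle, [3]))   # restrict to the triangle feature only
```

Restricting the sum to a subset of indices shortens the vectors that have to be compared, which is one simple way a feature subset can make kernel evaluation cheaper; how the paper actually chooses that subset is not described in the abstract.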


Updated: 2020-05-27
