M-Evolve: Structural-Mapping-Based Data Augmentation for Graph Classification
arXiv - CS - Social and Information Networks, Pub Date: 2020-07-11, DOI: arxiv-2007.05700
Jiajun Zhou, Jie Shen, Shanqing Yu, Guanrong Chen, Qi Xuan
Graph classification, which aims to identify the category labels of graphs,
plays a significant role in drug classification, toxicity detection, protein
analysis, etc. However, the limited scale of benchmark datasets makes graph
classification models prone to over-fitting and under-generalization. To
address this, we introduce data augmentation on graphs (i.e., graph
augmentation) and present four methods (random mapping, vertex-similarity
mapping, motif-random mapping, and motif-similarity mapping) that generate
additional weakly labeled data for small-scale benchmark datasets via
heuristic transformation of graph structures. Furthermore, we propose a generic
model evolution framework, named M-Evolve, which combines graph augmentation,
data filtration, and model retraining to optimize pre-trained graph classifiers.
Experiments on six benchmark datasets demonstrate that the proposed framework
helps existing graph classification models alleviate over-fitting and
under-generalization when trained on small-scale benchmark datasets, yielding
an average accuracy improvement of 3-13% on graph classification tasks.
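The abstract does not spell out the mapping rules or the filtration criterion, so the following is only a minimal sketch under our own assumptions. "Random mapping" is taken here to mean deleting a random fraction of edges and adding the same number of edges between previously unconnected node pairs. "Data filtration" is approximated by a single fixed confidence threshold applied to the pre-trained classifier's output. The names `random_mapping`, `augment_and_filter`, `ratio`, `threshold`, and the `classify` callback are hypothetical and are not the authors' code.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

Edge = Tuple[int, int]
Graph = Tuple[List[int], Set[Edge]]  # (node list, undirected edge set)


def _norm(u: int, v: int) -> Edge:
    """Store each undirected edge with the smaller endpoint first."""
    return (u, v) if u < v else (v, u)


def random_mapping(nodes: List[int], edges: Set[Edge], ratio: float,
                   rng: random.Random) -> Set[Edge]:
    """Assumed form of 'random mapping' (hypothetical): delete a random fraction
    of edges and add the same number of edges between node pairs that were not
    connected in the original graph, so the edge count is preserved."""
    edges = {_norm(u, v) for u, v in edges}
    non_edges = [_norm(u, v) for i, u in enumerate(nodes)
                 for v in nodes[i + 1:] if _norm(u, v) not in edges]
    k = min(int(len(edges) * ratio), len(non_edges))
    removed = set(rng.sample(sorted(edges), k))
    added = set(rng.sample(non_edges, k))
    return (edges - removed) | added


def augment_and_filter(graphs: List[Graph], labels: List[int],
                       classify: Callable[[List[int], Set[Edge]], Sequence[float]],
                       ratio: float, threshold: float,
                       rng: random.Random) -> List[Tuple[Graph, int]]:
    """One augmentation-plus-filtration round (hypothetical). Each augmented graph
    inherits its parent's weak label and is kept only if the pre-trained
    classifier gives that label a probability of at least `threshold`, a
    simplified stand-in for the paper's data filtration step."""
    kept = []
    for (nodes, edges), y in zip(graphs, labels):
        new_edges = random_mapping(nodes, edges, ratio, rng)
        if classify(nodes, new_edges)[y] >= threshold:
            kept.append(((nodes, new_edges), y))
    # Retraining (not shown): fit the classifier on the original graphs plus `kept`.
    return kept
```

For example, applying `random_mapping` to a 5-cycle with `ratio=0.4` swaps two of its five edges for two new ones, leaving the edge count at five. The fixed threshold is chosen only for simplicity; a per-class or validation-tuned threshold would be an equally plausible reading of "data filtration".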
Updated: 2020-08-26