Knowledge-driven graph similarity for text classification
International Journal of Machine Learning and Cybernetics (IF 3.1) Pub Date: 2020-11-19, DOI: 10.1007/s13042-020-01221-4
Niloofer Shanavas, Hui Wang, Zhiwei Lin, Glenn Hawe

Automatic text classification using machine learning is significantly affected by the text representation model. Structural information in text, which vector-based representations usually ignore, is necessary for natural language understanding. In this paper, we present a graph kernel-based text classification framework which utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents; these graphs weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, which we call knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.
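
The pipeline described in the abstract (build a weighted co-occurrence graph, enrich it with a word similarity matrix, compare graphs with a kernel) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact weighting scheme, enrichment rule, or kernel, so the sketch assumes a sliding-window co-occurrence count, an optional hypothetical per-term weight vector, enrichment as S A Sᵀ, and a normalised edge-matching kernel. All function names, the toy vocabulary, and the similarity values are illustrative assumptions.

```python
import numpy as np


def cooccurrence_graph(tokens, vocab, window=2, term_weights=None):
    """Weighted adjacency matrix of a document's co-occurrence graph.

    Assumption: an edge counts how often two terms appear within
    `window` positions of each other; `term_weights` is a hypothetical
    per-term relevance vector (e.g. a supervised term weight).
    """
    index = {term: i for i, term in enumerate(vocab)}
    n = len(vocab)
    adj = np.zeros((n, n))
    ids = [index[t] for t in tokens if t in index]
    for pos, i in enumerate(ids):
        for j in ids[pos + 1 : pos + window]:
            if i != j:
                adj[i, j] += 1.0
                adj[j, i] += 1.0
    if term_weights is not None:
        w = np.asarray(term_weights, dtype=float)
        adj = adj * np.outer(w, w)  # scale each edge by both endpoint weights
    return adj


def enrich(adj, similarity):
    """Spread edge weight to semantically similar terms (assumed rule: S A S^T)."""
    return similarity @ adj @ similarity.T


def graph_kernel(adj_a, adj_b):
    """Normalised edge-matching kernel: cosine of the two adjacency matrices."""
    norm = np.linalg.norm(adj_a) * np.linalg.norm(adj_b)
    if norm == 0.0:
        return 0.0
    return float(np.sum(adj_a * adj_b) / norm)


# Toy example: two documents that share no terms but use near-synonyms.
vocab = ["good", "great", "movie", "film"]
similarity = np.eye(len(vocab))
similarity[0, 1] = similarity[1, 0] = 0.8  # good ~ great (made-up value)
similarity[2, 3] = similarity[3, 2] = 0.9  # movie ~ film (made-up value)

graph_a = cooccurrence_graph(["good", "movie"], vocab)
graph_b = cooccurrence_graph(["great", "film"], vocab)

plain = graph_kernel(graph_a, graph_b)
enriched = graph_kernel(enrich(graph_a, similarity), enrich(graph_b, similarity))
print(plain, enriched)
```

Without enrichment the two toy documents share no edges, so the kernel value is zero; after enrichment the synonym pairs contribute matching edge weight and the similarity becomes positive, which is the "beyond exact matching" behaviour the abstract refers to. A kernel matrix computed this way could be passed to any classifier that accepts precomputed kernels.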



Updated: 2020-11-19