Active Learning for Node Classification: An Evaluation
Entropy (IF 2.1), Pub Date: 2020-10-16, DOI: 10.3390/e22101164
Kaushalya Madhawa, Tsuyoshi Murata

Current breakthroughs in the field of machine learning are fueled by the deployment of deep neural network models. Deep neural network models are notorious for their dependence on large amounts of labeled training data. Active learning addresses this by training classification models with fewer labeled instances, selecting only the most informative instances for labeling. This is especially important when labeled data are scarce or the labeling process is expensive. In this paper, we study the application of active learning to attributed graphs. In this setting, the data instances are represented as nodes of an attributed graph. Graph neural networks achieve the current state-of-the-art classification performance on attributed graphs. The performance of graph neural networks relies on the careful tuning of their hyperparameters, usually performed using a validation set, an additional set of labeled instances. In label-scarce problems, it is realistic to use all labeled instances for training the model. In this setting, we perform a fair comparison of the existing active learning algorithms proposed for graph neural networks as well as for other data types such as images and text. With empirical results, we demonstrate that state-of-the-art active learning algorithms designed for other data types do not perform well on graph-structured data. We study the problem within the framework of the exploration-vs.-exploitation trade-off and propose a new count-based exploration term. With empirical evidence on multiple benchmark graphs, we highlight the importance of complementing uncertainty-based active learning models with an exploration term.

Updated: 2020-10-16