Effective social post classifiers on top of search interfaces
Data Mining and Knowledge Discovery (IF 4.8), Pub Date: 2021-06-12, DOI: 10.1007/s10618-021-00768-2
Ryan Rivas, Vagelis Hristidis

Applying text classification to find social media posts relevant to a topic of interest is the focus of a substantial amount of research. A key challenge is how to select a good training set of posts to label. This problem has traditionally been solved using active learning. However, active learning assumes access to all posts in the collection, which is unrealistic in many cases because social networks constrain the number of posts that can be retrieved through their search APIs. To address this problem, which we refer to as the training post retrieval over constrained search interfaces problem, we propose several keyword selection algorithms that, given a topic, generate an effective set of keyword queries to submit to the search API. The returned posts are labeled and used as a training dataset to train post classifiers. Our experiments compare our proposed keyword selection algorithms to several baselines across various topics from three sources. The results show that the proposed methods generate superior training sets, as measured by the balanced accuracy of the trained classifiers.
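The abstract names balanced accuracy as the evaluation metric but does not define it. As a minimal sketch, assuming the standard binary definition (the mean of the recall on the relevant class and the recall on the irrelevant class), it could be computed as follows. The function name and the example labels are hypothetical and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on class 1 (relevant) and class 0 (irrelevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, predicted relevant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # irrelevant, predicted irrelevant
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical labels: 2 relevant posts and 8 irrelevant posts.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

score = balanced_accuracy(y_true, y_pred)
plain = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(score, plain)
```

Because relevant posts are usually a small minority of a collection, plain accuracy can look high for a classifier that rarely predicts "relevant"; averaging the per-class recalls weights both classes equally, which is one likely reason to prefer this metric in such a setting.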




Updated: 2021-06-13