

Query-based unsupervised learning for improving social media search
World Wide Web (IF 2.7), Pub Date: 2019-11-27, DOI: 10.1007/s11280-019-00747-0
Khaled Albishre, Yuefeng Li, Yue Xu, Wei Huang

In the current internet information era, social media has become one of the essential information sources for users. Although text is the primary form of information representation, finding relevant information is challenging for researchers because of the nature of social media text (e.g., short length, sparseness). Acquiring high-quality search results from massive data such as social media requires a set of representative query terms, which are not always available. In this paper, we propose a novel query-based unsupervised learning model to represent the implicit relationships in short text from social media. The model bridges the gap left by the lack of word co-occurrences without requiring many parameters to be estimated or external evidence to be collected. To confirm the effectiveness of the proposed model, we compare it with state-of-the-art lexical, topic and temporal models on the large-scale TREC microblog 2011-2014 collections. The experimental results show that the proposed model significantly improves over all of the state-of-the-art lexical, topic and temporal models, with the maximum percentage increase reaching 33.97% in MAP and 21.38% in precision at the top 30 documents. The proposed model can potentially improve social media search effectiveness in closely related retrieval tasks, such as question answering and timeline summarisation.
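For readers unfamiliar with the two measures quoted in the abstract, the following is a minimal sketch of how MAP, precision at the top k documents, and a percentage increase over a baseline are conventionally computed. It uses the standard textbook definitions and is not the authors' evaluation code; all function names, document identifiers and the toy ranking are hypothetical, and official TREC evaluations are normally run with dedicated tooling rather than hand-written scripts.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 30) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average of the per-query average precision over all queries in `runs`."""
    return sum(average_precision(runs[q], qrels.get(q, set())) for q in runs) / len(runs)


def percent_increase(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Toy example (hypothetical data): one query, four retrieved documents.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
ap = average_precision(ranked, relevant)      # (1/1 + 2/3) / 2
p_at_2 = precision_at_k(ranked, relevant, k=2)
```

Under these definitions, a reported "33.97% increase in MAP" would correspond to `percent_increase(map_model, map_baseline)` for the relevant pair of systems; the abstract does not state the underlying absolute scores.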


Updated: 2019-11-27
