Query by documents on top of a search interface
Information Systems (IF 3.0), Pub Date: 2021-05-14, DOI: 10.1016/j.is.2021.101793
Nhat X.T. Le, Moloud Shahbazi, Abdulaziz Almaslukh, Vagelis Hristidis

Document repositories often provide keyword-based query interfaces that let users search for documents. These interfaces typically impose rate limits or a monetary cost per access operation. Examples of such constrained search interfaces include legal or medical data sources, social networks, and the Web. We study the problem in which a user has a set of input documents and wants to discover other, similar documents through a constrained search interface. Specifically, given a set of input documents and an access budget, we present principled techniques for generating a list of queries to submit. The key intuition behind our techniques is to compute the best set of queries for returning the input documents; as we show experimentally, these queries also return other relevant documents. We show that our techniques outperform state-of-the-art work according to several intuitive document relevance metrics on several real benchmark datasets. We report results for two problem variants: finding queries that return, in the highest positions, the input documents (the Docs2Queries-Self problem) or other relevant documents (the Docs2Queries-Sim problem).
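To make the problem setting concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a conjunctive keyword interface (a query matches a document only if every query term occurs in it) and greedily picks at most `budget` queries so that each chosen query matches as many not-yet-covered input documents as possible. The names `terms_of`, `greedy_queries`, and `max_terms` are hypothetical, and the sketch scores queries only against the input documents rather than calling a real search interface.

```python
from itertools import combinations


def terms_of(text):
    """Return the set of lowercased, purely alphabetic whitespace tokens."""
    return {w for w in text.lower().split() if w.isalpha()}


def greedy_queries(input_docs, budget, max_terms=2):
    """Greedily choose up to `budget` keyword queries covering the input docs.

    Assumption: a query (a set of terms) matches a document when all of its
    terms appear in that document. This is a stand-in for a real interface.
    """
    doc_terms = [terms_of(d) for d in input_docs]
    vocab = sorted(set().union(*doc_terms))
    # Candidate queries: every combination of 1..max_terms vocabulary terms.
    candidates = [
        frozenset(c)
        for k in range(1, max_terms + 1)
        for c in combinations(vocab, k)
    ]
    if not candidates:
        return []

    uncovered = set(range(len(input_docs)))
    chosen = []
    while uncovered and len(chosen) < budget:
        def gain(q):
            # Number of still-uncovered input documents matched by q.
            return sum(1 for i in uncovered if q <= doc_terms[i])

        # Prefer higher coverage; at equal coverage prefer longer queries,
        # which are more specific.
        best = max(candidates, key=lambda q: (gain(q), len(q)))
        if gain(best) == 0:
            break
        chosen.append(" ".join(sorted(best)))
        uncovered -= {i for i in uncovered if best <= doc_terms[i]}
    return chosen
```

Calling `greedy_queries(docs, budget=2)` on a small list of document strings returns at most two query strings. A realistic system would additionally have to account for how the target interface ranks results, which this sketch ignores.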




Updated: 2021-05-28