Characterising dataset search—An analysis of search logs and data requests
Journal of Web Semantics (IF 2.1), Pub Date: 2018-11-19, DOI: 10.1016/j.websem.2018.11.003
Emilia Kacprzak, Laura Koesten, Luis-Daniel Ibáñez, Tom Blount, Jeni Tennison, Elena Simperl

Large amounts of data are becoming increasingly available online. To benefit from this data, we need tools to retrieve the most relevant datasets that match one's data needs. Several vocabularies have been developed to describe datasets in order to increase their discoverability, but it is costly and cumbersome for data publishers to annotate datasets using all of them, which raises the question of which properties are most important. In this work we contribute a systematic study of the patterns and specific attributes that data consumers use to search for data, and of how dataset search compares with general web search. We performed a query log analysis based on logs from four national open data portals and conducted a qualitative analysis of user data requests issued to one of them. Search queries issued on data portals differ from those issued to web search engines in their length, topic, and structure. Based on our findings we hypothesise that portals' search functionalities are currently used in an exploratory manner, rather than to retrieve a specific resource. In our study of data requests we found that geospatial and temporal attributes, as well as information on the required granularity of the data, are the most common features. The findings of both analyses suggest that these features are more important in dataset retrieval than in general web search, and that dataset publishers should therefore focus their efforts on generating dataset descriptions that include them.




Updated: 2018-11-19