Credibility in Information Retrieval
Foundations and Trends in Information Retrieval (IF 10.4), Pub Date: 2015-12-17, DOI: 10.1561/1500000046
Alexandru L. Ginsca, Adrian Popescu, Mihai Lupu

Credibility, as the general concept covering trustworthiness and expertise, but also quality and reliability, is strongly debated in philosophy, psychology, and sociology, and its adoption in computer science is therefore fraught with difficulties. Yet its importance has grown in the information access community because of two complementary factors: on the one hand, it is relatively difficult to pinpoint the source of a piece of information; on the other hand, complex algorithms (statistical machine learning, artificial intelligence) make decisions on behalf of users with little oversight from the users themselves. This survey presents a detailed analysis of existing credibility models from different information-seeking research areas, with a focus on the Web and its pervasive social component. It shows that there is a very rich body of work on different aspects and interpretations of credibility, particularly for different types of textual content (e.g., Web sites, blogs, tweets), but also for other modalities (videos, images, audio) and topics (e.g., health care). After an introduction that places credibility in the context of other sciences and relates it to trust, we argue for a quartic (four-way) decomposition of credibility: expertise and trustworthiness, which are well documented in the literature and relate predominantly to the information source; and quality and reliability, which relate predominantly to the content and are raised to the status of equal partners because the source is often impossible to detect. The second half of the survey provides the reader with access points to the literature, grouped by research interest. Section 3 reviews general research directions: the factors that contribute to credibility assessment by human consumers of information, the models used to combine these factors, and the methods used to predict credibility. A smaller section is dedicated to informing users about the credibility learned from the data.
Sections 4, 5, and 6 go into further detail on domain-specific credibility, social media credibility, and multimedia credibility, respectively. While each of them is best understood in the context of Sections 1 and 2, they can be read independently of each other. The last section of this survey addresses a topic not commonly considered under “credibility”: the credibility of the system itself, independent of the data creators. This topic is particularly important in domains where the user is professionally motivated and where there are no concerns about the credibility of the data (e.g., e-discovery and patent search). Although there is little explicit work in this area, we argue that it is an open research direction worthy of future exploration. Finally, as an additional help to the reader, an appendix lists the existing test collections that cater specifically to some aspect of credibility. Overall, this review provides the reader with an organised and comprehensive reference guide to the state of the art and the problems at hand, rather than a final answer to the question of what credibility is for computer science. Even within the relatively limited scope of an exact science, such an answer is not possible for a concept that is itself widely debated in philosophy and the social sciences.

Updated: 2015-12-17