Improving Wikipedia verifiability with AI

Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis, Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick, Michele Bevilacqua, Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel

Nature Machine Intelligence (IF 23.9), published 2023-10-19, DOI: 10.1038/s42256-023-00726-1

Verifiability is a core content policy of Wikipedia: claims need to be backed by citations. Maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort. We show that the process of improving references can be tackled with the help of artificial intelligence (AI) powered by an information retrieval system and a language model. This neural-network-based system, which we call SIDE, can identify Wikipedia citations that are unlikely to support their claims, and subsequently recommend better ones from the web. We train this model on existing Wikipedia references, therefore learning from the contributions and combined wisdom of thousands of Wikipedia editors. Using crowdsourcing, we observe that for the top 10% most likely citations to be tagged as unverifiable by our system, humans prefer our system’s suggested alternatives compared with the originally cited reference 70% of the time. To validate the applicability of our system, we built a demo to engage with the English-speaking Wikipedia community and find that SIDE’s first citation recommendation is preferred twice as often as the existing Wikipedia citation for the same top 10% most likely unverifiable claims according to SIDE. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.