Toward comprehensive event collections
International Journal on Digital Libraries (IF 1.6), Pub Date: 2018-06-22, DOI: 10.1007/s00799-018-0246-x
Federico Nanni, Simone Paolo Ponzetto, Laura Dietz

Web archives, such as the Internet Archive, preserve an unprecedented abundance of materials regarding major events and transformations in our society. In this paper, we present an approach for building event-centric sub-collections from such large archives. These sub-collections include not only the core documents related to the event itself but, even more importantly, documents describing related aspects (e.g., premises and consequences). This is achieved by identifying relevant concepts and entities in a knowledge base and then detecting their mentions in documents; these mentions are interpreted as indicators of relevance. We extensively evaluate our system on two diachronic corpora, the New York Times Corpus and the US Congressional Record; additionally, we test its performance on the TREC KBA Stream Corpus and on the TREC-CAR dataset, two publicly available large-scale web collections.
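
The core idea in the abstract, treating mentions of event-related entities as relevance signals, can be illustrated with a minimal sketch. This is not the authors' system: the function names, the per-entity weights, and the plain string matching (standing in for real entity linking against a knowledge base) are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def mention_score(text: str, entity_weights: Dict[str, float]) -> float:
    """Sum the weights of the entities whose names occur in the text.

    Hypothetical scoring rule: each entity counts once per document,
    no matter how often it is mentioned.
    """
    lowered = text.lower()
    score = 0.0
    for name, weight in entity_weights.items():
        # Whole-word, case-insensitive match; a stand-in for entity linking.
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, lowered):
            score += weight
    return score


def rank_documents(
    docs: Dict[str, str], entity_weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs, highest score first."""
    scored = [
        (doc_id, mention_score(text, entity_weights))
        for doc_id, text in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, `entity_weights` would hold entities judged relevant to an event, and documents that mention more of them, or more heavily weighted ones, rank higher. A document about a premise or consequence of the event can therefore score well even if it never names the event itself.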

Updated: 2018-06-22