Where Did the Web Archive Go?
arXiv - CS - Digital Libraries Pub Date : 2021-08-12 , DOI: arxiv-2108.05939
Mohamed Aturban, Michael L. Nelson, Michele C. Weigle

To perform a longitudinal investigation of web archives and to detect variations and changes in the replay of individual archived pages, or mementos, we created a sample of 16,627 mementos from 17 public web archives. Over the course of our 14-month study (November 2017 - January 2019), we found that four web archives changed their base URIs and did not leave a machine-readable method of locating their new base URIs, necessitating manual rediscovery. Of the 1,981 mementos in our sample from these four web archives, 537 were impacted: 517 mementos were rediscovered but with changes in their time of archiving (Memento-Datetime), HTTP status code, or the string comprising their original URI (URI-R), and 20 mementos could not be found at all.
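
As a rough illustration of what manual rediscovery of a moved archive can involve, the sketch below rewrites a memento URI from an old base URI to a new one using a hand-maintained mapping. It is not the authors' method: the hosts, the `BASE_URI_MAP` table, and the `rewrite_urim` function are hypothetical placeholders, and the mapping would have to be compiled by hand precisely because the archives left no machine-readable pointer.

```python
from typing import Dict, Optional

# Hypothetical, hand-compiled mapping from an archive's old base URI to its
# new base URI. The hosts are placeholders, not the archives in the study.
BASE_URI_MAP: Dict[str, str] = {
    "http://old-archive.example.org/wayback/": "https://new-archive.example.org/web/",
}


def rewrite_urim(urim: str, base_map: Dict[str, str] = BASE_URI_MAP) -> Optional[str]:
    """Return the memento URI under its new base, or None if no mapping is known."""
    for old_base, new_base in base_map.items():
        if urim.startswith(old_base):
            # Keep everything after the old base (timestamp and original URI).
            return new_base + urim[len(old_base):]
    return None


if __name__ == "__main__":
    old = "http://old-archive.example.org/wayback/20171101000000/http://example.com/"
    print(rewrite_urim(old))
```

A rewritten URI is only a candidate: as the abstract reports, a rediscovered memento may differ in its Memento-Datetime, HTTP status code, or URI-R, so each one would still need to be fetched and compared against the original record.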

Updated: 2021-08-16