
The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives
arXiv - CS - Digital Libraries, Pub Date: 2020-01-15, DOI: arxiv-2001.05399
Nick Ruest, Jimmy Lin, Ian Milligan, and Samantha Fritz

The Archives Unleashed project aims to improve scholarly access to web archives through a multi-pronged strategy involving tool creation, process modeling, and community building - all proceeding concurrently in mutually-reinforcing efforts. As we near the end of our initially-conceived three-year project, we report on our progress and share lessons learned along the way. The main contribution articulated in this paper is a process model that decomposes scholarly inquiries into four main activities: filter, extract, aggregate, and visualize. Based on the insight that these activities can be disaggregated across time, space, and tools, it is possible to generate "derivative products", using our Archives Unleashed Toolkit, that serve as useful starting points for scholarly inquiry. Scholars can download these products from the Archives Unleashed Cloud and manipulate them just like any other dataset, thus providing access to web archives without requiring any specialized knowledge. Over the past few years, our platform has processed over a thousand different collections from about two hundred users, totaling over 280 terabytes of web archives.
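The four activities named in the abstract (filter, extract, aggregate, visualize) can be illustrated with a minimal Python sketch. This is not the Archives Unleashed Toolkit API: the record layout, the sample URLs, and every function name below are hypothetical and were chosen only to show how the steps can be separated so that the aggregated result serves as a small "derivative product".

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical crawl records (assumption: a real web archive would yield
# comparable fields from its WARC files; this layout is illustrative only).
RECORDS = [
    {"url": "http://example.org/a", "mime": "text/html"},
    {"url": "http://example.org/b", "mime": "text/html"},
    {"url": "http://example.org/logo.png", "mime": "image/png"},
    {"url": "http://example.com/", "mime": "text/html"},
    {"url": "http://archive.example.net/data.json", "mime": "application/json"},
]


def filter_records(records, mime="text/html"):
    """Filter: keep only records with the requested MIME type."""
    return [r for r in records if r["mime"] == mime]


def extract_domains(records):
    """Extract: pull the host name out of each record's URL."""
    return [urlparse(r["url"]).netloc for r in records]


def aggregate_counts(domains):
    """Aggregate: count how many pages were captured per domain."""
    return Counter(domains)


def visualize(counts, width=20):
    """Visualize: render the counts as a plain-text bar chart."""
    if not counts:
        return ""
    top = max(counts.values())
    lines = []
    for domain, n in counts.most_common():
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{domain:<15} {bar} {n}")
    return "\n".join(lines)


def derivative_product(records):
    """Run filter, extract, and aggregate; the result can be saved and shared."""
    return aggregate_counts(extract_domains(filter_records(records)))


if __name__ == "__main__":
    print(visualize(derivative_product(RECORDS)))
```

Because the visualization step consumes only the aggregated counts, it could run later, on a different machine, or in a different tool, which is the kind of disaggregation across time, space, and tools that the abstract describes.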

Updated: 2020-01-16