Big Data Science Over the Past Web, arXiv - CS - Digital Libraries



arXiv - CS - Digital Libraries, Pub Date: 2021-08-03, DOI: arxiv-2108.01605
Miguel Costa, Julien Masanès

Web archives preserve unique and historically valuable information. They hold a record of past events and memories published by all kinds of people, such as journalists, politicians and ordinary people who have shared their testimony and opinion on multiple subjects. As a result, researchers such as historians and sociologists have used web archives as a source of information to understand the recent past since the early days of the World Wide Web. The typical way to extract knowledge from a web archive is by using its search functionalities to find and analyse historical content. This can be a slow and superficial process when analysing complex topics, due to the huge amount of data that web archives have been preserving over time. Big data science tools can cope with this order of magnitude, enabling researchers to automatically extract meaningful knowledge from the archived data. This knowledge helps not only to explain the past but also to predict the future through the computational modelling of events and behaviours. Currently, there is an immense landscape of big data tools, machine learning frameworks and deep learning algorithms that significantly increase the scalability and performance of several computational tasks, especially over text, image and audio. Web archives have been taking advantage of this panoply of technologies to provide their users with more powerful tools to explore and exploit historical data. This chapter presents several examples of these tools and gives an overview of their application to support longitudinal studies over web archive collections.


Updated: 2021-08-04
