Full-Text and URL Search Over Web Archives
arXiv - CS - Digital Libraries, Pub Date: 2021-08-03, DOI: arxiv-2108.01603
Miguel Costa

Web archives are a historically valuable source of information. In some respects, web archives are the only record of the evolution of human society in the last two decades. They preserve a mix of personal and collective memories, the importance of which tends to grow as they age. However, the value of web archives depends on their users being able to search and access the information they require in efficient and effective ways. Without the possibility of exploring and exploiting the archived contents, web archives are useless. Web archive access functionalities range from basic browsing to advanced search and analytical services, accessed through user-friendly interfaces. Full-text and URL search have become the predominant and preferred forms of information discovery in web archives, fulfilling user needs and supporting search APIs that feed complex applications. Both full-text and URL search are based on the technology developed for modern web search engines, since the Web is the main resource targeted by both systems. However, while web search engines enable searching over the most recent web snapshot, web archives enable searching over multiple snapshots from the past. This means that web archives have to deal with a temporal dimension that is the cause of new challenges and opportunities, discussed throughout this chapter.

Updated: 2021-08-04