Web Crawling
Foundations and Trends in Information Retrieval (IF 10.4), Pub Date: 2010-02-11, DOI: 10.1561/1500000017
Christopher Olston, Marc Najork

This is a survey of the science and practice of web crawling. While at first glance web crawling may appear to be merely an application of breadth-first search, in fact it poses many challenges, ranging from systems concerns such as managing very large data structures to theoretical questions such as how often to revisit evolving content sources. This survey outlines the fundamental challenges and describes the state-of-the-art models and solutions. It also highlights avenues for future work.
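
To make the "application of breadth-first search" view concrete, here is a minimal sketch of a breadth-first crawl over a toy in-memory link graph. It is an illustration only, not the survey's method: the names `bfs_crawl`, `fetch_links`, `max_pages`, and `TOY_WEB`, along with the example URLs, are assumptions introduced here. It also leaves out the challenges the abstract points to, such as very large data structures and revisit scheduling.

```python
from collections import deque


def bfs_crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first from `seed`; return URLs in visit order.

    `fetch_links` is a hypothetical callable mapping a URL to the list of
    URLs it links to. A real crawler would download and parse the page here.
    """
    frontier = deque([seed])  # FIFO queue gives breadth-first order
    seen = {seed}             # URLs already queued, to avoid duplicates
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical toy link graph standing in for the web.
TOY_WEB = {
    "a.example/": ["a.example/1", "b.example/"],
    "a.example/1": ["a.example/", "c.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}


def toy_fetch_links(url):
    """Look up outgoing links in the toy graph; unknown URLs have none."""
    return TOY_WEB.get(url, [])


if __name__ == "__main__":
    for page in bfs_crawl("a.example/", toy_fetch_links):
        print(page)
```

In this sketch the `seen` set and the `frontier` queue both live in memory, which is exactly the kind of data structure that becomes a systems concern at web scale.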




Updated: 2010-02-11