Hybrid focused crawling on the Surface and the Dark Web
EURASIP Journal on Information Security ( IF 2.5 ) Pub Date : 2017-07-04 , DOI: 10.1186/s13635-017-0064-5
Christos Iliou , George Kalpakis , Theodora Tsikrika , Stefanos Vrochidis , Ioannis Kompatsiaris

Focused crawlers automatically discover Web resources about a given topic by navigating the Web link structure and choosing which hyperlinks to follow based on their estimated relevance to that topic. This work proposes a generic focused crawling framework for discovering resources on any given topic that reside on the Surface or the Dark Web. The proposed crawler can seamlessly navigate the Surface Web and several darknets present in the Dark Web (i.e., Tor, I2P, and Freenet) during a single crawl by automatically adapting its crawling behavior and its classifier-guided hyperlink selection strategy to the destination network type and to the strength of the local evidence present in the vicinity of a hyperlink. It investigates 11 hyperlink selection methods, including a novel strategy based on the dynamic linear combination of a link-based classifier and a parent Web page classifier. This hybrid focused crawler is demonstrated for the discovery of Web resources containing recipes for producing homemade explosives. The evaluation experiments indicate the effectiveness of the proposed focused crawler for both the Surface and the Dark Web.
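
The abstract does not give the formula behind the dynamic linear combination, so the following Python sketch is only one plausible reading of it, not the authors' method. It assumes that both classifiers return a relevance score in [0, 1] and that the strength of local evidence can be approximated by the number of text tokens around a hyperlink. The names `network_type`, `combined_score`, `local_tokens`, and `saturation` are all hypothetical. Freenet is left out of the network check because its resources are addressed by keys rather than by a host suffix.

```python
from urllib.parse import urlparse


def network_type(url: str) -> str:
    """Guess the destination network from the host suffix (illustrative heuristic)."""
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".onion"):
        return "tor"
    if host.endswith(".i2p"):
        return "i2p"
    return "surface"


def combined_score(link_score: float, parent_score: float,
                   local_tokens: int, saturation: int = 20) -> float:
    """Blend two classifier scores with a weight that depends on local evidence.

    Assumption: the more text surrounds a hyperlink, the more the link-based
    score is trusted; with no local text the parent page score is used alone.
    """
    # Weight in [0, 1]; reaches 1 once `saturation` tokens are available.
    weight = min(max(local_tokens, 0), saturation) / saturation
    return weight * link_score + (1.0 - weight) * parent_score
```

In a crawler built along these lines, each extracted hyperlink would be scored with `combined_score` and placed in a priority queue, while `network_type` would select the network-specific fetching behavior for its destination.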

Updated: 2020-04-16