Efficiently harvesting deep web interfaces based on adaptive learning using two-phase data crawler framework
Soft Computing (IF 3.1), Pub Date: 2021-05-06, DOI: 10.1007/s00500-021-05816-z
Madhusudhan Rao Murugudu, L. S. S. Reddy

The growing richness and volume of data on the web pave the way for more online services and support users in carrying out heterogeneous, complex tasks. As deep web online services increase, several efficient techniques for locating deep web interfaces have been put to use, giving better support for user information exploration on the web based on queries and on user clicks on web search data. This search process improves the user experience by crawling websites and ranking search results with titles and links. The main problem in searching web data is automatically organizing user search information, according to user preferences, into dynamic query formations and combining it with data from associated web links, for web services that rely on high-quality sparse data. Given the vast amount of data and the dynamic nature of web interfaces, achieving high efficiency and coverage of all deep web interfaces is a challenging issue. To handle these issues, we propose a novel and efficient two-phase deep learning data crawler framework (NTPDCF). The first phase gathers accurate and highly relevant links using a search engine, and the second phase quickly explores relevant in-site links using adaptive site ranking. The approach focuses on drilling into relevant site data and on top-k ranking with different relations, based on dynamic features and user preferences, in single- and multi-query formation with adaptive weight features. The method promises improved results and more efficient data exploration than the traditional approach for real-time defense and e-commerce web-based services.
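
The two-phase idea in the abstract can be illustrated with a small offline sketch. It is not the NTPDCF implementation: the abstract gives no algorithmic detail, so the feature names (`topic_match`, `form_hint`), the perceptron-style weight update, the single shared ranker, and the toy `*.example` data are all assumptions made for illustration. Phase 1 picks the top-k candidate sites, phase 2 ranks the links inside each chosen site, and the weights adapt after every visited link.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class AdaptiveRanker:
    """Scores items by a weighted feature sum; weights adapt from feedback.

    Hypothetical stand-in for "adaptive site ranking"; not the paper's model.
    """
    weights: Features
    learning_rate: float = 0.1

    def score(self, features: Features) -> float:
        return sum(self.weights.get(n, 0.0) * v for n, v in features.items())

    def update(self, features: Features, relevant: bool) -> None:
        # Perceptron-style step (an assumption): move weights toward the
        # features of relevant items and away from irrelevant ones.
        sign = 1.0 if relevant else -1.0
        for name, value in features.items():
            old = self.weights.get(name, 0.0)
            self.weights[name] = old + sign * self.learning_rate * value

    def top_k(self, candidates: Dict[str, Features], k: int) -> List[str]:
        # Keys of `candidates` ordered by descending score, at most k of them.
        return heapq.nlargest(
            k, candidates, key=lambda url: self.score(candidates[url])
        )


def two_phase_crawl(
    seed_sites: Dict[str, Features],
    in_site_links: Dict[str, Dict[str, Features]],
    has_search_form: Callable[[str], bool],
    ranker: AdaptiveRanker,
    k: int = 2,
) -> List[str]:
    """Phase 1: select top-k sites. Phase 2: rank and visit in-site links."""
    found = []
    for site in ranker.top_k(seed_sites, k):          # phase 1
        links = in_site_links.get(site, {})
        for link in ranker.top_k(links, k):           # phase 2
            hit = has_search_form(link)
            ranker.update(links[link], hit)           # adapt for later sites
            if hit:
                found.append(link)
    return found


# Toy data (hypothetical): no network access is performed.
SEED_SITES = {
    "books.example": {"topic_match": 0.9, "form_hint": 0.5},
    "news.example": {"topic_match": 0.2, "form_hint": 0.1},
    "shop.example": {"topic_match": 0.7, "form_hint": 0.8},
}
IN_SITE_LINKS = {
    "shop.example": {
        "shop.example/search": {"topic_match": 0.6, "form_hint": 0.9},
        "shop.example/about": {"topic_match": 0.1, "form_hint": 0.0},
    },
    "books.example": {
        "books.example/find": {"topic_match": 0.8, "form_hint": 0.7},
        "books.example/blog": {"topic_match": 0.3, "form_hint": 0.1},
    },
}


def looks_like_search_page(link: str) -> bool:
    # Placeholder check; a real crawler would fetch and parse the page.
    return link.endswith(("/search", "/find"))


if __name__ == "__main__":
    ranker = AdaptiveRanker({"topic_match": 1.0, "form_hint": 1.0})
    print(two_phase_crawl(SEED_SITES, IN_SITE_LINKS, looks_like_search_page, ranker))
```

In this sketch the weights learned on one site carry over to the next, which is one simple way to read "adaptive"; a fuller design might keep separate rankers for sites and for in-site links.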




Updated: 2021-05-06