Future Generation Computer Systems (IF 7.5), Pub Date: 2021-01-04, DOI: 10.1016/j.future.2020.12.023
Authors: Xu Zhuang, Yan Zhu, Qiang Peng, Faisal Khurshid
Many score-propagation-based Web Spam Demotion Algorithms (WSDAs) have been proposed over the last decade. These algorithms face two major challenges. First, because score-propagation-based WSDAs are not incremental, their real-world use is limited: the Web changes rapidly, and rerunning the algorithm on the entire Web graph is computationally expensive. Second, score-propagation-based WSDAs use only the link structure of the Web graph to demote Web spam, which leaves them vulnerable to other spamming techniques, such as content spam. In this paper, we propose a preference-based learning-to-rank method that addresses both issues. Our proposal consists of two components: a preference function and an ordering algorithm. The preference function is modeled by a Deep Belief Network (DBN), which can exploit unlabeled data for better generalization. The proposed Incremental Probabilistic Ordering Algorithm (IPOA) uses the trained preference function to calculate the top-ranking probabilities of Web pages and then ranks the pages by those probabilities. The problem of ranking complex objects (i.e., Web pages) is thus reduced to ranking real numbers, which classical sorting algorithms solve efficiently. We compare our proposal experimentally with conventional score-propagation-based WSDAs and with several popular preference-based learning-to-rank algorithms on two publicly available datasets, WEBSPAM-UK2006 and WEBSPAM-UK2007. The experimental results demonstrate the superiority of the proposed method. Specifically, compared with score-propagation-based WSDAs, it achieves an absolute improvement in spam demotion score of 0.0074 (0.7% relative) on WEBSPAM-UK2006 and 0.065 (7.3% relative) on WEBSPAM-UK2007.
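The abstract describes the ordering step only at a high level: a learned pairwise preference function is turned into one real-valued score per page, after which ranking is plain sorting. The Python sketch below illustrates that reduction, plus an incremental update, under stated assumptions. It is not the authors' IPOA. The per-page score here is a simple average preference (how likely a page is to be preferred over another page, averaged over all other pages) rather than the paper's top-ranking probability, and `toy_preference`, its two features and weights, and the class `IncrementalPreferenceRanker` are hypothetical stand-ins, with a fixed logistic model in place of the trained DBN.

```python
import math
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]


def toy_preference(u: Features, v: Features) -> float:
    """Hypothetical stand-in for the trained DBN preference function.

    Returns an estimate of P(page u should be ranked above page v) using a
    fixed logistic model over two made-up features (link quality, spamminess).
    """
    weights = (1.5, -2.0)  # hypothetical weights, not learned
    margin = sum(w * (a - b) for w, a, b in zip(weights, u, v))
    return 1.0 / (1.0 + math.exp(-margin))


class IncrementalPreferenceRanker:
    """Illustrative incremental ranker (a sketch, not the paper's IPOA).

    Each page's score is the average preference it wins against every other
    page. Adding a page needs O(n) preference calls instead of recomputing
    all O(n^2) pairs for the whole collection.
    """

    def __init__(self, pref: Callable[[Features, Features], float]) -> None:
        self.pref = pref
        self.pages: Dict[str, Features] = {}
        self.wins: Dict[str, float] = {}  # running sum of pref(u, v) over v != u

    def add(self, page_id: str, features: Features) -> None:
        total = 0.0
        for other_id, other in self.pages.items():
            total += self.pref(features, other)
            # Existing pages only absorb their single comparison with the newcomer.
            self.wins[other_id] += self.pref(other, features)
        self.pages[page_id] = features
        self.wins[page_id] = total

    def scores(self) -> Dict[str, float]:
        n = len(self.pages)
        if n < 2:
            return {pid: 1.0 for pid in self.pages}
        return {pid: s / (n - 1) for pid, s in self.wins.items()}

    def ranking(self) -> List[str]:
        # Ranking complex objects reduces to sorting one real number per page.
        s = self.scores()
        return sorted(s, key=s.get, reverse=True)


if __name__ == "__main__":
    ranker = IncrementalPreferenceRanker(toy_preference)
    ranker.add("news", (0.9, 0.1))
    ranker.add("blog", (0.6, 0.3))
    ranker.add("spam", (0.2, 0.9))
    print(ranker.ranking())  # → ['news', 'blog', 'spam']
```

The design choice being illustrated is that the expensive part (pairwise preference evaluation) is done once per new comparison and cached as a per-page sum, so the final ordering is an ordinary O(n log n) sort over floats.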
Title: Using Deep Belief Network to Demote Web Spam