Gradient boosting learning for fraudulent publisher detection in online advertising
Data Technologies and Applications (IF 1.6) Pub Date: 2020-11-17, DOI: 10.1108/dta-04-2020-0093
Deepti Sisodia , Dilip Singh Sisodia

Purpose

Analysis of a publisher's behavior plays a vital role in identifying fraudulent publishers in the pay-per-click model of online advertising. However, the vast amount of raw user click data with missing values poses a challenge in analyzing publishers' conduct. The presence of high-cardinality categorical attributes with many possible values further aggravates the issue.

Design/methodology/approach

In this paper, gradient tree boosting (GTB) learning is used to address the challenges encountered in learning the publishers' behavior from raw user click data and effectively classifying fraudulent publishers.

Findings

The results demonstrate that GTB effectively classified fraudulent publishers and significantly outperformed the other learning methods in terms of average precision (60.5%), recall (57.8%) and f-measure (59.1%).

Originality/value

The experiments were conducted on a publicly available multiclass raw user click dataset and eight other imbalanced datasets to test GTB's generalization behavior; training and testing used 10-fold cross-validation. GTB's performance was evaluated using average precision, recall and f-measure, and was also compared against eleven state-of-the-art individual and ensemble classification models.
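The evaluation protocol described above (10-fold cross-validation scored by averaged precision, recall and f-measure) can be sketched as follows. The synthetic imbalanced dataset and the plain `GradientBoostingClassifier` are stand-ins for the paper's datasets and tuned model; macro averaging is assumed for the per-class averages.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import precision_score, recall_score, f1_score

# Synthetic imbalanced multiclass data standing in for the click dataset.
X, y = make_classification(n_samples=600, n_classes=3, n_informative=5,
                           weights=[0.7, 0.2, 0.1], random_state=0)

# 10-fold cross-validation, as in the paper's experimental setup.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
pred = cross_val_predict(GradientBoostingClassifier(random_state=0), X, y, cv=cv)

# Average (macro) precision, recall and f-measure across classes.
print("precision:", precision_score(y, pred, average="macro"))
print("recall:   ", recall_score(y, pred, average="macro"))
print("f-measure:", f1_score(y, pred, average="macro"))
```

The same loop would be repeated for each baseline classifier to reproduce the model comparison reported above.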


