Tweedie gradient boosting for extremely unbalanced zero-inflated data
Communications in Statistics - Simulation and Computation (IF 0.8), Pub Date: 2020-07-11, DOI: 10.1080/03610918.2020.1772302
He Zhou, Wei Qian, Yi Yang

Abstract

Tweedie’s compound Poisson model is a popular method for modeling insurance claims, which have a probability mass at zero and a nonnegative, highly right-skewed distribution. In particular, it is not uncommon to have extremely unbalanced data with an excessively large proportion of zero claims, and even the traditional Tweedie model may not fit such data satisfactorily. In this paper, we propose a boosting-assisted zero-inflated Tweedie model, called EMTboost, that allows the probability mass at zero to exceed that of a traditional Tweedie model. We make a nonparametric assumption on its Tweedie model component, which, unlike a linear model, is able to capture nonlinearities, discontinuities, and complex higher-order interactions among predictors. A specialized Expectation-Maximization algorithm is developed that integrates a blockwise coordinate descent strategy and a gradient tree-boosting algorithm to estimate key model parameters. We use extensive simulations and data analysis on synthetic zero-inflated auto-insurance claim data to illustrate our method’s prediction performance.
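
To make the idea of extra zero mass concrete, here is a minimal sketch of a zero-inflated compound Poisson-gamma (Tweedie) variable. It is an illustration only and does not implement EMTboost: it has no EM step, no boosting, and no predictors. The names `tweedie_to_cpg`, `zero_mass`, `sample_claim`, and `pi_zero` are hypothetical, and the mixture form (an exact zero with probability `pi_zero`, otherwise a Tweedie draw) is an assumption about how the zero inflation enters. The mapping from mean `mu`, dispersion `phi`, and power `rho` (with 1 < rho < 2) to Poisson and gamma parameters is the standard one.

```python
import math
import random


def tweedie_to_cpg(mu, phi, rho):
    """Map Tweedie (mean, dispersion, power), 1 < rho < 2, to the
    Poisson rate and gamma shape/scale of the compound Poisson-gamma form."""
    lam = mu ** (2 - rho) / (phi * (2 - rho))   # expected number of claims
    shape = (2 - rho) / (rho - 1)               # gamma shape of one claim size
    scale = phi * (rho - 1) * mu ** (rho - 1)   # gamma scale of one claim size
    return lam, shape, scale


def zero_mass(mu, phi, rho, pi_zero=0.0):
    """P(Y = 0): extra zero mass plus the Tweedie model's own mass exp(-lam)."""
    lam, _, _ = tweedie_to_cpg(mu, phi, rho)
    return pi_zero + (1.0 - pi_zero) * math.exp(-lam)


def _sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_claim(mu, phi, rho, pi_zero, rng):
    """Draw one zero-inflated Tweedie observation (assumed mixture form)."""
    if rng.random() < pi_zero:
        return 0.0                              # structural zero
    lam, shape, scale = tweedie_to_cpg(mu, phi, rho)
    n_claims = _sample_poisson(lam, rng)
    # Total loss is the sum of the individual gamma claim sizes (0 if none).
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_claim(1.0, 1.0, 1.5, 0.5, rng) for _ in range(20000)]
    print("plain Tweedie zero mass:  ", zero_mass(1.0, 1.0, 1.5))
    print("zero-inflated zero mass:  ", zero_mass(1.0, 1.0, 1.5, 0.5))
    print("empirical share of zeros: ", sum(d == 0.0 for d in draws) / len(draws))
```

With `pi_zero = 0` the sketch reduces to the ordinary Tweedie model, whose zero mass is tied to its mean and dispersion through `exp(-lam)`. A positive `pi_zero` loosens that link, which is the kind of flexibility the abstract describes.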




Updated: 2020-07-11