ADDRESSING IMBALANCED INSURANCE DATA THROUGH ZERO-INFLATED POISSON REGRESSION WITH BOOSTING

Simon C.K. Lee

doi:10.1017/asb.2020.40

Published online by Cambridge University Press: 17 December 2020

Affiliation: Department of Statistics and Actuarial Science, The University of Hong Kong. E-mail: slee2016@hku.hk

Abstract

A machine learning approach to zero-inflated Poisson (ZIP) regression is introduced to address a common difficulty arising from imbalanced financial data. The proposed ZIP model can be interpreted as an adaptive weight adjustment procedure that removes the need for post-modeling re-calibration and substantially improves predictive accuracy. Despite the increased complexity due to the expanded parameter set, we implement the ZIP regression with cyclic coordinate descent optimization, with adjustments made to address saddle points. We also study how various approaches alleviate the potential drawbacks of incomplete exposures in insurance applications. The procedure is tested on real-life data. We demonstrate a significant improvement in performance relative to other popular alternatives, which justifies our modeling techniques.
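As an illustration of the distribution named in the abstract, the following is a minimal sketch of the textbook ZIP probability mass function and its negative log-likelihood. It is not the boosting or coordinate descent procedure of the article; the function names and the parameters lam (Poisson rate) and pi (zero-inflation probability) are assumptions chosen for this example.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson with rate lam and inflation probability pi.

    Textbook definition: with probability pi the count is a structural zero,
    otherwise it is drawn from Poisson(lam).
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(lam, pi):
    """Expected count under the ZIP model."""
    return (1.0 - pi) * lam


def zip_neg_log_likelihood(counts, lam, pi):
    """Negative log-likelihood of observed counts for fixed lam and pi."""
    return -sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


# Hypothetical zero-heavy sample, as is typical for claim counts.
sample = [0] * 9 + [3]
nll_poisson = zip_neg_log_likelihood(sample, lam=3.0, pi=0.0)  # pi = 0 is plain Poisson
nll_zip = zip_neg_log_likelihood(sample, lam=3.0, pi=0.9)
```

Setting pi to zero recovers the ordinary Poisson model, so comparing the two negative log-likelihoods on a zero-heavy sample gives a rough sense of why an explicit zero-inflation component can help on imbalanced count data.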

Keywords

Boosting trees; predictive modeling; insurance; machine learning; imbalanced data; zero-inflated Poisson; C14

Type: Research Article
Information: ASTIN Bulletin: The Journal of the IAA, Volume 51, Issue 1, January 2021, pp. 27–55

DOI: https://doi.org/10.1017/asb.2020.40
Copyright: © 2020 by Astin Bulletin. All rights reserved
