Toward an explainable machine learning model for claim frequency: a use case in car insurance pricing with telematics data

Maillart, Arthur

doi:10.1007/s13385-021-00270-5

Original Research Paper
Published: 19 March 2021

European Actuarial Journal, Volume 11, pages 579–617 (2021)

Arthur Maillart¹

Abstract

In this paper, we propose an explainable machine learning approach to model the claim frequency of a telematics car insurance dataset. We use a data-driven method based on tree ensembles, namely the random forest, to build a claim frequency model. We then present a method to build a single tree that faithfully synthesizes the predictions of a tree ensemble model such as a random forest or gradient boosting. This tree serves as a global explanation of the black-box predictions, and this surrogate model lets us extract knowledge from the black-box tree ensemble. Finally, we apply the approach to a generalized linear model: integrating the extracted knowledge into the generalized linear model increases its predictive power.
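
The idea of a global surrogate tree can be illustrated with a minimal sketch. It is not the construction developed in the paper: it uses the generic approach of fitting one small regression tree to the predictions of an ensemble, and the ensemble itself (three hand-written stumps on a single feature, stored in the hypothetical `STUMPS` list) is an assumption made only to keep the example self-contained.

```python
from statistics import mean

# Hypothetical "black box": the average of three decision stumps on one feature x.
# Each stump is (threshold, value_if_x_le_threshold, value_otherwise).
STUMPS = [(2.0, 0.10, 0.06), (5.0, 0.09, 0.05), (5.0, 0.11, 0.04)]

def ensemble_predict(x):
    return mean(lo if x <= t else hi for t, lo, hi in STUMPS)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(xs, ys, depth):
    """Greedy one-feature regression tree; a leaf is a float, a node is (t, left, right)."""
    if depth == 0 or len(set(xs)) < 2 or len(set(ys)) == 1:
        return mean(ys)
    best_t, best_score = None, None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = sse(left) + sse(right)
        if best_score is None or score < best_score:
            best_t, best_score = t, score
    left = [(x, y) for x, y in zip(xs, ys) if x <= best_t]
    right = [(x, y) for x, y in zip(xs, ys) if x > best_t]
    return (best_t,
            fit_tree([x for x, _ in left], [y for _, y in left], depth - 1),
            fit_tree([x for x, _ in right], [y for _, y in right], depth - 1))

def tree_predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

# The surrogate is trained on the ensemble's predictions, not on observed claims.
xs = [i / 2 for i in range(17)]          # 0.0, 0.5, ..., 8.0
ys = [ensemble_predict(x) for x in xs]
surrogate = fit_tree(xs, ys, depth=2)

# Fidelity: largest gap between the surrogate and the ensemble on the grid.
fidelity_gap = max(abs(tree_predict(surrogate, x) - y) for x, y in zip(xs, ys))
```

Because this toy ensemble is piecewise constant on three intervals, a depth-2 tree can reproduce it on the grid; with a real ensemble the surrogate is only an approximation, and the fidelity gap measures how much is lost.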

Acknowledgements

We thank the two anonymous reviewers for their careful reading of our manuscript and their insightful and relevant suggestions.

Author information

Authors and Affiliations

Univ Lyon, Université Claude Bernard Lyon, ISFA, Laboratoire SAF EA2429, 69366, Lyon, France
Arthur Maillart

Corresponding author

Correspondence to Arthur Maillart.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A Results on partitions

Proof

The proof is organized in two parts. First, we prove that the union of the elements of $\tilde{{\mathcal {R}}}_g$ covers ${\mathbb {R}}^p$, i.e.,

for each $x \in {\mathbb {R}}^p$, there exist $i \in \{1, \ldots , I_1\}$ and $k \in \{1, \ldots , I_2\}$ such that $x \in \tilde{R}_{1, i} \cap \tilde{R}_{2, k}$.

Subsequently, we prove that the elements of $\tilde{{\mathcal {R}}}_g$ are pairwise disjoint, i.e.,

if $\tilde{R}_{g, 1} \in \tilde{{\mathcal {R}}}_g$, $\tilde{R}_{g, 2} \in \tilde{{\mathcal {R}}}_g$ and $\tilde{R}_{g, 1} \ne \tilde{R}_{g, 2}$, then $\tilde{R}_{g, 1} \cap \tilde{R}_{g, 2} = \emptyset $.

Let $x \in {\mathbb {R}}^p$. Since $\tilde{{\mathcal {R}}}_1$ is a partition of ${\mathbb {R}}^p$, there exists $i \in \{ 1, \ldots , I_1 \}$ such that $x \in \tilde{R}_{1, i}$. Symmetrically, there exists $k \in \{ 1, \ldots , I_2 \}$ such that $x \in \tilde{R}_{2, k}$. Hence, $x \in \tilde{R}_{1, i} \cap \tilde{R}_{2, k}$. In particular, $\tilde{R}_{1, i} \cap \tilde{R}_{2, k} \ne \emptyset $, which proves the first point.

Now, let us suppose that $\tilde{R}_{g, 1} \in \tilde{{\mathcal {R}}}_{g}$ and $\tilde{R}_{g, 2} \in \tilde{{\mathcal {R}}}_{g}$. There exist $i_1, i_2 \in \{ 1, \ldots , I_1 \}$ and $k_1, k_2 \in \{ 1, \ldots , I_2 \}$ such that $ \tilde{R}_{g, 1} = \tilde{R}_{1, i_1} \cap \tilde{R}_{2, k_1} $ and $ \tilde{R}_{g, 2} = \tilde{R}_{1, i_2} \cap \tilde{R}_{2, k_2} $, where $\tilde{R}_{1, i_1}, \tilde{R}_{1, i_2} \in \tilde{{\mathcal {R}}}_{1}$ and $\tilde{R}_{2, k_1}, \tilde{R}_{2, k_2} \in \tilde{{\mathcal {R}}}_{2}$. Suppose that $\tilde{R}_{g, 1} \cap \tilde{R}_{g, 2} \ne \emptyset $. Then there exists $x \in {\mathbb {R}}^p$ such that

$$\begin{aligned} x \in \tilde{R}_{g, 1} \cap \tilde{R}_{g, 2} = \left( \tilde{R}_{1, i_1} \cap \tilde{R}_{1, i_2}\right) \cap (\tilde{R}_{2, k_1} \cap \tilde{R}_{2, k_2}) \end{aligned}$$

It means that $x \in \tilde{R}_{1, i_1} \cap \tilde{R}_{1, i_2}$ and hence that $\tilde{R}_{1, i_1} \cap \tilde{R}_{1, i_2} \ne \emptyset $. However, $\tilde{{\mathcal {R}}}_1$ is a partition, so its distinct elements are disjoint. Therefore, the only way that $\tilde{R}_{1, i_1} \cap \tilde{R}_{1, i_2} \ne \emptyset $ is that $\tilde{R}_{1, i_1} = \tilde{R}_{1, i_2}$. The same argument applied to $\tilde{R}_{2, k_1} \cap \tilde{R}_{2, k_2}$ gives $\tilde{R}_{2, k_1} = \tilde{R}_{2, k_2}$. It follows directly that $\tilde{R}_{g, 1} = \tilde{R}_{g, 2}$. Hence, we have proved that if two elements of $\tilde{{\mathcal {R}}}_g$ are not disjoint, they are equal. This concludes the proof. $\square $
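
The statement can be checked numerically on a small case. The sketch below is an illustration under simplifying assumptions, not part of the proof: it works in dimension $p = 1$, represents each region as a half-open interval $(a, b]$, and uses made-up split thresholds for the two partitions.

```python
import math
from itertools import product

def intervals(thresholds):
    """Partition of the real line into half-open cells (a, b] from split thresholds."""
    cuts = [-math.inf, *sorted(thresholds), math.inf]
    return list(zip(cuts[:-1], cuts[1:]))

def intersect(c1, c2):
    """Intersection of two cells (a, b]; None when it is empty."""
    lo, hi = max(c1[0], c2[0]), min(c1[1], c2[1])
    return (lo, hi) if lo < hi else None

def contains(cell, x):
    return cell[0] < x <= cell[1]

part_1 = intervals([1.0, 4.0])   # hypothetical splits of a first tree
part_2 = intervals([2.5])        # hypothetical splits of a second tree

# Keep only the non-empty pairwise intersections.
refined = [c for c in (intersect(a, b) for a, b in product(part_1, part_2))
           if c is not None]

# Each sample point should fall in exactly one refined cell:
# at least one (coverage) and at most one (pairwise disjointness).
sample = [-3.0, 1.0, 1.7, 2.5, 3.9, 4.0, 10.0]
hits = [sum(contains(c, x) for c in refined) for x in sample]
```

Here `refined` holds four cells and every entry of `hits` equals 1, which is what the two parts of the proof guarantee for any pair of partitions.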

B Exploratory data analysis

Table 7 Table of predictors

1.1 Exposure variable selection

To visualize the relationships between the claim frequency and the explanatory variables, we draw a scatter plot of the log claim frequency as a function of each continuous policy predictor: License seniority, Age, and Age registration. These graphs show immediately that there is no log-linear relationship between these variables and the target. Hence, we need to discretize the continuous predictors to capture the complex relationship between the predictors and the target. There is a clear trend for License seniority: the more experienced the driver is, the lower the claim frequency. The trend is less obvious for Age and Age registration. This is why we choose to discretize our variables with a tree-based method described below.
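
As a rough illustration of tree-based discretization, the sketch below finds a single cut point for one continuous predictor by maximizing the Poisson log-likelihood of a two-bin constant-frequency model. It is an assumption-laden example rather than the paper's procedure: the function names, the toy data, and the choice of a Poisson criterion with exposure are all introduced here for illustration.

```python
import math

def group_loglik(claims, exposure):
    """Poisson log-likelihood of a constant-frequency group, up to terms
    that do not depend on the split (with the convention 0 * log 0 = 0)."""
    return claims * math.log(claims / exposure) if claims > 0 else 0.0

def best_split(rows):
    """rows: (x, claims, exposure) tuples. Returns the threshold t that maximizes
    the log-likelihood of the two-bin model {x <= t}, {x > t}."""
    best_t, best_score = None, -math.inf
    for t in sorted({x for x, _, _ in rows})[:-1]:
        left = [(n, e) for x, n, e in rows if x <= t]
        right = [(n, e) for x, n, e in rows if x > t]
        score = sum(
            group_loglik(sum(n for n, _ in g), sum(e for _, e in g))
            for g in (left, right)
        )
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Hypothetical data: (license seniority in years, claim count, exposure in years).
rows = [(1, 3, 10.0), (2, 3, 10.0), (3, 1, 10.0), (4, 1, 10.0), (5, 1, 10.0)]
cut = best_split(rows)
```

On this toy data the chosen cut separates the two youngest licenses (frequency 0.3) from the rest (frequency 0.1). Applying the same search recursively inside each bin yields a full set of classes.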

Remark 14

This dataset contains very few points. Sometimes, for the highest values of the continuous predictors, the claim frequency is zero because no claim was reported. We remove those points from Fig. 3 because the log is not defined for these observations. Moreover, the Kwatt variable has too many zeros to be useful in a scatter plot. Therefore, for Kwatt we represent only the discretized variable.
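
The filtering described in this remark amounts to the following small computation. The aggregated rows are made up for illustration, and the empirical frequency is assumed here to be the claim count divided by the exposure.

```python
import math

# Hypothetical aggregated data: (predictor value, claim count, exposure in policy-years).
rows = [(20, 4, 25.0), (30, 6, 60.0), (40, 2, 40.0), (50, 0, 12.0)]

points = []
for x, claims, exposure in rows:
    freq = claims / exposure
    if freq > 0:  # log(0) is undefined, so groups without claims are left out
        points.append((x, math.log(freq)))
```

The last group has no reported claim, so it does not appear in `points` and would be missing from a scatter plot of the log claim frequency.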

C Grid search

Table 8 Table of allowed random forest hyperparameters for grid search

Table 9 Table of allowed gradient boosting hyperparameters for grid search

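
A grid search over a table of allowed values can be sketched as follows. The hyperparameter names and values are placeholders and not those of Table 8 or Table 9, and `toy_score` is a stand-in for a real validation loss (for instance an out-of-sample deviance) that would require fitting a model for each combination.

```python
from itertools import product

def grid(param_values):
    """Yield one dict per combination of the allowed hyperparameter values."""
    names = list(param_values)
    for combo in product(*(param_values[name] for name in names)):
        yield dict(zip(names, combo))

def grid_search(param_values, score):
    """Return the combination with the lowest score."""
    return min(grid(param_values), key=score)

# Hypothetical grid, for illustration only.
rf_grid = {"n_trees": [100, 500], "max_depth": [3, 5, 7], "min_leaf_size": [10, 50]}

def toy_score(params):
    # Placeholder loss so that the sketch runs without any model fitting.
    return (abs(params["max_depth"] - 5)
            + params["min_leaf_size"] / 100
            + 1 / params["n_trees"])

best = grid_search(rf_grid, toy_score)
```

The number of fits grows as the product of the number of allowed values per hyperparameter (12 combinations here), which is why such grids are usually kept small.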
About this article

Cite this article

Maillart, A. Toward an explainable machine learning model for claim frequency: a use case in car insurance pricing with telematics data. Eur. Actuar. J. 11, 579–617 (2021). https://doi.org/10.1007/s13385-021-00270-5

Received: 20 May 2020
Revised: 21 November 2020
Accepted: 03 March 2021
Published: 19 March 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s13385-021-00270-5
