
Toward an explainable machine learning model for claim frequency: a use case in car insurance pricing with telematics data

  • Original Research Paper
  • Published in the European Actuarial Journal

Abstract

In this paper, we propose an explainable machine learning approach to model the claim frequency in a telematics car insurance dataset. We first use a data-driven method based on tree ensembles, namely the random forest, to build a claim frequency model. We then present a method to construct a single tree that faithfully synthesizes the predictions of a tree ensemble such as a random forest or a gradient boosting model; this tree serves as a global explanation of the black-box predictions. Thanks to this surrogate model, we can extract knowledge from the black-box tree ensemble. Finally, we show how integrating this new knowledge into a generalized linear model increases its predictive power.
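To make the three-step pipeline concrete, here is a minimal sketch on synthetic data, assuming scikit-learn and statsmodels as stand-ins for the tools the paper actually uses (the rfCountData R package and defragTrees, see the Notes below); in particular, a plain CART surrogate replaces the paper's tree-synthesis method, and all data and variable names are illustrative.

```python
# Minimal sketch of the paper's three-step pipeline on synthetic data.
# Assumptions: scikit-learn / statsmodels as stand-ins; a plain CART
# surrogate replaces the defragTrees synthesis step; all names are ours.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 5_000
X = rng.uniform(size=(n, 3))                     # toy policy features
exposure = rng.uniform(0.1, 1.0, size=n)         # policy-year exposure
lam = np.exp(-2 + 1.5 * X[:, 0] - X[:, 1] ** 2)  # true claim intensity
y = rng.poisson(lam * exposure)                  # observed claim counts

# Step 1: black-box frequency model (claims per unit of exposure).
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, y / exposure, sample_weight=exposure)

# Step 2: global surrogate -- one shallow tree fit to the ensemble's
# own predictions, so that it summarizes the black-box globally.
surrogate = DecisionTreeRegressor(max_leaf_nodes=8, random_state=0)
surrogate.fit(X, rf.predict(X))

# Step 3: inject the extracted knowledge into a GLM -- the surrogate's
# leaves become a categorical risk-class feature of a Poisson GLM.
segments = surrogate.apply(X)                    # leaf index per policy
dummies = np.column_stack([(segments == s).astype(float)
                           for s in np.unique(segments)[1:]])
glm = sm.GLM(y, sm.add_constant(dummies), family=sm.families.Poisson(),
             offset=np.log(exposure)).fit()
print(glm.summary())
```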




Notes

  1. https://github.com/fpechon/rfCountData.

  2. https://github.com/sato9hara/defragTrees.



Acknowledgements

We thank the two anonymous reviewers for their careful reading of our manuscript and their insightful and relevant suggestions.

Author information

Corresponding author

Correspondence to Arthur Maillart.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Results on partitions

Proof

The proof is organized in two parts. First, we prove that the union of the elements of \(\tilde{{\mathcal {R}}}_g\) covers \({\mathbb {R}}^p\), i.e.,

  • for each \(x \in {\mathbb {R}}^p\), there exist \(i \in \{1, \ldots , I_1\}\) and \(k \in \{1, \ldots , I_2\}\) such that \(x \in \tilde{R}_{1, i} \cap \tilde{R}_{2, k}\).

Subsequently, we prove that the elements of \(\tilde{{\mathcal {R}}}_g\) are pairwise disjoint, i.e.,

  • if \(\tilde{R}_{g, 1} \in \tilde{{\mathcal {R}}}_g\) and \(\tilde{R}_{g, 2} \in \tilde{{\mathcal {R}}}_g\) with \(\tilde{R}_{g, 1} \ne \tilde{R}_{g, 2}\), then \(\tilde{R}_{g, 1} \cap \tilde{R}_{g, 2} = \emptyset \).

Let \(x \in {\mathbb {R}}^p\). Since \(\tilde{{\mathcal {R}}}_1\) is a partition of \({\mathbb {R}}^p\), there exists \(i \in \{1, \ldots , I_1\}\) such that \(x \in \tilde{R}_{1, i}\). Symmetrically, there exists \(k \in \{1, \ldots , I_2\}\) such that \(x \in \tilde{R}_{2, k}\). Hence, \(x \in \tilde{R}_{1, i} \cap \tilde{R}_{2, k}\), so \(\tilde{R}_{1, i} \cap \tilde{R}_{2, k} \ne \emptyset \), which proves the first point.

Now, let us suppose that \(\tilde{R}_{g, 1} \in \tilde{{\mathcal {R}}}_{g}\) and \(\tilde{R}_{g, 2} \in \tilde{{\mathcal {R}}}_{g}\). There exist \(i_1, i_2 \in \{1, \ldots , I_1\}\) and \(k_1, k_2 \in \{1, \ldots , I_2\}\) such that \( \tilde{R}_{g, 1} = \tilde{R}_{1, i_1} \cap \tilde{R}_{2, k_1} \) and \( \tilde{R}_{g, 2} = \tilde{R}_{1, i_2} \cap \tilde{R}_{2, k_2} \), where \(\tilde{R}_{1, i_1}, \tilde{R}_{1, i_2} \in \tilde{{\mathcal {R}}}_{1}\) and \(\tilde{R}_{2, k_1}, \tilde{R}_{2, k_2} \in \tilde{{\mathcal {R}}}_{2}\). Suppose that \(\tilde{R}_{g, 1} \cap \tilde{R}_{g, 2} \ne \emptyset \); then there exists \(x \in {\mathbb {R}}^p\) such that

$$\begin{aligned} x \in \tilde{R}_{g, 1} \cap \tilde{R}_{g, 2} = \left( \tilde{R}_{1, i_1} \cap \tilde{R}_{1, i_2}\right) \cap \left( \tilde{R}_{2, k_1} \cap \tilde{R}_{2, k_2}\right) . \end{aligned}$$

It means that \(x \in \tilde{R}_{1, i_1} \cap \tilde{R}_{1, i_2}\) and hence that \(\tilde{R}_{1, i_1} \cap \tilde{R}_{1, i_2} \ne \emptyset \). However, \(\tilde{{\mathcal {R}}}_1\) is a partition, so the only way that \(\tilde{R}_{1, i_1} \cap \tilde{R}_{1, i_2} \ne \emptyset \) is that \(\tilde{R}_{1, i_1} = \tilde{R}_{1, i_2}\). The same argument holds for \(\tilde{R}_{2, k_1} \cap \tilde{R}_{2, k_2}\). It follows directly that \(\tilde{R}_{g, 1} = \tilde{R}_{g, 2}\). Hence, we have shown that if two elements of \(\tilde{{\mathcal {R}}}_g\) are not disjoint, they are equal. This concludes the proof. \(\square \)
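As an illustration of this result (not part of the paper), the following sketch forms the common refinement of two partitions of a finite ground set by pairwise intersection and checks both parts of the proof numerically.

```python
# Sanity check of the result above on a finite ground set: intersecting
# the cells of two partitions pairwise yields their common refinement,
# which is again a partition (empty intersections are dropped).
from itertools import product

def common_refinement(p1, p2):
    """All non-empty pairwise intersections of cells of p1 and p2."""
    return [a & b for a, b in product(p1, p2) if a & b]

ground = set(range(10))
p1 = [{0, 1, 2, 3}, {4, 5, 6, 7, 8, 9}]          # partition no. 1
p2 = [{0, 2, 4, 6, 8}, {1, 3, 5, 7, 9}]          # partition no. 2
ref = common_refinement(p1, p2)                  # the refined partition

# First part of the proof: the union of the cells covers the ground set.
assert set().union(*ref) == ground
# Second part: the cells are pairwise disjoint.
assert all(a.isdisjoint(b) for i, a in enumerate(ref) for b in ref[i + 1:])
print(ref)  # [{0, 2}, {1, 3}, {4, 6, 8}, {5, 7, 9}]
```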

B Exploratory data analysis

Table 7 Table of predictors

B.1 Exposure variable selection

Fig. 25: Time as exposure

Fig. 26: Distance as exposure

To visualize the relationships between the claim frequency and the explanatory variables, we plot the log claim frequency as a function of each continuous policy predictor: License seniority, Age, and Age registration. These graphs show immediately that there is no log-linear relationship between these variables and the target. Hence, we need to discretize the continuous predictors to capture the complex relationship between predictors and the target. There is a clear trend for License seniority: the more experienced the driver, the lower the claim frequency. The trend is less obvious for Age and Age registration. This is why we choose to discretize our variables with the tree-based method described below.
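One plausible reading of this tree-based binning step, sketched with scikit-learn on synthetic data (the paper's exact procedure may differ, and the variable names here are illustrative): the split thresholds of a shallow regression tree fit on a single predictor are taken as the bin edges of the discretized variable.

```python
# Tree-based discretization of one continuous predictor: a minimal
# sketch, assuming scikit-learn; data and names are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
seniority = rng.uniform(0, 40, size=2_000)            # license seniority
freq = np.exp(-1.5 - 0.03 * seniority) + rng.normal(0, 0.02, size=2_000)

tree = DecisionTreeRegressor(max_leaf_nodes=5, min_samples_leaf=100)
tree.fit(seniority.reshape(-1, 1), freq)

# Internal nodes carry real thresholds; leaves are marked with -2.
thresholds = sorted(t for t in tree.tree_.threshold if t != -2)
binned = np.digitize(seniority, thresholds)           # discretized predictor
print("bin edges:", [round(t, 2) for t in thresholds])
print("policies per bin:", np.bincount(binned))
```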

Remark 14

This dataset contains very few points. Sometimes, for the highest values of the continuous predictors, the claim frequency is zero because no claim was reported. Since the log is not defined for these observations, we remove those points from Fig. 3. Moreover, the Kwatt variable has too many zeros to be useful in a scatter plot, so we represent only its discretized version.
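A small sketch of how such a scatter plot can be produced while filtering out the zero-frequency points mentioned in the remark (synthetic data, matplotlib assumed; the names are illustrative, not the paper's):

```python
# Empirical log claim frequency against a continuous predictor, with
# the zero-frequency points dropped because the log is undefined there.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
age = rng.integers(18, 90, size=3_000)
exposure = rng.uniform(0.1, 1.0, size=3_000)
claims = rng.poisson(np.exp(-2.5 + 0.01 * (60 - age).clip(0)) * exposure)

ages = np.unique(age)
freq = np.array([claims[age == a].sum() / exposure[age == a].sum()
                 for a in ages])
keep = freq > 0                    # log is undefined where frequency is 0
plt.scatter(ages[keep], np.log(freq[keep]))
plt.xlabel("Age")
plt.ylabel("log claim frequency")
plt.show()
```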

C Grid search

Table 8 Random forest hyperparameters allowed in the grid search
Table 9 Gradient boosting hyperparameters allowed in the grid search
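Since the grids of Tables 8 and 9 are not reproduced on this page, the following is only a hedged sketch of how such a grid search could be run, with scikit-learn standing in for the packages actually used and placeholder parameter values rather than the paper's; the deviance-based scoring matches the Poisson frequency setting.

```python
# Hedged sketch of a hyperparameter grid search scored by Poisson
# deviance; the grid below is a placeholder, not Table 8.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer, mean_poisson_deviance
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.uniform(size=(1_000, 4))
y = rng.poisson(np.exp(-1 + X[:, 0]))          # synthetic claim counts

param_grid = {                                 # placeholder grid
    "n_estimators": [100, 300],
    "max_depth": [4, 8, None],
    "min_samples_leaf": [50, 200],
}
deviance = make_scorer(mean_poisson_deviance, greater_is_better=False)
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, scoring=deviance, cv=5)
search.fit(X, y)
print(search.best_params_)
print("CV Poisson deviance:", -search.best_score_)
```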


Cite this article

Maillart, A. Toward an explainable machine learning model for claim frequency: a use case in car insurance pricing with telematics data. Eur. Actuar. J. 11, 579–617 (2021). https://doi.org/10.1007/s13385-021-00270-5

