Health policyholder clustering using medical consumption

A useful tool for targeting prevention plans

  • Original Research Paper
European Actuarial Journal

Abstract

On paper, prevention appears to be a good complement to health insurance. However, its implementation is often costly. To maximize the impact and efficiency of prevention plans, plans should target particular groups of policyholders. In this article, we propose a way of clustering policyholders that could be a starting point for the targeting of prevention plans. This two-step method considers mainly policyholder health consumption for classification. The dimension is first reduced using a nonnegative matrix factorization algorithm, producing intermediate health product clusters. Policyholders are then clustered using Kohonen’s map algorithm. This leads to a natural visualization of the results, allowing the simple comparison of results from different databases. The method is applied to two real French health insurer datasets. The method is shown to be easily understandable and able to cluster most policyholders efficiently.

Notes

  1. The term health product is used for every item of health expenditure that may be refunded by the insurer (such as GP visits, nights at the hospital, medication and glasses).

  2. The legal retirement age in France is 62.

  3. More precisely, if H denotes the frequency matrix described above, the matrix \(\log(H + 1)\) is computed.

  4. Six different implementations have been tested: those proposed by Lee and Seung (Lee, [37]), Brunet et al. (Brunet, [9]), Pascual-Montano et al. (nsNMF, [43]) and Badea (Offset, [4]), and the two proposed by Kim and Park (snmf/l and snmf/r, [31]). The “snmf/l” algorithm yields some of the best results while being significantly faster. Implementations from the R package “NMF”, developed by Gaujoux and Seoighe [19], were used in the analysis presented here.

  5. The R package “Kohonen”, developed by Wehrens et al. ([55]), was used in the analysis presented here.

  6. A well-known alternative is to choose the starting points via a PCA; however, Akinduko et al. show that this is not suitable for non-linear datasets [3].

  7. For the databases used here, dimension reduction dramatically improves clustering.

  8. These three concepts are quality indicators commonly measured for a clustering. Cophenetic correlation and dispersion aim to measure the stability and were introduced by Brunet et al. [9]. The silhouette was presented by Rousseeuw to test whether individuals are well clustered [49].

  9. The health product “Legal copayment” may be unfamiliar to the reader. In the French health system, many health products are partially reimbursed by the public insurer, “l’Assurance Maladie”. The price of health products is fixed by law (for example, a GP consultation costs 25 Euros). However, to limit health consumption, the public insurer does not refund all of this amount (only 16.5 Euros for a GP consultation). Here, we call the remaining gap (25 − 16.5 = 8.5 Euros in this example) the “legal copayment”. Moreover, GPs are allowed to charge higher fees that are not covered by the public insurer. The legal copayment is usually reimbursed by private insurance.

  10. An individual can need glasses and have an operation in the same year. Such an individual would then belong to two clusters: the optic cluster and the hospitalization cluster. In this regard, the health sector is linked to a multi-label context.

  11. In fact, almost none of the 20 HPCs made using the PCA offer satisfying consistency.

  12. The \(R^{2}\) coefficient is given by \(R^{2} = 1 - \frac{\sum _{i=1}^{k} I_{C_{i}}}{I_{C}}\), with \(I_{C_{i}}\) denoting the inertia of the cluster \(C_{i}\) and \(I_{C}\) the inertia of the whole dataset. The inertia has been computed using the cosine similarity, following the recommendation of Huang [28].

  13. See Sect. 2 for a review of the subdatabases.

  14. According to Wikipedia, “Orthoptics is a profession allied to eye care professions whose primary emphasis is the diagnosis and non-surgical management of strabismus (wandering eyes), amblyopia (lazy eye) and eye movement disorders”.

  15. The same definition as that of Huang [28] has been used.
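
As an illustration of the \(R^{2}\) coefficient of note 12, the following Python sketch computes it on a toy dataset. The inertia definition is an assumption: the notes only state that the cosine similarity is used, so the sketch takes the inertia of a set to be the sum of the cosine dissimilarities (one minus the cosine similarity) between each point and the centroid of the set. All function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is null."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def inertia(points):
    """ASSUMED definition: sum of cosine dissimilarities to the centroid."""
    c = centroid(points)
    return sum(1.0 - cosine_similarity(p, c) for p in points)


def r_squared(points, labels):
    """R^2 = 1 - (sum of within-cluster inertias) / (total inertia)."""
    total = inertia(points)
    if total == 0.0:
        raise ValueError("total inertia is zero; R^2 is undefined")
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    within = sum(inertia(members) for members in clusters.values())
    return 1.0 - within / total


# Toy data: two groups of points lying on two different directions.
points = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
good = r_squared(points, [0, 0, 1, 1])   # clusters follow the directions
trivial = r_squared(points, [0, 0, 0, 0])  # a single cluster explains nothing
```

With this definition, a clustering that groups points sharing the same direction leaves no within-cluster inertia, while a single all-encompassing cluster gives a coefficient of zero.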

References

  1. Aggarwal CC, Yu PS (2000) Finding generalized projected clusters in high dimensional spaces, vol. 29. ACM

  2. Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications, vol. 27. ACM

  3. Akinduko AA, Mirkes EM, Gorban AN (2016) Som: Stochastic initialization versus principal components. Inform Sci 364:213–221

  4. Badea L (2008) Extracting gene expression profiles common to colon and pancreatic adenocarcinoma using simultaneous nonnegative matrix factorization. In: Biocomputing 2008, pp. 267–278. World Scientific

  5. Beaulieu N, Cutler DM, Ho K, Isham G, Lindquist T, Nelson A, O’Connor P (2006) The business case for diabetes disease management for managed care organizations. In: Forum for Health Economics & Policy, vol. 9. De Gruyter

  6. Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is nearest neighbor meaningful? In: International conference on database theory, pp. 217–235. Springer, Berlin

  7. Boutsidis C, Gallopoulos E (2008) SVD based initialization: a head start for nonnegative matrix factorization. Pattern Recogn 41(4):1350–1362

  8. Brockett PL, Xia X, Derrig RA (1998) Using Kohonen’s self-organizing feature map to uncover automobile bodily injury claims fraud. J Risk Insur pp. 245–274

  9. Brunet JP, Tamayo P, Golub TR, Mesirov JP (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci 101(12):4164–4169

  10. Bühlmann H, Gisler A (2006) A course in credibility theory and its applications. Springer Science & Business Media, Berlin

  11. Cardoso-Cachopo A (2007) Improving Methods for Single-label Text Categorization. PhD thesis, Instituto Superior Tecnico, Universidade Tecnica de Lisboa

  12. Cheng CH, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. In: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 84–93. ACM

  13. Darblade M (2015) Analyse de profils de consommation et tarification des futures garanties sur-complémentaire santé. Master’s thesis, ISFA

  14. Dargent-Molina P, Cassou B (2008) Prévention des chutes et des fractures chez les femmes âgées. Gérontologie et société 31(2):65–78

  15. Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inform Sci 41(6):391–407

  16. Derrig RA, Ostaszewski KM (1995) Fuzzy techniques of pattern recognition in risk and claim classification. J Risk Insur 62(3):447–482

  17. Ding C, He X (2004) K-means clustering via principal component analysis. In: Proceedings of the twenty-first international conference on Machine learning, p. 29. ACM

  18. Gauchon R, Hermet JP (2019) La psychiatrie: un risque important en assurance santé?

  19. Gaujoux R, Seoighe C (2010) A flexible r package for nonnegative matrix factorization. BMC Bioinform 11(1):367

  20. Ghoreyshi S, Hosseinkhani J (2015) Developing a clustering model based on k-means algorithm in order to creating different policies for policyholders in insurance industry. Int J Adv Comput Sci Inf Technol (IJACSIT) 4(2):46–53

  21. Hainaut D (2019) A self-organizing predictive map for non-life insurance. Eur Actuar J 9(1):173–207

  22. Henckaerts R, Antonio K, Clijsters M, Verbelen R (2018) A data driven binning strategy for the construction of insurance tariff classes. Scand Actuar J 2018(8):681–705

  23. Herring B (2010) Suboptimal provision of preventive healthcare due to expected enrollee turnover among private insurers. Health Econ 19(4):438–448

  24. Hinneburg A, Keim DA (1999) Optimal grid-clustering: Towards breaking the curse of dimensionality in high-dimensional clustering. In: 25th International Conference on Very Large Databases, pp. 506–517

  25. Hinton GE, Salakhutdinov RR (2009) Replicated softmax: an undirected topic model. In: Y. Bengio, D. Schuurmans, J.D. Lafferty, C.K.I. Williams, A. Culotta (eds.) Advances in Neural Information Processing Systems 22, pp. 1607–1614. Curran Associates, Inc.

  26. Hoyer PO (2004) Non-negative matrix factorization with sparseness constraints. J Mach Learn Res 5:1457–1469

  27. Hoyle D, Rattray M (2003) PCA learning for sparse high-dimensional data. EPL (Europhys Lett) 62(1):117

  28. Huang A (2008) Similarity measures for text document clustering. In: Proceedings of the sixth new zealand computer science research student conference (NZCSRSC2008), Christchurch, New Zealand, pp. 49–56

  29. Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp. 604–613. ACM

  30. Jones BW, Chung W (2016) Topic modeling of small sequential documents: Proposed experiments for detecting terror attacks. In: Intelligence and Security Informatics (ISI), 2016 IEEE Conference on, pp. 310–312. IEEE

  31. Kim H, Park H (2007) Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 23(12):1495–1502

  32. Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464–1480

  33. Kuang D, Choo J, Park H (2015) Nonnegative matrix factorization for interactive topic modeling and document clustering. Partitional clustering algorithms. Springer, Berlin, pp 215–243

  34. Kuo R, Lin S, Shih C (2007) Mining association rules through integration of clustering analysis and ant colony system for health insurance database in taiwan. Expert Syst Appl 33(3):794–808

  35. Langville AN, Meyer CD, Albright R, Cox J, Duling D (2006) Initializations for the nonnegative matrix factorization. In: Proceedings of the twelfth ACM SIGKDD international conference on knowledge discovery and data mining, pp. 23–26. Citeseer

  36. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, Carter SL, Stewart C, Mermel CH, Roberts SA, Kiezun A (2013) Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499(7457):214

  37. Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788

  38. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in neural information processing systems, pp. 556–562

  39. Mote SR, Baid UR, Talbar SN (2017) Non-negative matrix factorization and self-organizing map for brain tumor segmentation. In: Wireless Communications, Signal Processing and Networking (WiSPNET), 2017 International Conference on, pp. 1133–1137. IEEE

  40. Murtagh F (1995) Interpreting the Kohonen self-organizing feature map using contiguity-constrained clustering. Pattern Recogn Lett 16(4):399–408

  41. Nesvijevskaia A, Taudou B (2016) La data science au service de la prévention santé et prévoyance : nouveaux paradigmes - 17eme rencontre mutré, 14–15 november - nantes. Tech. rep, Malakoff Mederic

  42. Paatero P, Tapper U (1994) Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2):111–126

  43. Pascual-Montano A, Carazo JM, Kochi K, Lehmann D, Pascual-Marqui RD (2006) Nonsmooth nonnegative matrix factorization (nsnmf). IEEE Trans Pattern Anal Mach Intell 28(3):403–415

  44. Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl 416(1):29–47

  45. Pauca VP, Shahnaz F, Berry MW, Plemmons RJ (2004) Text mining using non-negative matrix factorizations. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 452–456. SIAM

  46. Peng Y, Kou G, Sabatka A, Chen Z, Khazanchi D, Shi Y (2006) Application of clustering methods to health insurance fraud detection. In: Service Systems and Service Management, 2006 International Conference on, vol. 1, pp. 116–120. IEEE

  47. Rennie JD, Shih L, Teevan J, Karger DR (2003) Tackling the poor assumptions of naive bayes text classifiers. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp. 616–623

  48. Robertson S (2004) Understanding inverse document frequency: on theoretical arguments for idf. J Doc 60(5):503–520

  49. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65

  50. Settles B, Craven M, Ray S (2008) Multiple-instance active learning. In: Advances in neural information processing systems, pp. 1289–1296

  51. Utsumi A (2010) Evaluating the performance of nonnegative matrix factorization for constructing semantic spaces: Comparison to latent semantic analysis. In: 2010 IEEE International Conference on Systems, Man and Cybernetics, pp. 2893–2900. IEEE

  52. Van Benthem MH, Keenan MR (2004) Fast algorithm for the solution of large-scale non-negativity-constrained least squares problems. J Chemometr 18(10):441–450

  53. Verrall RJ, Yakoubov YH (1999) A fuzzy approach to grouping by policyholder age in general insurance. J Actuar Pract 7:181–204

  54. Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1225–1234. ACM

  55. Wehrens R, Buydens LM et al (2007) Self-and super-organizing maps in r: the Kohonen package. J Stat Softw 21(5):1–19

  56. World Health Organization (2017) Depression and other common mental disorders: global health estimates. No. WHO/MSD/MER/2017.2

  57. Wu B, Wang E, Zhu Z, Chen W, Xiao P (2018) Manifold nmf with l21 norm for clustering. Neurocomputing 273:78–88

  58. Xu H, Caramanis C, Sanghavi S (2010) Robust pca via outlier pursuit. In: Advances in Neural Information Processing Systems, pp. 2496–2504

  59. Xu L, Yuille AL (1995) Robust principal component analysis by self-organizing rules based on statistical physics approach. IEEE Trans Neural Netw 6(1):131–143

  60. Yeo AC, Smith KA, Willis RJ, Brooks M (2001) Clustering technique for risk classification and prediction of claim costs in the automobile insurance industry. Intell Syst Account Finan Manag 10(1):39–50

  61. Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. In: ACM Sigmod Record, vol. 25, pp. 103–114. ACM

Acknowledgements

The authors would like to thank Alexandra Barral for useful comments over the duration of this research, and Nabil Rachdi for technical advice. They are also grateful to Addactis in France for providing the data, and to everyone (including two reviewers) who reread the paper. This research was carried out in the framework of the Chair Prevent'Horizon, supported by the risk foundation Louis Bachelier and in partnership with Claude Bernard Lyon 1 University, Addactis in France, AG2R La Mondiale, G2S, Covea, Groupama Gan Vie, Groupe Pasteur Mutualité, Harmonie Mutuelle, Humanis Prévoyance and La Mutuelle Générale. S. Loisel acknowledges support from the IDR Actuariat Durable sponsored by Milliman Paris, and the DAMI research chair sponsored by BNP Paribas Cardif.

Author information

Corresponding author

Correspondence to Romain Gauchon.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Appendix 1: Comparing the clusters obtained using the proposed method with those obtained using a very basic approach

See Figs. 12 and 13.

Fig. 12. Detailed statistics for all classes

Fig. 13. Consumption if consuming at least some of a particular health product and overall consumption

1.2 Appendix 2: An example of another map obtained from the same data

See Fig. 14.

Fig. 14. Self-organizing map obtained from the same data as in Fig. 6, using a different seed
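
To illustrate why a different seed can lead to a different map, here is a minimal online self-organizing map in pure Python. It is a generic sketch, not the algorithm of the R package “Kohonen” used in the paper: the grid size, the linear decay of the learning rate and radius, the Gaussian neighbourhood, the sampling-based initialization and all function names are assumptions made for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose codebook vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda node: sum((a - b) ** 2 for a, b in zip(weights[node], x)),
    )


def train_som(data, rows, cols, n_iter=500, seed=0, lr0=0.5):
    """Train a rows x cols map; the seed drives initialization and sampling."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # Initialization (assumption): each node starts at a randomly drawn data point.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3     # neighbourhood radius shrinks
        x = rng.choice(data)                     # one random observation per step
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])   # pull the node towards x
    return weights


data = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8], [0.0, 1.0], [1.0, 0.1]]
map_a = train_som(data, rows=3, cols=3, seed=1)
map_b = train_som(data, rows=3, cols=3, seed=2)
```

Rerunning with the same seed reproduces `map_a` exactly, whereas `map_b` will generally place the same observations on different nodes. The maps should therefore be compared through the groups they reveal rather than through node positions.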

1.3 Appendix 3: Test of the algorithm on a database with known clusters

While several tests have been carried out to assess the relevance of the results obtained in health insurance databases, it is impossible to compute an objective error metric because the underlying clusters are unknown. To carry out such a test, it is thus necessary to use a dataset from another field. For example, the text-mining field offers many databases with known clusters. Moreover, it is common practice in this field to work with a word frequency matrix, making it realistic to apply the NMF/Kohonen process to a text-mining dataset.

The 20-newsgroups dataset is chosen to perform this test. It is a well-known dataset that has been used for various text-mining tasks, such as word embedding (e.g., [25, 54]), unsupervised clustering (e.g., [17, 28]) and supervised classification (e.g., [47, 50]). The training subset has been used, representing 11 293 texts from 20 different newsgroups. For this study, we use the dataset as pre-processed by Cardoso-Cachopo [11] (the “no-short” dataset). The objective is to recover the original newsgroup of each document.

As text mining is not one of the goals of this paper, the results presented below come from the first run of the algorithm, without attempts to calibrate the model or improve the results. The dimension is first reduced to 60 before clustering, and the frequency matrix is pre-processed using the tf-idf method, which is a common practice in text mining. Since the 20-newsgroups dataset contains 20 different natural clusters, the HAC has been calibrated to obtain 20 different classes.
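
The tf-idf pre-processing mentioned above can be sketched as follows. Several tf-idf variants exist and the source does not say which one was used, so the weighting below (term count divided by document length, multiplied by the logarithm of the number of documents over the document frequency) is an assumption, as is the function name.

```python
import math


def tf_idf(freq):
    """Weight a document-term count matrix (list of rows) by tf-idf.

    ASSUMED variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(freq)
    n_terms = len(freq[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in freq if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) if d > 0 else 0.0 for d in df]
    weighted = []
    for row in freq:
        length = sum(row)
        weighted.append(
            [(count / length) * idf[j] if length > 0 else 0.0
             for j, count in enumerate(row)]
        )
    return weighted


# Three toy documents over a three-word vocabulary.
counts = [[2, 1, 0],
          [1, 0, 1],
          [3, 0, 0]]
weights = tf_idf(counts)
```

A word present in every document (the first column here) receives a zero weight. With this variant all weights remain nonnegative, so the weighted matrix can still be passed to an NMF algorithm.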

From the Kohonen map (Fig. 15), it is possible to see that clusters 2 and 10 are spread out. Moreover, clusters 6 and 9 seem significantly larger than the others. Their purity scores (shown in Fig. 17) confirm that they are less homogeneous than the other clusters. Except for these four clusters and cluster 18, purity is acceptable. The overall purity is 62%, and the total entropy (see note 15) is 0.4, which is significantly better than the results obtained by Huang on the same dataset [28], even though achieving a good score is not the aim here.
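
For readers unfamiliar with these two indicators, the sketch below computes an overall purity and an overall entropy from known labels and cluster assignments. It uses a definition that is common in the document-clustering literature (size-weighted averages over clusters, with the entropy normalized by the logarithm of the number of known classes). Whether this matches the definition behind the figures reported above in every detail is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def contingency(true_labels, cluster_labels):
    """For each cluster, count how many members come from each known class."""
    tables = defaultdict(Counter)
    for truth, cluster in zip(true_labels, cluster_labels):
        tables[cluster][truth] += 1
    return tables


def overall_purity(true_labels, cluster_labels):
    """Share of observations belonging to the dominant class of their cluster."""
    tables = contingency(true_labels, cluster_labels)
    return sum(max(c.values()) for c in tables.values()) / len(true_labels)


def overall_entropy(true_labels, cluster_labels):
    """Size-weighted cluster entropy, normalized to [0, 1] (assumed variant)."""
    tables = contingency(true_labels, cluster_labels)
    n = len(true_labels)
    n_classes = len(set(true_labels))
    if n_classes < 2:
        return 0.0
    total = 0.0
    for counts in tables.values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log(k / size) for k in counts.values())
        total += (size / n) * h / math.log(n_classes)
    return total


truth = ["a", "a", "a", "b", "b", "c"]
found = [0, 0, 1, 1, 1, 1]
purity = overall_purity(truth, found)    # 4 of 6 documents sit in a dominant class
entropy = overall_entropy(truth, found)  # 0 would mean perfectly pure clusters
```

Higher purity and lower entropy both indicate more homogeneous clusters; a perfect clustering has a purity of 1 and an entropy of 0.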

Fig. 15. Kohonen’s map using the 20-newsgroups dataset. Cluster 10 cannot be constructed

A comparison of Figs. 16 and 17 shows that, even though the algorithm does not identify all of the documents in a given cluster, the resulting clusters are still reliable. This means that if one wants to identify all of the policyholders with psychiatric medication (for example), this algorithm is not very appropriate. However, if a psychiatric class is identified, it is reliable enough to justify the targeting of a prevention plan.
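
The distinction made here, between clusters that are reliable and groups that are only partially recovered, can be made concrete with a small example. The source does not define the “reconstitution capacity” of Fig. 16. The sketch assumes it is the share of a known group that ends up in the single cluster capturing most of it, and contrasts it with per-cluster purity. The labels, data and function names are purely illustrative.

```python
from collections import Counter, defaultdict


def per_cluster_purity(true_labels, cluster_labels):
    """For each cluster, the share of its members from its dominant group."""
    tables = defaultdict(Counter)
    for truth, cluster in zip(true_labels, cluster_labels):
        tables[cluster][truth] += 1
    return {c: max(t.values()) / sum(t.values()) for c, t in tables.items()}


def per_group_recovery(true_labels, cluster_labels):
    """ASSUMED definition: for each known group, the share of its members
    found in the cluster that captures most of that group."""
    tables = defaultdict(Counter)
    for truth, cluster in zip(true_labels, cluster_labels):
        tables[truth][cluster] += 1
    return {g: max(t.values()) / sum(t.values()) for g, t in tables.items()}


# Hypothetical portfolio: 4 policyholders with psychiatric medication, 6 without.
truth = ["psy"] * 4 + ["other"] * 6
# The algorithm isolates a small cluster "P" and a large catch-all cluster "X".
found = ["P", "P"] + ["X"] * 8

purity = per_cluster_purity(truth, found)
recovery = per_group_recovery(truth, found)
```

In this toy case cluster "P" contains only psychiatric consumers, so targeting it is safe, yet it holds only half of them: high purity does not imply that the whole group has been found.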

Fig. 16. Newsgroup reconstitution capacity

Fig. 17. Cluster purity

To summarize, the method produces acceptable results for the 20-newsgroups dataset. Most of the clusters represent a specific newsgroup. However, the method cannot differentiate between very similar newsgroups, such as those dedicated to IBM and Mac computers. This produces large clusters containing most of the documents that the method cannot differentiate.

This clustering method is thus able to construct meaningful policyholder clusters. However, large classes (such as the everyday-care cluster) are heterogeneous and should not be used to target prevention plans: they contain policyholders who cannot be differentiated by the algorithm.

Cite this article

Gauchon, R., Loisel, S. & Rullière, JL. Health policyholder clustering using medical consumption. Eur. Actuar. J. 10, 599–626 (2020). https://doi.org/10.1007/s13385-020-00244-z

  • DOI: https://doi.org/10.1007/s13385-020-00244-z
