Predicting High-Cost Health Insurance Members through Boosted Trees and Oversampling: An Application Using the HCCI Database
North American Actuarial Journal, Pub Date: 2020-07-22, DOI: 10.1080/10920277.2020.1754242
Brian Hartman, Rebecca Owen, Zoe Gibbs

Using the Health Care Cost Institute data (approximately 47 million members over seven years), we examine how to best predict which members will be high-cost next year. We find that cost history, age, and prescription drug coverage all predict high costs, with cost history being by far the most predictive. We also compare the predictive accuracy of logistic regression to extreme gradient boosting (XGBoost) and find that the added flexibility of the extreme gradient boosting improves the predictive power. Finally, we show that with extremely unbalanced classes (because high-cost members are so rare), oversampling the minority class provides a better XGBoost predictive model than undersampling the majority class or using the training data as is. Logistic regression performance seems unaffected by the method of sampling.
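The two resampling strategies the abstract compares can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the abstract does not say how the resampling was carried out, so the helper names `oversample_minority` and `undersample_majority`, the balanced 1:1 target ratio, and the toy data (1,000 synthetic members, 5% flagged high-cost) are all assumptions made for the example.

```python
import random


def _split_indices(labels):
    """Return (minority, majority) index lists for binary 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def oversample_minority(rows, labels, seed=0):
    """Duplicate random minority rows until both classes are the same size.

    Hypothetical helper: the 1:1 target ratio is an assumption.
    """
    rng = random.Random(seed)
    minority, majority = _split_indices(labels)
    # Draw with replacement from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def undersample_majority(rows, labels, seed=0):
    """Keep a random majority subset the same size as the minority class.

    Hypothetical helper: the 1:1 target ratio is an assumption.
    """
    rng = random.Random(seed)
    minority, majority = _split_indices(labels)
    # Draw without replacement, discarding the remaining majority rows.
    kept = rng.sample(majority, len(minority))
    idx = kept + minority
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy training set (synthetic, not HCCI data): 50 high-cost members out of 1,000.
labels = [1] * 50 + [0] * 950
rows = [[float(i)] for i in range(1000)]

X_over, y_over = oversample_minority(rows, labels)
X_under, y_under = undersample_majority(rows, labels)
print(len(y_over), sum(y_over))    # size and positive count after oversampling
print(len(y_under), sum(y_under))  # size and positive count after undersampling
```

Either resampled set would then be passed to the classifier in place of the original training data, while the held-out evaluation data keeps its original class balance. Oversampling retains every majority-class row at the cost of a larger training set, whereas undersampling discards most of the majority class.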




Updated: 2020-07-22