Early detection of nasopharyngeal carcinoma through machine‐learning‐driven prediction model in a population‐based healthcare record database
Cancer Medicine (IF 4), Pub Date: 2024-03-28, DOI: 10.1002/cam4.7144
Jeng‐Wen Chen, Shih‐Tsang Lin, Yi‐Chun Lin, Bo‐Sian Wang, Yu‐Ning Chien, Hung‐Yi Chiou

Objective: Early diagnosis and treatment of nasopharyngeal carcinoma (NPC) are vital for a better prognosis. However, because the tumor arises in an anatomically obscure site and its symptoms are insidious, nearly 80% of patients with NPC are diagnosed at a late stage. This study aimed to validate a machine learning (ML) model that uses symptom-related diagnoses and procedures in medical records to predict NPC occurrence and shorten the prediagnostic period.

Materials and Methods: Data from a population-based health insurance database (2001–2008) were analyzed, comparing adults with and without newly diagnosed NPC. Medical records from 90 to 360 days before diagnosis were examined. Five ML algorithms (Light Gradient Boosting Machine [LGB], eXtreme Gradient Boosting [XGB], Multivariate Adaptive Regression Splines [MARS], Random Forest [RF], and Logistic Regression [LG]) were evaluated for optimal early NPC detection. The final model was then tested on real-world data from 1 million randomly selected individuals. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Shapley values identified significant contributing variables.

Results: LGB showed the greatest predictive power, using 14 features at 90 days before diagnosis. On the test dataset, the LGB model achieved an AUROC of 0.83, a specificity of 0.81, and a sensitivity of 0.64. The LGB-driven NPC predictive tool effectively stratified patients into high-risk and low-risk groups (hazard ratio: 5.85; 95% CI: 4.75–7.21). The model's layering (risk-stratification) effect is valid.

Conclusions: ML approaches using electronic medical records accurately predicted NPC occurrence. The risk prediction model serves as a low-cost digital screening tool, offering rapid medical decision support to shorten prediagnostic periods. Timely referral is crucial for high-risk patients identified by the model.
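
As a rough illustration of the evaluation metrics named in the abstract, the following Python sketch computes AUROC, sensitivity, and specificity on a small invented dataset. The labels, scores, 0.5 threshold, and function names are all assumptions made for illustration; they are not the study's data, code, or operating point.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Classify score >= threshold as high risk, then return
    (sensitivity, specificity). Assumes both classes are present."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data invented for this example (1 = NPC case, 0 = control).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]

area = auroc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(f"AUROC={area:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

AUROC does not depend on a threshold, whereas sensitivity and specificity describe one chosen cut-off, so a reported pair such as 0.64 and 0.81 corresponds to a single point on the ROC curve.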

Updated: 2024-03-28