Machine Learning Approaches for Fracture Risk Assessment: A Comparative Analysis of Genomic and Phenotypic Data in 5130 Older Men.
Calcified Tissue International (IF 4.2), Pub Date: 2020-07-29, DOI: 10.1007/s00223-020-00734-y
Qing Wu, Fatma Nasoz, Jongyun Jung, Bibek Bhattarai, Mira V Han

This study aimed to develop fracture prediction models using machine learning approaches and genomic data, and to identify the best modeling approach for fracture prediction. Genomic data from the Osteoporotic Fractures in Men cohort study (n = 5130) were analyzed. After comprehensive genotype imputation, a genetic risk score (GRS) was calculated for each participant from 1103 associated single nucleotide polymorphisms. Data were normalized and split into a training set (80%) and a validation set (20%) for analysis. Random forest, gradient boosting, neural network, and logistic regression models were each developed to predict major osteoporotic fractures, with GRS, bone density, and other risk factors as predictors. In model training, the synthetic minority oversampling technique was used to account for the low fracture rate, and tenfold cross-validation was employed for hyperparameter optimization. In testing, the area under the curve (AUC) and accuracy were used to assess model performance. The McNemar test was employed to examine the difference in accuracy between models. Gradient boosting showed the best prediction performance, with an AUC of 0.71 and an accuracy of 0.88, and the GRS ranked as the 7th most important variable in that model. Random forest and neural network also performed significantly better than logistic regression. This study suggests that fracture prediction in older men can be improved by incorporating genetic profiling and by using the gradient boosting approach. This result should not be extrapolated to women or young individuals.
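The evaluation step described in the abstract (AUC, accuracy, and a McNemar test comparing two models on the same held-out participants) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names and the toy data are hypothetical, and the exact binomial form of the McNemar test is an assumption, since the abstract does not say which variant was used.

```python
from math import comb


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact (binomial) McNemar test on the discordant pairs.

    b = cases model A gets right and model B gets wrong
    c = cases model A gets wrong and model B gets right
    Returns (b, c, two-sided p-value).
    Assumption: exact variant; the study may have used the chi-square form.
    """
    b = sum(a == t and m != t for t, a, m in zip(y_true, pred_a, pred_b))
    c = sum(a != t and m == t for t, a, m in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Two-sided tail probability under Binomial(n, 0.5), capped at 1.
    p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return b, c, p_value


if __name__ == "__main__":
    # Toy labels and predictions, made up purely for illustration.
    y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
    scores_gb = [0.8, 0.2, 0.1, 0.6, 0.3, 0.2, 0.4, 0.7, 0.1, 0.3]
    pred_gb = [int(s >= 0.5) for s in scores_gb]  # hypothetical 0.5 cutoff
    pred_lr = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
    print("accuracy:", accuracy(y_true, pred_gb))
    print("AUC:", auc(y_true, scores_gb))
    print("McNemar (b, c, p):", mcnemar_exact(y_true, pred_gb, pred_lr))
```

The McNemar test uses only the cases where the two models disagree about correctness, which is why it suits paired comparisons on a single test set. With a low fracture rate, accuracy alone can look high even for a model that rarely predicts a fracture, so reading it alongside AUC is the safer habit.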




Updated: 2020-07-29