Machine learning for effectively avoiding overfitting is a crucial strategy for the genetic prediction of polygenic psychiatric phenotypes.
Translational Psychiatry ( IF 5.8 ) Pub Date : 2020-08-17 , DOI: 10.1038/s41398-020-00957-5
Yuta Takahashi 1, 2, 3 , Masao Ueki 2, 4 , Gen Tamiya 2, 4 , Soichi Ogishima 2 , Kengo Kinoshita 2, 5 , Atsushi Hozawa 2 , Naoko Minegishi 2 , Fuji Nagami 2 , Kentaro Fukumoto 6, 7 , Kotaro Otsuka 6, 7 , Kozo Tanno 6 , Kiyomi Sakata 6 , Atsushi Shimizu 6 , Makoto Sasaki 6 , Kenji Sobue 6 , Shigeo Kure 2 , Masayuki Yamamoto 2, 8 , Hiroaki Tomita 1, 2, 3

The accuracy of previous genetic studies in predicting polygenic psychiatric phenotypes has been limited, mainly because of low power to distinguish truly susceptible variants from null variants and the overfitting that results. A novel prediction algorithm, Smooth-Threshold Multivariate Genetic Prediction (STMGP), was applied to improve the genome-based prediction of psychiatric phenotypes; it reduces overfitting by selecting variants and building a penalized regression model. Prediction models were trained on a cohort of 3685 subjects in Miyagi prefecture and validated on an independently recruited cohort of 3048 subjects in Iwate prefecture, Japan. Genotyping was performed using HumanOmniExpressExome BeadChip Arrays. The target phenotypes were depressive symptoms and simulated phenotypes with varying complexity and various effect-size distributions of risk alleles. The prediction accuracy and the degree of overfitting of STMGP were compared with those of state-of-the-art models (polygenic risk scores, genomic best linear-unbiased prediction, summary-data-based best linear-unbiased prediction, BayesR, and ridge regression). In the prediction of depressive symptoms, STMGP showed the highest prediction accuracy and the lowest degree of overfitting among the models compared, although the differences in prediction accuracy were not statistically significant. Simulation studies suggested that STMGP has better prediction accuracy for moderately polygenic phenotypes. Our investigations suggest the potential usefulness of STMGP for predicting polygenic psychiatric conditions while avoiding overfitting.
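The idea of selecting variants, fitting a penalized regression, and then measuring overfitting as the gap between training and validation accuracy can be illustrated with a small sketch. This is not STMGP: it assumes a much simpler stand-in (hard selection of the top variants by marginal correlation, followed by closed-form ridge regression), it runs on simulated genotypes rather than the cohort data, and every function name, parameter, and sample size below is hypothetical.

```python
import numpy as np


def make_data(n_train, n_valid, m, n_causal, seed=0):
    """Simulate genotype dosages (0/1/2) and a phenotype with roughly 50% heritability.

    All settings are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.05, 0.5, size=m)               # minor allele frequencies
    X = rng.binomial(2, maf, size=(n_train + n_valid, m)).astype(float)
    beta = np.zeros(m)
    causal = rng.choice(m, size=n_causal, replace=False)
    beta[causal] = rng.normal(size=n_causal)           # effects of causal variants
    g = X @ beta                                       # genetic component
    y = g + rng.normal(scale=g.std(), size=g.shape)    # noise variance equals genetic variance
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


def marginal_screen(X, y, keep):
    """Return indices of the `keep` variants with the largest |marginal correlation|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:keep]


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns (coefficients, intercept)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y.mean()))
    return beta, y.mean() - mu @ beta


def r2(y, pred):
    """Squared Pearson correlation between observed and predicted phenotype."""
    return np.corrcoef(y, pred)[0, 1] ** 2


def evaluate(Xtr, ytr, Xva, yva, keep, lam):
    """Select variants and fit on the training set only, then score both sets.

    Returns (training R^2, validation R^2, overfitting gap = training - validation).
    """
    idx = marginal_screen(Xtr, ytr, keep)
    beta, b0 = ridge_fit(Xtr[:, idx], ytr, lam)
    r2_train = r2(ytr, Xtr[:, idx] @ beta + b0)
    r2_valid = r2(yva, Xva[:, idx] @ beta + b0)
    return r2_train, r2_valid, r2_train - r2_valid


if __name__ == "__main__":
    Xtr, ytr, Xva, yva = make_data(n_train=600, n_valid=500, m=2000, n_causal=50)
    for keep, lam in [(500, 1e-3), (500, 1e3), (100, 1e3)]:
        tr, va, gap = evaluate(Xtr, ytr, Xva, yva, keep, lam)
        print(f"keep={keep:4d} lam={lam:8.3g}  train R2={tr:.3f}  valid R2={va:.3f}  gap={gap:.3f}")
```

Running the script prints training and validation R² for a few combinations of selection size and penalty strength. Because both the variant selection and the model fit use only the training set, the gap between the two columns serves as a rough analogue of the "degree of overfitting" compared across models in the abstract.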



Updated: 2020-08-22