Genetic effect estimates in case‐control studies when a continuous variable is omitted from the model
Genetic Epidemiology (IF 2.1), Pub Date: 2020-01-20, DOI: 10.1002/gepi.22278
Ying Sheng 1, Chiung‐Yu Huang 1, Siarhei Lobach 2, Lydia Zablotska 1, Iryna Lobach 1

Large‐scale genome‐wide analysis scans of massive numbers of cases and controls are archived in publicly available genetic databases, for example, the Database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/). These databases offer an unprecedented opportunity to study genetic effects. Yet the set of nongenetic variables in these databases is often limited. From the statistical literature, we know that omitting a continuous variable from a logistic regression model can result in biased estimates of odds ratios (OR), even when the omitted and the included variables are independent. We are interested in assessing what information is needed to recover the bias in the OR estimate of genotype due to omitting a continuous variable in settings where the actual values of the omitted variable are not available. We derive two estimating procedures that can recover the degree of bias based on either a conditional density of the omitted variable given the disease status and the genotype, or the known distribution of the omitted variable and the frequency of the disease in the population. Importantly, our derivations show that omitting a continuous variable can result in either under‐ or over‐estimation of the genetic effects. We performed extensive simulation studies to examine bias, variability, false‐positive rate, and power in the model that omits a continuous variable. We show the application to two genome‐wide studies of Alzheimer's disease.
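
The claim that an independent omitted variable still biases the OR can be checked numerically. The following is a minimal sketch, not the authors' estimating procedures: it assumes a binary genotype G, an omitted variable X distributed as standard normal and independent of G, and a logistic disease model logit P(D=1 | G, X) = beta0 + beta_g*G + beta_x*X. The function name `marginal_log_or` and all parameter values are illustrative assumptions. It integrates X out with Gauss-Hermite quadrature and compares the resulting marginal log OR for G with the conditional one, beta_g.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def marginal_log_or(beta0, beta_g, beta_x, n_nodes=80):
    """Log OR for a binary genotype after integrating out X ~ N(0, 1).

    Assumed (hypothetical) model:
        logit P(D=1 | G, X) = beta0 + beta_g * G + beta_x * X,
    with X independent of G.
    """
    # Nodes and weights for the weight function exp(-x^2 / 2).
    nodes, weights = hermegauss(n_nodes)
    # Rescale the weights so they sum to 1, i.e. act as N(0, 1) probabilities.
    weights = weights / weights.sum()

    # Disease risk by genotype, averaged over the omitted variable.
    p0 = np.sum(weights * expit(beta0 + beta_x * nodes))
    p1 = np.sum(weights * expit(beta0 + beta_g + beta_x * nodes))

    def logit(p):
        return np.log(p / (1.0 - p))

    return logit(p1) - logit(p0)


if __name__ == "__main__":
    beta0, beta_g = -2.0, 0.5  # illustrative values only
    for beta_x in (0.0, 0.5, 1.5, 3.0):
        m = marginal_log_or(beta0, beta_g, beta_x)
        print(f"beta_x={beta_x:.1f}  conditional log OR={beta_g:.3f}  "
              f"marginal log OR={m:.3f}")
```

With beta_x = 0 the two quantities coincide; as beta_x grows, the marginal log OR in this particular setup moves toward zero. This sketch covers only that attenuation case. The abstract reports that over‐estimation is also possible, which this simplified setup does not reproduce.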

Updated: 2020-01-20