Statistical method for modeling sequencing data from different technologies in longitudinal studies with application to Huntington disease
Biometrical Journal ( IF 1.7 ) Pub Date : 2020-12-22 , DOI: 10.1002/bimj.201900235
Angga M Fuady 1, 2 , Willeke M C van Roon-Mom 3 , Szymon M Kiełbasa 1 , Hae-Won Uh 2 , Jeanine J Houwing-Duistermaat 2, 4

Advancement of gene expression measurements in longitudinal studies enables the identification of genes associated with disease severity over time. However, problems arise when the technology used to measure gene expression differs between time points. Observed differences between the results obtained at different time points can be caused by technical differences. Modeling the two measurements jointly over time might provide insight into the causes of these different results. Our work is motivated by a study of gene expression data of blood samples from Huntington disease patients, which were obtained using two different sequencing technologies. At time point 1, DeepSAGE technology was used to measure the gene expression, with a subsample also measured using RNA-Seq technology. At time point 2, all samples were measured using RNA-Seq technology. Significant associations between gene expression measured by DeepSAGE and disease severity using data from the first time point could not be replicated by the RNA-Seq data from the second time point. We modeled the relationship between the two sequencing technologies using the data from the overlapping samples. We used linear mixed models with either DeepSAGE or RNA-Seq measurements as the dependent variable and disease severity as the independent variable. In conclusion, (1) for one out of 14 genes, the initial significant result could be replicated with both technologies using data from both time points; (2) statistical efficiency is lost due to disagreement between the two technologies, measurement error when predicting gene expressions, and the need to include additional parameters to account for possible differences.
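As a rough illustration of the kind of model described above, a linear mixed model for one gene could take the following form. This is a generic random-intercept sketch, not necessarily the authors' exact specification. The symbols $y_{ij}$, $s_{ij}$, $T_{ij}$, $b_i$ and the interaction term are assumptions introduced here for illustration only.

```latex
% Illustrative sketch only; the notation is assumed, not taken from the paper.
% y_{ij}: expression of one gene for patient i at time point j
% s_{ij}: disease severity for patient i at time point j
% T_{ij}: technology indicator (0 = DeepSAGE, 1 = RNA-Seq)
% b_i:    patient-specific random intercept
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 s_{ij} + \beta_2 T_{ij} + \beta_3\, s_{ij} T_{ij} + b_i + \varepsilon_{ij}, \\
b_i &\sim \mathcal{N}(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In a form like this, $\beta_1$ would capture the association between expression and disease severity, while $\beta_2$ and $\beta_3$ play the role of the additional parameters that allow for differences between the two technologies. Estimating such extra parameters is one way the loss of statistical efficiency mentioned in the abstract can arise.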

Updated: 2020-12-22