


Statistical Inference for Streamed Longitudinal Data
arXiv - STAT - Methodology. Pub Date: 2022-08-04, DOI: arxiv-2208.02890
Lan Luo, Jingshen Wang, Emily C. Hector

Modern longitudinal data, for example from wearable devices, measure biological signals on a fixed set of participants at a diverging number of time points. Traditional statistical methods are not equipped to handle the computational burden of repeatedly analyzing the cumulatively growing dataset each time new data are collected. We propose a new estimation and inference framework for dynamic updating of point estimates and their standard errors across serially collected dependent datasets. The key technique is a decomposition of the extended score function of the quadratic inference function, constructed over the cumulative longitudinal data, into a sum of summary statistics over data batches. We show how this sum can be recursively updated without the need to access the whole dataset, resulting in a computationally efficient streaming procedure with minimal loss of statistical efficiency. We prove consistency and asymptotic normality of our streaming estimator as the number of data batches diverges, even as the number of independent participants remains fixed. Simulations highlight the advantages of our approach over traditional statistical methods that assume independence between data batches. Finally, we investigate the relationship between physical activity and several diseases through the analysis of accelerometry data from the National Health and Nutrition Examination Survey.
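The recursive-updating idea in the abstract can be illustrated with a much simpler analogue. The sketch below is not the authors' quadratic inference function procedure and it ignores within-participant correlation entirely. It assumes an ordinary least-squares fit of a line y = a + b*x, whose estimating equation also decomposes into a sum of per-batch summary statistics. The class name `StreamingLine` and its methods are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class StreamingLine:
    """Running summary statistics for a least-squares fit of y = a + b*x.

    Illustrative assumption: plain least squares, no within-subject
    correlation. This is NOT the paper's QIF-based estimator.
    """
    n: int = 0        # number of observations seen so far
    sx: float = 0.0   # running sum of x
    sy: float = 0.0   # running sum of y
    sxx: float = 0.0  # running sum of x*x
    sxy: float = 0.0  # running sum of x*y

    def update(self, xs, ys):
        """Fold one batch into the running sums; the batch can then be discarded."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def estimate(self):
        """Solve the 2x2 normal equations using only the accumulated sums."""
        det = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / det
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


# Usage: two batches arrive one after the other; only the five sums are stored.
model = StreamingLine()
model.update([0.0, 1.0], [1.0, 3.0])
model.update([2.0, 3.0, 4.0], [5.0, 7.0, 9.0])
intercept, slope = model.estimate()
```

Because the five sums are additive across batches, the estimate after the last batch matches the one obtained by refitting on all data at once, while memory use stays constant. The framework described in the abstract targets the harder case in which batches are dependent, and it also updates standard errors, neither of which this sketch attempts.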


Updated: 2022-08-08
