Sample size issues in time series regressions of counts on environmental exposures.
BMC Medical Research Methodology (IF 4), Pub Date: 2020-01-28, DOI: 10.1186/s12874-019-0894-6
Ben G Armstrong, Antonio Gasparrini, Aurelio Tobias, Francesco Sera

BACKGROUND Regression analyses of time series of disease counts on environmental determinants are a prominent component of environmental epidemiology. For planning such studies, it can be useful to predict the precision of estimated coefficients and the power to detect associations of a given magnitude. Existing generic approaches for this have been found somewhat complex to apply and do not easily extend to multiple series studies analysed in two stages. We have sought a simpler approximate approach which can easily extend to multiple series and give insight into factors determining precision.

METHODS We derive approximate expressions for precision, and hence power, in single and multiple time series studies of counts from basic statistical theory, compare the precision predicted by these with that estimated by analysis of real data from 51 cities of varying size, and illustrate the use of these estimators in a realistic planning scenario.

RESULTS In single series studies with Poisson outcome distribution, precision and power depend only on the usable variation of exposure (i.e. that conditional on covariates) and the total number of disease events, regardless of how many days those are spread over. In multiple time series (e.g. multi-city) studies focusing on the meta-analytic mean coefficient, the usable exposure variation and the total number of events (in all series) are again the sole determinants if there is no between-series heterogeneity or within-series overdispersion. With heterogeneity, its extent and the number of series become important. For all but the crudest approximation, the estimates of standard errors were on average within ±20% of those estimated in full analysis of actual data.

CONCLUSIONS Predicting the precision of coefficients from a planned time series study is possible simply and with limited information. The total number of disease events and usable exposure variation are the dominant factors when overdispersion and between-series heterogeneity are low.
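
The dependence described in the results can be illustrated with a minimal Python sketch. It is not the authors' code, and the abstract does not give their exact expressions. It assumes the standard large-sample result for a log-linear Poisson model: the variance of the coefficient is approximately the overdispersion factor divided by (total events × conditional exposure variance). For several series it assumes a generic inverse-variance (random-effects) pooling with between-series variance tau2. All function names, parameter names, and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def se_single(total_events, exposure_var, overdispersion=1.0):
    """Approximate SE of the log rate ratio per unit exposure in one series.

    Assumption: Var(beta) ~ overdispersion / (total_events * exposure_var),
    where exposure_var is the exposure variance conditional on covariates.
    The number of days does not appear: only the total event count matters.
    """
    return math.sqrt(overdispersion / (total_events * exposure_var))


def se_pooled(events_per_series, exposure_var, tau2=0.0, overdispersion=1.0):
    """Approximate SE of the meta-analytic mean coefficient.

    Assumption: inverse-variance weights 1 / (SE_i^2 + tau2), with the same
    conditional exposure variance in every series.
    """
    weights = [
        1.0 / (se_single(d, exposure_var, overdispersion) ** 2 + tau2)
        for d in events_per_series
    ]
    return 1.0 / math.sqrt(sum(weights))


def power(beta, se, alpha=0.05):
    """Power of a two-sided Wald z-test to detect a true coefficient beta."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    z = abs(beta) / se
    return norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)


# Hypothetical planning scenario: 10,000 events in total, conditional
# exposure SD of 5 units, true log rate ratio of 0.005 per unit.
se_one_city = se_single(10_000, 5.0 ** 2)
se_two_cities = se_pooled([5_000, 5_000], 5.0 ** 2)
se_heterogeneous = se_pooled([5_000, 5_000], 5.0 ** 2, tau2=1e-6)
planned_power = power(0.005, se_one_city)
```

Under these assumptions, splitting the same 10,000 events across two homogeneous series leaves the pooled standard error unchanged, while any tau2 above zero increases it. This matches the pattern reported in the results.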

Updated: 2020-01-30