COVID-19 and the epistemology of epidemiological models at the dawn of AI
Annals of Human Biology (IF 1.7), Pub Date: 2020-11-23, DOI: 10.1080/03014460.2020.1839132
George T. H. Ellison

Summary

The models used to estimate disease transmission, susceptibility and severity determine what epidemiology can (and cannot) tell us about COVID-19. These include: ‘model organisms’ chosen for their phylogenetic/aetiological similarities; multivariable statistical models to estimate the strength/direction of (potentially causal) relationships between variables (through ‘causal inference’), and the (past/future) value of unmeasured variables (through ‘classification/prediction’); and a range of modelling techniques to predict beyond the available data (through ‘extrapolation’), compare different hypothetical scenarios (through ‘simulation’), and estimate key features of dynamic processes (through ‘projection’). Each of these models: addresses different questions using different techniques; involves assumptions that require careful assessment; and is vulnerable to generic and specific biases that can undermine the validity and interpretation of its findings. It is therefore necessary that the models used: can actually address the questions posed; and have been competently applied. In this regard, it is important to stress that extrapolation, simulation and projection cannot offer accurate predictions of future events when the underlying mechanisms (and the contexts involved) are poorly understood and subject to change. Given the importance of understanding such mechanisms/contexts, and the limited opportunity for experimentation during outbreaks of novel diseases, the use of multivariable statistical models to estimate the strength/direction of potentially causal relationships between two variables (and the biases incurred through their misapplication/misinterpretation) warrants particular attention.
Such models must be carefully designed to address: ‘selection-collider bias’, ‘unadjusted confounding bias’ and ‘inferential mediator adjustment bias’ – all of which can introduce effects capable of enhancing, masking or reversing the estimated (true) causal relationship between the two variables examined.1 Selection-collider bias occurs when these two variables independently cause a third (the ‘collider’), and when this collider determines/reflects the basis for selection in the analysis. It is likely to affect all incompletely representative samples, although its effects will be most pronounced wherever selection is constrained (e.g. analyses focusing on infected/hospitalised individuals). Unadjusted confounding bias disrupts the estimated (true) causal relationship between two variables when: these share one (or more) common cause(s); and when the effects of these causes have not been adjusted for in the analyses (e.g. whenever confounders are unknown/unmeasured). Inferentially similar biases can occur when: one (or more) variable(s) (or ‘mediators’) fall on the causal path between the two variables examined (i.e. when such mediators are caused by one of the variables and are causes of the other); and when these mediators are adjusted for in the analysis. Such adjustment is commonplace when: mediators are mistaken for confounders; prediction models are mistakenly repurposed for causal inference; or mediator adjustment is used to estimate direct and indirect causal relationships (in a mistaken attempt at ‘mediation analysis’). These three biases are central to ongoing and unresolved epistemological tensions within epidemiology. All have substantive implications for our understanding of COVID-19, and the future application of artificial intelligence to ‘data-driven’ modelling of similar phenomena. 
Nonetheless, competently applied and carefully interpreted, multivariable statistical models may yet provide sufficient insight into mechanisms and contexts to permit more accurate projections of future disease outbreaks.
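
The three biases described above can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes simple linear relationships with Gaussian noise, and every coefficient, threshold and variable name (`slope`, `adjusted_slope`, `gauss_list` and so on) is a hypothetical choice made only so that the true causal effect is known in advance. It uses only the Python standard library and compares the estimate each analysis produces with that known effect.

```python
import random

def slope(y, x):
    """Ordinary least-squares slope of y on a single variable x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """Residuals of y after regressing it on z (with an intercept)."""
    b = slope(y, z)
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]

def adjusted_slope(y, x, z):
    """Slope of y on x adjusted for z, via the Frisch-Waugh-Lovell theorem."""
    return slope(residuals(y, z), residuals(x, z))

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50_000               # assumed sample size

def gauss_list():
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# 1. Selection-collider bias: x and y are independent (true effect 0),
#    but both raise the chance of being selected into the analysis.
x, y, noise = gauss_list(), gauss_list(), gauss_list()
keep = [i for i in range(n) if x[i] + y[i] + noise[i] > 1.0]
collider_full = slope(y, x)
collider_selected = slope([y[i] for i in keep], [x[i] for i in keep])

# 2. Unadjusted confounding: c causes both x and y; true effect of x is 0.5.
c, ex, ey = gauss_list(), gauss_list(), gauss_list()
x2 = [ci + e for ci, e in zip(c, ex)]
y2 = [0.5 * xi + ci + e for xi, ci, e in zip(x2, c, ey)]
confounded = slope(y2, x2)                 # large-sample value: 1.0
deconfounded = adjusted_slope(y2, x2, c)   # large-sample value: 0.5

# 3. Mediator adjustment: x -> m -> y plus a direct path.
#    Total effect of x on y is 0.2 + 0.6 * 0.5 = 0.5.
x3, em, ey3 = gauss_list(), gauss_list(), gauss_list()
m = [0.6 * xi + e for xi, e in zip(x3, em)]
y3 = [0.2 * xi + 0.5 * mi + e for xi, mi, e in zip(x3, m, ey3)]
total = slope(y3, x3)                      # large-sample value: 0.5
over_adjusted = adjusted_slope(y3, x3, m)  # large-sample value: 0.2

print("collider:   full sample", round(collider_full, 2),
      "| selected sample", round(collider_selected, 2))
print("confounder: unadjusted", round(confounded, 2),
      "| adjusted", round(deconfounded, 2))
print("mediator:   unadjusted", round(total, 2),
      "| adjusted", round(over_adjusted, 2))
```

In this toy setting, restricting the sample on the collider induces a negative association where none exists, omitting the confounder inflates the estimate, and adjusting for the mediator removes the indirect part of the total effect. Real analyses are rarely this clean, since the causal structure itself is usually uncertain.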




Updated: 2020-11-25