Measuring rater bias in diagnostic tests with ordinal ratings
Statistics in Medicine (IF 2), Pub Date: 2021-05-09, DOI: 10.1002/sim.9011
Chanmin Kim, Xiaoyan Lin, Kerrie P Nelson

Diagnostic tests frequently rely on the interpretation of images by skilled raters. In many clinical settings, however, the variability observed between experts' ratings reduces confidence in these interpretations, leading to uncertainty in the diagnostic process. For example, in breast cancer testing, radiologists interpret mammographic images, while pathologists examine breast biopsy results. Each of these procedures involves elements of subjectivity. We propose here a flexible two-stage Bayesian latent variable model to investigate how the skills of individual raters impact the diagnostic accuracy of image-related testing in large-scale medical testing studies. A strength of the proposed model is that it applies whether or not the true disease status of a patient is known within a reasonable time frame. In these studies, many raters each contribute classifications on a large sample of patients using a defined ordinal grading scale, leading to a complex correlation structure between ratings. In contrast to currently available methods, our modeling approach considers the different sources of variability contributed by experts and patients while accounting for correlations present between ratings and patients. We propose a novel measure of a rater's ability (magnifier) that, in contrast to conventional measures of sensitivity and specificity, is robust to the underlying prevalence of disease in the population, providing an alternative measure of diagnostic accuracy across patient populations. Extensive simulation studies demonstrate lower bias in the estimation of parameters and measures of accuracy, and show that the proposed model outperforms existing models. Receiver operating characteristic curves are derived to assess the diagnostic accuracy of individual experts and their overall performance. Our proposed modeling approach is applied to a large breast imaging study in which disease status is known and to a uterine cancer dataset in which disease status is unknown.
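
As a rough illustration of how ordinal ratings relate to a receiver operating characteristic curve when disease status is known, the sketch below computes an empirical ROC for a single rater by treating each category of the scale as a cutoff. This is the standard empirical construction, not the model-based curves derived from the authors' Bayesian latent variable model, and the function names, the 5-category scale, and the ratings are all hypothetical.

```python
from typing import List, Tuple


def empirical_roc(
    ratings: List[int], disease: List[int], n_categories: int
) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    A rating >= cutoff counts as a positive call. Cutoffs run from
    n_categories + 1 (no positive calls) down to 1 (all calls positive),
    so the points trace the curve from (0, 0) to (1, 1).
    """
    n_pos = sum(disease)
    n_neg = len(disease) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one diseased and one non-diseased patient")

    points = []
    for cutoff in range(n_categories + 1, 0, -1):
        tp = sum(1 for r, d in zip(ratings, disease) if d == 1 and r >= cutoff)
        fp = sum(1 for r, d in zip(ratings, disease) if d == 0 and r >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical data: one rater, ten patients, ordinal scale 1..5,
# disease status coded 1 = diseased, 0 = non-diseased.
example_ratings = [1, 2, 2, 3, 4, 5, 3, 5, 1, 4]
example_disease = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

if __name__ == "__main__":
    roc_points = empirical_roc(example_ratings, example_disease, n_categories=5)
    for fpr, tpr in roc_points:
        print(f"FPR={fpr:.3f}  TPR={tpr:.3f}")
    print(f"AUC={auc_trapezoid(roc_points):.3f}")
```

Each point on this curve is a (1 - specificity, sensitivity) pair at one cutoff of the ordinal scale. The sketch requires a known disease status for every patient, so it does not cover the unknown-status setting the abstract describes.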

Updated: 2021-07-12