Comparison of different rating scales for the use in Delphi studies: different scales lead to different consensus and show different test-retest reliability.,BMC Medical Research Methodology


BMC Medical Research Methodology (IF 4), Pub Date: 2020-02-10, DOI: 10.1186/s12874-020-0912-8
Toni Lange ₁,₂, Christian Kopkow ₁,₃, Jörg Lützner ₂, Klaus-Peter Günther ₂, Sascha Gravius ₄, Hanns-Peter Scharf ₄, Johannes Stöve ₅, Richard Wagner ₆, Jochen Schmitt ₁


BACKGROUND Consensus-orientated Delphi studies are increasingly used in various areas of medical research, with a variety of rating scales and criteria for reaching consensus. We explored how three different rating scales and different consensus criteria influence whether consensus is reached, and assessed the test-retest reliability of these scales, within a study aimed at identifying global treatment goals for total knee arthroplasty (TKA).

METHODS We conducted a two-stage study consisting of two surveys and consecutively included patients scheduled for TKA from five German hospitals. Patients were asked to rate 19 potential treatment goals on three rating scales (three-point, five-point, nine-point). Surveys were conducted within a 2-week period prior to TKA; the order of questions (scales and treatment goals) was randomized.

RESULTS Eighty patients (mean age 68 ± 10 years; 70% female) completed both surveys. The three rating scales led to different consensus despite moderate to high correlation between them (r = 0.65 to 0.74). Final consensus was highly influenced by the choice of rating scale: 14 (three-point), 6 (five-point), and 15 (nine-point) of the 19 treatment goals reached the pre-defined 75% consensus threshold. The number of goals reaching consensus also varied widely between rating scales for other consensus thresholds. Overall, concordance differed between the three-point (percent agreement [p] = 88.5%, weighted kappa [k] = 0.63), five-point (p = 75.3%, k = 0.47) and nine-point scale (p = 67.8%, k = 0.78).

CONCLUSION This study provides evidence that, within one population, consensus depends on the rating scale and the consensus threshold. The test-retest reliability of the three rating scales investigated differs substantially between individual treatment goals. This variation in reliability can become a potential source of bias in consensus studies. In our setting, aimed at capturing patients' treatment goals for TKA, the three-point scale proves to be the most reasonable choice, as its translation into the clinical context is the most straightforward among the scales. Researchers conducting Delphi studies should be aware that final consensus is substantially influenced by the choice of rating scale and consensus criteria.
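The quantities reported above (share of ratings meeting a consensus threshold, percent agreement, and weighted kappa between two survey rounds) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. It assumes that consensus for a goal means the share of ratings at or above a chosen cutoff category reaches the threshold, and that kappa uses linear weights; the abstract specifies neither. The function names and the sample ratings are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def reaches_consensus(ratings: Sequence[int], cutoff: int,
                      threshold: float = 0.75) -> bool:
    """True if the share of ratings >= cutoff meets the threshold.

    Assumption: "consensus" is the share of ratings in the top
    categories; the abstract does not state the exact criterion.
    """
    share = sum(r >= cutoff for r in ratings) / len(ratings)
    return share >= threshold


def percent_agreement(first: Sequence[int], second: Sequence[int]) -> float:
    """Fraction of respondents giving the identical rating in both rounds."""
    return sum(a == b for a, b in zip(first, second)) / len(first)


def weighted_kappa(first: Sequence[int], second: Sequence[int],
                   n_categories: int) -> float:
    """Cohen's kappa with linear weights; categories coded 1..n_categories.

    Assumption: linear weights; the abstract does not say which
    weighting scheme was used.
    """
    n = len(first)
    span = n_categories - 1
    # Observed mean disagreement, scaled to [0, 1].
    observed = sum(abs(a - b) for a, b in zip(first, second)) / (n * span)
    # Expected disagreement if both rounds were independent,
    # computed from the marginal category counts.
    m1, m2 = Counter(first), Counter(second)
    expected = sum(
        m1[i] * m2[j] * abs(i - j)
        for i in range(1, n_categories + 1)
        for j in range(1, n_categories + 1)
    ) / (n * n * span)
    if expected == 0:
        return math.nan  # undefined when all ratings fall in one category
    return 1.0 - observed / expected


# Hypothetical ratings of one treatment goal on a three-point scale,
# given by the same eight respondents in two survey rounds.
round_1 = [3, 3, 2, 3, 1, 3, 3, 2]
round_2 = [3, 2, 2, 3, 1, 3, 3, 3]

print(reaches_consensus(round_1, cutoff=3))
print(percent_agreement(round_1, round_2))
print(weighted_kappa(round_1, round_2, n_categories=3))
```

On longer scales the top category is split more finely, so the same respondents can clear a fixed threshold on one scale and miss it on another, depending on where the cutoff is placed. This is one way the choice of scale can affect the consensus outcome.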


Updated: 2020-02-10
