A Comparison of Reliability Coefficients for Ordinal Rating Scales
Journal of Classification (IF 2), Pub Date: 2021-04-22, DOI: 10.1007/s00357-021-09386-5
Alexandra de Raadt, Matthijs J. Warrens, Roel J. Bosker, Henk A. L. Kiers

Kappa coefficients are commonly used to quantify reliability on a categorical scale, whereas correlation coefficients are commonly used to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen’s kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are the intraclass correlation ICC(3,1), Pearson’s correlation, Spearman’s rho, and Kendall’s tau-b. The primary goal is to provide a thorough understanding of these coefficients so that the applied researcher can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of coefficient matters. Using analytic methods as well as simulated and empirical data, we studied to what extent different coefficients lead to the same conclusions about inter-rater reliability, and to what extent the coefficients measure agreement in a similar way. The analytic results show that differences between quadratically weighted kappa and the Pearson and intraclass correlations increase as agreement becomes larger. Differences between these three coefficients are generally small if differences between rater means and variances are small. Furthermore, the simulated and empirical data show that differences between all reliability coefficients tend to increase as agreement between the raters increases. For the data in this study, the four correlation coefficients led to the same conclusion about inter-rater reliability in virtually all cases. In addition, quadratically weighted kappa led to a similar conclusion as any of the correlation coefficients in a great number of cases. Hence, for the data in this study, it does not really matter which of these five coefficients is used. Moreover, the four correlation coefficients and quadratically weighted kappa tend to measure agreement in a similar way: their values are very highly correlated for the data in this study.
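
To make the comparison in the abstract concrete, here is a minimal sketch in plain Python that computes five of the seven coefficients (Cohen's kappa, linearly and quadratically weighted kappa, Pearson's correlation, and ICC(3,1)) for two raters. It is an illustration only, not the authors' code: the function names (weighted_kappa, pearson, icc_3_1) and the ratings are made up for this example, the weighted kappas are written in the usual disagreement-weight form, and ICC(3,1) is taken to be the two-way, consistency, single-rating intraclass correlation computed from ANOVA mean squares.

```python
from math import sqrt


def weighted_kappa(r1, r2, categories, power):
    """Weighted kappa for two raters (hypothetical helper, not from the paper).

    power=0 gives Cohen's (unweighted) kappa, power=1 linear weights,
    power=2 quadratic weights. Undefined (division by zero) if both
    raters use one and the same single category.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # Joint proportions of the two raters' category assignments.
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        p[index[a]][index[b]] += 1.0 / n
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if power == 0:
            return 0.0 if i == j else 1.0
        return (abs(i - j) / (k - 1)) ** power

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * p[i][j] for i, j in cells)
    expected = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    return 1.0 - observed / expected


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_3_1(x, y):
    """ICC(3,1) for two raters from two-way ANOVA mean squares."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    subject_means = [(a + b) / k for a, b in zip(x, y)]
    rater_means = [sum(x) / n, sum(y) / n]
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_subjects - ss_raters
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


# Invented ratings on a 4-point ordinal scale (not data from the study).
categories = [1, 2, 3, 4]
rater1 = [1, 2, 2, 3, 3, 3, 4, 4, 1, 2]
rater2 = [1, 2, 3, 3, 3, 4, 4, 3, 2, 2]

kappa_c = weighted_kappa(rater1, rater2, categories, 0)
kappa_l = weighted_kappa(rater1, rater2, categories, 1)
kappa_q = weighted_kappa(rater1, rater2, categories, 2)
r = pearson(rater1, rater2)
icc = icc_3_1(rater1, rater2)

for name, value in [("Cohen's kappa", kappa_c), ("linear kappa", kappa_l),
                    ("quadratic kappa", kappa_q), ("ICC(3,1)", icc),
                    ("Pearson r", r)]:
    print(f"{name:16s} {value:.3f}")
```

For two raters, ICC(3,1) reduces to twice the covariance divided by the sum of the two rater variances, and quadratically weighted kappa has the same form with the squared difference between the rater means added to the denominator. That is one way to see why the three coefficients stay close when rater means and variances are similar. Spearman's rho and Kendall's tau-b are left out of the sketch to keep it short.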

Updated: 2021-04-22