Comparison of correlation measures for nominal data
Communications in Statistics - Simulation and Computation (IF 0.9), Pub Date: 2021-01-15, DOI: 10.1080/03610918.2020.1869984
Tanweer Ul Islam, Mahvish Rizwan

Abstract

In the social sciences, a plethora of studies use nominal data to establish relationships between variables. This, in turn, requires the correct choice of correlation technique, which depends on the underlying assumptions and on the power of the test of significance. The objective of this research is to identify the best measure of association for nominal data in terms of size, power, and bias in estimation. Monte Carlo simulations reveal that the Phi and Pearson correlation statistics perform equally well in terms of size, power, and bias for naturally dichotomous variables. When both variables are artificially dichotomized, the Tetrachoric statistic has an edge over the Pearson correlation statistic in terms of bias. If one variable is continuous and the other is artificially dichotomized, the Biserial correlation measure turns out to be less biased than the Pearson statistic, although both statistics exhibit similar power and size properties. If one variable is continuous and the other is naturally dichotomous, it is hard to choose between the Point Biserial and Pearson correlation measures. Finally, if one variable is naturally dichotomous and the other is artificially dichotomized, the correlation coefficient V is compared with the Pearson, Phi, and Tetrachoric correlation techniques in terms of bias in the estimate. The results indicate that the Tetrachoric statistic considerably overestimates the correlation under non-normal distributions, while the Pearson and Phi correlations slightly underestimate it. In contrast, the correlation statistic V performs well.
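The bias comparison for two artificially dichotomized variables can be illustrated with a small simulation. The sketch below is not the authors' simulation design: it assumes bivariate normal data, a split at zero for both variables, and uses the cosine-pi approximation as a stand-in for a full Tetrachoric estimator. All function names (`pearson`, `phi`, `tetrachoric_cos_pi`, `simulate`) and parameter values are hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def table_2x2(x, y):
    """Cell counts (a, b, c, d) for 0/1 data: a=(1,1), b=(1,0), c=(0,1), d=(0,0)."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d


def phi(x, y):
    """Phi coefficient computed from the 2x2 contingency table."""
    a, b, c, d = table_2x2(x, y)
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def tetrachoric_cos_pi(x, y):
    """Cosine-pi approximation to the Tetrachoric correlation.

    This is an approximation, not a maximum-likelihood estimate, and it
    assumes that neither off-diagonal cell (b, c) is empty.
    """
    a, b, c, d = table_2x2(x, y)
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


def simulate(rho=0.5, n=20000, seed=0):
    """Draw bivariate normal pairs with correlation rho, then split both at 0."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    xd = [1 if v > 0 else 0 for v in xs]  # artificial dichotomization
    yd = [1 if v > 0 else 0 for v in ys]
    return {
        "pearson_continuous": pearson(xs, ys),
        "pearson_dichotomized": pearson(xd, yd),
        "phi": phi(xd, yd),
        "tetrachoric_approx": tetrachoric_cos_pi(xd, yd),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

On 0/1 codes the Phi coefficient and the Pearson correlation are algebraically the same quantity, so the two dichotomized estimates agree up to floating-point error. For a bivariate normal pair split at the median, the population value of Phi is (2/π)·arcsin(ρ), which is smaller in magnitude than ρ whenever 0 < |ρ| < 1; this attenuation is the bias that a Tetrachoric-type estimator is meant to remove under normality.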




Updated: 2021-01-15