Comparing χ2 Tables for Separability of Distribution and Effect: Meta-Tests for Comparing Homogeneity and Goodness of Fit Contingency Test Outcomes
Journal of Quantitative Linguistics (IF 0.7), Pub Date: 2018-12-17, DOI: 10.1080/09296174.2018.1496537
Sean Wallis

ABSTRACT

This paper describes a series of statistical meta-tests for comparing independent contingency tables for different types of significant difference. Recognizing when an experiment obtains a significantly different result from another, and when it does not, is frequently overlooked in research publication. Papers are frequently published citing 'p values' or test scores that suggest a 'stronger effect' as a substitute for sound statistical reasoning. This paper sets out a series of tests that together illustrate the correct approach to this question.

These meta-tests permit us to evaluate whether experiments have failed to replicate on new data; whether a particular data source or subcorpus obtains a significantly different result than another; or whether changing experimental parameters obtains a stronger effect.

The meta-tests are derived mathematically from the χ2 test and the Wilson score interval, and consist of pairwise 'point' tests, 'homogeneity' tests and 'goodness of fit' tests. Meta-tests for comparing tests with one degree of freedom (e.g. '2 × 1' and '2 × 2' tests) are generalized to those of arbitrary size. Finally, we compare our approach with a competing approach offered by Zar, which, while straightforward to calculate, turns out to be both less powerful and less robust. (Note: A spreadsheet including all the tests in this paper is publicly available at www.ucl.ac.uk/english-usage/statspapers/2x2-x2-separability.xls.)
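
As an informal illustration of the first of these, the sketch below (Python, not taken from the paper) compares a single pair of observed proportions from two independent tables using Wilson score intervals combined into a Newcombe-style difference interval. The function names and example data are hypothetical; the paper and the accompanying spreadsheet give the exact formulations of the point, homogeneity and goodness of fit meta-tests.

```python
import math

Z95 = 1.959964  # two-tailed critical value for alpha = 0.05

def wilson_interval(p, n, z=Z95):
    """Wilson score interval for an observed proportion p with sample size n."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - spread, centre + spread

def point_test(p1, n1, p2, n2, z=Z95):
    """Illustrative pairwise 'point' comparison of two independent proportions,
    using a Newcombe-style difference interval built from the two Wilson
    intervals. Returns True if p1 and p2 differ significantly."""
    l1, u1 = wilson_interval(p1, n1, z)
    l2, u2 = wilson_interval(p2, n2, z)
    d = p1 - p2
    if d >= 0:
        # positive difference: combine p1's lower width with p2's upper width
        return d > math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    # negative difference: combine p1's upper width with p2's lower width
    return -d > math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)

# Hypothetical example: the same binary choice observed in two subcorpora.
print(point_test(p1=0.30, n1=200, p2=0.45, n2=180))  # True -> significantly different
```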



