Paper Title

Better to be in agreement than in bad company: a critical analysis of many kappa-like tests assessing one-million 2x2 contingency tables

Paper Authors

Silveira, Paulo Sergio Panse; Siqueira, Jose Oliveira

Paper Abstract

We assessed several agreement coefficients applied to 2x2 contingency tables, which are common in research due to dichotomization by a condition of the subjects (e.g., male or female) or for convenience of classification (e.g., traditional thresholds separating healthy from diseased, exposed from non-exposed, etc.). More extreme table configurations (e.g., high agreement between raters) are also usual, but some of the coefficients have problems with imbalanced tables. Here, we not only studied some specific estimators, but also developed a general method for studying any estimator that is a candidate agreement measure. This method was implemented in open-source R code and is available to researchers. We tested the method by verifying the performance of several traditional estimators over all 1,028,789 tables with sizes ranging from 1 to 68. Cohen's kappa showed handicapped behavior similar to Pearson's r, Yule's Q, and Yule's Y. Scott's pi is ambiguous when assessing situations of agreement between raters. Shankar and Bangdiwala's B was mistaken in all situations of neutrality and when there is greater disagreement between raters. Dice's F1 and McNemar's chi-squared incompletely assess the information in the contingency table, showing the poorest performance of all. We conclude that Holley and Guilford's G is the best agreement estimator, closely followed by Gwet's AC1, and that they should be considered the first choices for agreement measurement in 2x2 contingency tables. All procedures and data were implemented in R and are available for download from https://sourceforge.net/projects/tables2x2.
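
As a quick illustration (a minimal sketch, not taken from the paper's tables2x2 code), the R snippet below computes several of the coefficients named above directly from the four cells of a 2x2 table, using the standard textbook closed forms for each estimator; the cell labels a, b, c, d, the function name, and the example counts are assumptions chosen for illustration, and degenerate tables with zero denominators are not handled. It also checks the 1,028,789 figure: there are choose(n + 3, 3) distinct 2x2 tables with total count n, so summing over n = 1..68 reproduces that total.

agreement_coefficients <- function(a, b, c, d) {
  # a, d: concordant cells; b, c: discordant cells of a 2x2 table
  n  <- a + b + c + d
  po <- (a + d) / n                                   # observed agreement

  # Cohen's kappa: chance agreement from each rater's own marginals
  pe_kappa <- ((a + b) * (a + c) + (c + d) * (b + d)) / n^2
  kappa    <- (po - pe_kappa) / (1 - pe_kappa)

  # Scott's pi: chance agreement from the pooled (averaged) marginals
  p1    <- ((a + b) + (a + c)) / (2 * n)
  pe_pi <- p1^2 + (1 - p1)^2
  pi_s  <- (po - pe_pi) / (1 - pe_pi)

  # Gwet's AC1: chance agreement taken as 2 * p1 * (1 - p1)
  pe_ac1 <- 2 * p1 * (1 - p1)
  ac1    <- (po - pe_ac1) / (1 - pe_ac1)

  # Holley and Guilford's G: agreements minus disagreements, over n
  g <- (a + d - b - c) / n

  # Dice's F1: ignores the d cell entirely
  f1 <- 2 * a / (2 * a + b + c)

  c(kappa = kappa, pi = pi_s, AC1 = ac1, G = g, F1 = f1)
}

# Hypothetical example: a table of total size 68 with strong agreement
round(agreement_coefficients(a = 30, b = 5, c = 4, d = 29), 3)

# Sanity check of the 1,028,789 figure: summing choose(n + 3, 3) over n = 1..68
sum(choose((1:68) + 3, 3))
#> 1028789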
