Paper Title

On Suspicious Coincidences and Pointwise Mutual Information

Author

Williams, Christopher K. I.

Abstract

Barlow (1985) hypothesized that the co-occurrence of two events $A$ and $B$ is "suspicious" if $P(A,B) \gg P(A) P(B)$. We first review classical measures of association for $2 \times 2$ contingency tables, including Yule's $Y$ (Yule, 1912), which depends only on the odds ratio $\lambda$ and is independent of the marginal probabilities of the table. We then discuss the mutual information (MI) and pointwise mutual information (PMI), which depend on the ratio $P(A,B)/(P(A)P(B))$, as measures of association. We show that, once the effect of the marginals is removed, MI and PMI behave similarly to $Y$ as functions of $\lambda$. The pointwise mutual information is used extensively in some research communities for flagging suspicious coincidences, but it is important to bear in mind the sensitivity of the PMI to the marginals, with increased scores for sparser events.
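
To make the quantities in the abstract concrete, the following is a minimal Python sketch, not code from the paper. It computes the odds ratio $\lambda$, Yule's $Y = (\sqrt{\lambda} - 1)/(\sqrt{\lambda} + 1)$, the PMI of the co-occurrence cell, and the MI for a $2 \times 2$ table of counts. The function name `association_measures`, the count-based interface, the use of natural logarithms, and the two example tables are assumptions made for illustration only.

```python
import math


def association_measures(n11, n10, n01, n00):
    """Association measures for a 2x2 table of counts (illustrative sketch).

    n11: A and B both occur      n10: A occurs, B does not
    n01: B occurs, A does not    n00: neither occurs
    Assumes all four counts are positive so every ratio is defined.
    """
    counts = (n11, n10, n01, n00)
    if any(n <= 0 for n in counts):
        raise ValueError("all four cell counts must be positive")

    total = float(sum(counts))
    p11, p10, p01, p00 = (n / total for n in counts)

    # Marginal probabilities of A and B.
    pa = p11 + p10
    pb = p11 + p01

    # Odds ratio: unchanged by rescaling rows or columns of the table.
    odds_ratio = (p11 * p00) / (p10 * p01)

    # Yule's Y depends on the table only through the odds ratio.
    root = math.sqrt(odds_ratio)
    yule_y = (root - 1.0) / (root + 1.0)

    # PMI of the (A, B) cell, in nats.
    pmi = math.log(p11 / (pa * pb))

    # MI: expected log ratio over all four cells, in nats.
    cells = [
        (p11, pa, pb),
        (p10, pa, 1.0 - pb),
        (p01, 1.0 - pa, pb),
        (p00, 1.0 - pa, 1.0 - pb),
    ]
    mi = sum(p * math.log(p / (row * col)) for p, row, col in cells)

    return {"odds_ratio": odds_ratio, "yule_y": yule_y, "pmi": pmi, "mi": mi}


# Two hypothetical tables with the same odds ratio but different marginals.
dense = association_measures(40, 10, 10, 40)   # P(A) = P(B) = 0.5
sparse = association_measures(1, 2, 2, 64)     # P(A) = P(B) = 3/69
```

In these two hypothetical tables the odds ratio is 16 and Yule's $Y$ is 0.6 in both cases, yet the PMI is $\log 1.6 \approx 0.47$ nats for the dense table and $\log(69/9) \approx 2.04$ nats for the sparse one, which illustrates the abstract's point that PMI gives higher scores to sparser events at a fixed odds ratio.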
