Paper Title

Circles are like Ellipses, or Ellipses are like Circles? Measuring the Degree of Asymmetry of Static and Contextual Embeddings and the Implications to Representation Learning

Authors

Zhang, Wei; Campbell, Murray; Yu, Yang; Kumaravel, Sadhana

Abstract

Human judgments of word similarity have been a popular method of evaluating the quality of word embeddings. But they fail to measure geometric properties such as asymmetry. For example, it is more natural to say "Ellipses are like Circles" than "Circles are like Ellipses". Such asymmetry has been observed in a psychoanalysis test called the word evocation experiment, where one word is used to recall another. Although useful, such experimental data have been significantly understudied for measuring embedding quality. In this paper, we use three well-known evocation datasets to gain insights into how embeddings encode asymmetry. We study both static embeddings and contextual embeddings, such as BERT. Evaluating asymmetry for BERT is generally hard due to the dynamic nature of its embeddings. Thus, we probe BERT's conditional probabilities (as a language model) using a large number of Wikipedia contexts to derive a theoretically justifiable Bayesian asymmetry score. The results show that contextual embeddings show more randomness than static embeddings on similarity judgments while performing well on asymmetry judgments, which aligns with their strong performance on "extrinsic evaluations" such as text classification. The asymmetry judgment and the Bayesian approach provide a new perspective for evaluating contextual embeddings intrinsically, and the comparison with similarity evaluation concludes our work with a discussion of the current state and the future of representation learning.
