When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People?

Paper title

When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People?

Authors

Kenneth Joseph, Jonathan H. Morgan

Abstract

Social biases are encoded in word embeddings. This presents a unique opportunity to study society historically and at scale, and a unique danger when embeddings are used in downstream applications. Here, we investigate the extent to which publicly-available word embeddings accurately reflect beliefs about certain kinds of people as measured via traditional survey methods. We find that biases found in word embeddings do, on average, closely mirror survey data across seventeen dimensions of social meaning. However, we also find that biases in embeddings are much more reflective of survey data for some dimensions of meaning (e.g. gender) than others (e.g. race), and that we can be highly confident that embedding-based measures reflect survey data only for the most salient biases.
