Title
comp-syn: Perceptually Grounded Word Embeddings with Color
Authors
Abstract
Popular approaches to natural language processing create word embeddings from textual co-occurrence patterns but often ignore the embodied, sensory aspects of language. Here, we introduce the Python package comp-syn, which provides grounded word embeddings based on the perceptually uniform color distributions of Google Image search results. We demonstrate that comp-syn significantly enriches models of distributional semantics. In particular, we show that (1) using low-dimensional word-color embeddings, comp-syn predicts human judgments of word concreteness more accurately and more interpretably than word2vec, and (2) comp-syn performs comparably to word2vec on a metaphorical vs. literal word-pair classification task. comp-syn is open source, distributed on PyPI, and compatible with mainstream machine-learning Python packages. Our package release includes word-color embeddings for over 40,000 English words, each associated with crowd-sourced word concreteness judgments.
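The abstract does not spell out how a "perceptually uniform color distribution" becomes a low-dimensional word vector, so the following is only a minimal sketch under stated assumptions; it does not use or reproduce comp-syn's actual API. It assumes CIELAB (D65 white point) as the perceptually uniform space and assumes a word's embedding is summarized from the pooled pixels of its image-search results as the mean L*a*b* value plus a normalized histogram of hue angles in the a*b* plane. The function names `rgb_to_lab` and `color_embedding` and the parameters `hue_bins` and `min_chroma` are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]

# Assumption: D65 reference white in CIE XYZ, with Y normalized to 1.0.
_WHITE = (0.95047, 1.0, 1.08883)


def _srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIELAB nonlinearity: cube root above a small threshold, linear below it."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(rgb: RGB) -> Tuple[float, float, float]:
    """Convert an 8-bit sRGB pixel to CIELAB (L*, a*, b*) under D65."""
    r, g, b = (_srgb_to_linear(v / 255) for v in rgb)
    # Linear sRGB -> CIE XYZ (standard sRGB primaries).
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), _WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def color_embedding(
    pixels: Iterable[RGB], hue_bins: int = 8, min_chroma: float = 5.0
) -> List[float]:
    """Hypothetical word-color embedding: mean L*a*b* followed by a hue histogram.

    `pixels` stands in for the pooled pixels of one word's image-search results.
    Near-gray pixels (chroma below `min_chroma`) have no meaningful hue, so they
    contribute to the mean but are left out of the histogram.
    """
    labs = [rgb_to_lab(p) for p in pixels]
    if not labs:
        raise ValueError("need at least one pixel")
    n = len(labs)
    mean = [sum(lab[i] for lab in labs) / n for i in range(3)]
    hist = [0.0] * hue_bins
    for _, a, b in labs:
        if math.hypot(a, b) < min_chroma:
            continue
        angle = math.atan2(b, a) % (2 * math.pi)  # hue angle in the a*b* plane
        idx = min(int(angle / (2 * math.pi) * hue_bins), hue_bins - 1)
        hist[idx] += 1 / n  # each histogram entry is a fraction of all pixels
    return mean + hist
```

For example, `color_embedding([(255, 0, 0), (0, 0, 255)])` returns an 11-dimensional vector (3 mean coordinates plus 8 hue bins) whose red and blue hue bins each hold 0.5. Averaging in CIELAB rather than raw RGB is the point of the assumption: equal distances in CIELAB roughly correspond to equal perceived color differences, so summary statistics such as the mean are more meaningful there.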