Cross-lingual Emotion Intensity Prediction

Paper Title

Cross-lingual Emotion Intensity Prediction

Authors

Irean Navas Alejo, Toni Badia, Jeremy Barnes

Abstract

Emotion intensity prediction determines the degree or intensity of an emotion that the author expresses in a text, extending previous categorical approaches to emotion detection. While most previous work on this topic has concentrated on English texts, other languages would also benefit from fine-grained emotion classification, preferably without having to recreate the amount of annotated data available in English in each new language. Consequently, we explore cross-lingual transfer approaches for fine-grained emotion detection in Spanish and Catalan tweets. To this end, we annotate a test set of Spanish and Catalan tweets using Best-Worst scaling. We compare six cross-lingual approaches, e.g., machine translation and cross-lingual embeddings, which have varying requirements for parallel data, from millions of parallel sentences to completely unsupervised. The results show that on this data, methods with low parallel-data requirements perform surprisingly better than methods that use more parallel data, which we explain through an in-depth error analysis. We make the dataset and the code available at https://github.com/jerbarnes/fine-grained_cross-lingual_emotion.
