Identifying Human Strategies for Generating Word-Level Adversarial Examples

Paper title

Identifying Human Strategies for Generating Word-Level Adversarial Examples

Authors

Maximilian Mozes, Bennett Kleinberg, Lewis D. Griffin

Abstract

Adversarial examples in NLP are receiving increasing research attention. One line of investigation is the generation of word-level adversarial examples against fine-tuned Transformer models that preserve naturalness and grammaticality. Previous work found that human- and machine-generated adversarial examples are comparable in their naturalness and grammatical correctness. Most notably, humans were able to generate adversarial examples much more effortlessly than automated attacks. In this paper, we provide a detailed analysis of exactly how humans create these adversarial examples. By exploring the behavioural patterns of human workers during the generation process, we identify statistically significant tendencies based on which words humans prefer to select for adversarial replacement (e.g., word frequencies, word saliencies, sentiment) as well as where and when words are replaced in an input sequence. With our findings, we seek to inspire efforts that harness human strategies for more robust NLP models.
