Paper Title
Explainable Abuse Detection as Intent Classification and Slot Filling
Authors
Abstract
To proactively offer social media users a safe online experience, there is a need for systems that can detect harmful posts and promptly alert platform moderators. In order to guarantee the enforcement of a consistent policy, moderators are provided with detailed guidelines. In contrast, most state-of-the-art models learn what abuse is from labelled examples and as a result base their predictions on spurious cues, such as the presence of group identifiers, which can be unreliable. In this work we introduce the concept of policy-aware abuse detection, abandoning the unrealistic expectation that systems can reliably learn which phenomena constitute abuse from inspecting the data alone. We propose a machine-friendly representation of the policy that moderators wish to enforce, by breaking it down into a collection of intents and slots. We collect and annotate a dataset of 3,535 English posts with such slots, and show how architectures for intent classification and slot filling can be used for abuse detection, while providing a rationale for model decisions.
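To make the idea of a policy expressed as intents and slots more concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation: the intent names, slot names, the `POLICY` table, and the `classify` helper are all hypothetical and are not taken from the abstract. The sketch assumes that a slot-filling model has already tagged spans in a post, and it shows only how filled slots could be matched against policy rules so that the matched slots double as the rationale for the decision.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Intent:
    """One policy rule: an intent and the slots that must all be filled."""
    name: str
    required_slots: FrozenSet[str]


# Hypothetical policy. The intent and slot names are assumptions made for
# this example; the abstract does not list the real ones.
POLICY: List[Intent] = [
    Intent("Threatening",
           frozenset({"Target", "ProtectedCharacteristic", "ThreateningSpeech"})),
    Intent("Derogation",
           frozenset({"Target", "ProtectedCharacteristic", "DerogatoryOpinion"})),
]


def classify(filled_slots: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Map slot annotations for one post to a policy intent.

    `filled_slots` maps a slot name to the text span tagged with it.
    Returns the first intent whose required slots are all non-empty,
    together with those spans as the rationale. If no rule matches,
    returns ("NotAbusive", {}).
    """
    present = {name for name, span in filled_slots.items() if span}
    for intent in POLICY:
        if intent.required_slots <= present:
            rationale = {slot: filled_slots[slot]
                         for slot in sorted(intent.required_slots)}
            return intent.name, rationale
    return "NotAbusive", {}


if __name__ == "__main__":
    # Placeholder spans stand in for text a slot-filling model would tag.
    post_slots = {
        "Target": "<span naming a group>",
        "ProtectedCharacteristic": "<span naming a group>",
        "ThreateningSpeech": "<span containing a threat>",
    }
    label, rationale = classify(post_slots)
    print(label)
    for slot, span in rationale.items():
        print(f"  {slot}: {span}")
```

In this sketch a post is flagged only when every slot required by some rule is filled, so a mention of a group on its own does not trigger a match. That is one way the approach could avoid relying on group identifiers as a spurious cue.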