Paper Title

The Devil is in the Details: On Models and Training Regimes for Few-Shot Intent Classification

Authors

Mesgar, Mohsen; Tran, Thy Thy; Glavas, Goran; Gurevych, Iryna

Abstract

Few-shot Intent Classification (FSIC) is one of the key challenges in modular task-oriented dialog systems. While advanced FSIC methods are similar in using pretrained language models to encode texts and nearest neighbour-based inference for classification, they differ in their details. They start from different pretrained text encoders, use different encoding architectures with varying similarity functions, and adopt different training regimes. The coupling of these mostly independent design decisions, together with the lack of accompanying ablation studies, is a major obstacle to identifying the factors that drive the reported FSIC performance. We study these details across three key dimensions: (1) Encoding architecture: cross-encoders vs. bi-encoders; (2) Similarity function: parameterized (i.e., trainable) vs. non-parameterized functions; (3) Training regime: episodic meta-learning vs. straightforward (i.e., non-episodic) training. Our experimental results on seven FSIC benchmarks reveal three important findings. First, the unexplored combination of the cross-encoder architecture (with a parameterized similarity scoring function) and episodic meta-learning consistently yields the best FSIC performance. Second, episodic training yields a more robust FSIC classifier than non-episodic training. Third, in meta-learning methods, splitting an episode into support and query sets is not a must. Our findings pave the way for conducting state-of-the-art research in FSIC and, more importantly, raise the community's attention to the details of FSIC methods. We release our code and data publicly.
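
To make two of the terms in the abstract concrete, the following is a minimal Python sketch of nearest neighbour-based inference with a non-parameterized similarity function (cosine), plus the sampling of an N-way K-shot episode. It is not the authors' released code. It assumes utterances have already been encoded into vectors by some text encoder (the bi-encoder setting), and the names `cosine`, `nearest_neighbour_intent`, and `sample_episode` are hypothetical, chosen only for illustration.

```python
import math
import random


def cosine(u, v):
    """Non-parameterized similarity: cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbour_intent(query_vec, support):
    """Return the intent label of the support example most similar to the query.

    `support` is a list of (vector, intent_label) pairs; the vectors are
    assumed to come from an encoder that is not shown here.
    """
    _, best_label = max(support, key=lambda item: cosine(query_vec, item[0]))
    return best_label


def sample_episode(examples_by_intent, n_way, k_shot, rng):
    """Sample one episode: n_way intents with k_shot examples each.

    No support/query split is made here; how (and whether) to split an
    episode is a separate design choice.
    """
    intents = rng.sample(sorted(examples_by_intent), n_way)
    return {intent: rng.sample(examples_by_intent[intent], k_shot)
            for intent in intents}


# Toy usage with hand-written 2-d "embeddings" (illustrative values only).
support_set = [([1.0, 0.0], "book_flight"), ([0.0, 1.0], "play_music")]
predicted = nearest_neighbour_intent([0.9, 0.1], support_set)

toy_data = {
    "book_flight": ["fly to Paris", "book a plane ticket", "I need a flight"],
    "play_music": ["play some jazz", "put on a song", "start my playlist"],
    "set_alarm": ["wake me at 7", "set an alarm", "alarm for noon"],
}
episode = sample_episode(toy_data, n_way=2, k_shot=2, rng=random.Random(0))
```

In a cross-encoder setting, the query and each support utterance would instead be encoded jointly and scored by a trainable (parameterized) function, so the `cosine` call would be replaced by a model forward pass.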
