Paper Title
DFEE: Interactive DataFlow Execution and Evaluation Kit
Paper Authors
Paper Abstract
DataFlow has emerged as a new paradigm for building task-oriented chatbots due to its expressive semantic representations of dialogue tasks. Despite the availability of the large SMCalFlow dataset and a simplified syntax, developing and evaluating DataFlow-based chatbots remains challenging because of the system's complexity and the lack of downstream toolchains. In this demonstration, we present DFEE, an interactive DataFlow Execution and Evaluation toolkit that supports the execution, visualization, and benchmarking of semantic parsers given dialogue inputs and a backend database. We demonstrate the system on a complex dialogue task: event scheduling, which involves temporal reasoning. DFEE also supports diagnosing parsing results through a friendly interface that lets developers examine the dynamic DataFlow and the corresponding execution results. To illustrate how to benchmark SoTA models, we propose a novel benchmark covering more sophisticated event-scheduling scenarios and a new metric for task-success evaluation. The code of DFEE has been released at https://github.com/amazonscience/dataflow-evaluation-toolkit.
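To make the idea of task-success evaluation concrete, here is a minimal, self-contained sketch (not the DFEE API; all names, the toy executor, and the toy database are hypothetical stand-ins) of the intuition: a predicted DataFlow program counts as a success if executing it against the backend database yields the same result as executing the gold program, even when the two program strings differ.

```python
# Hypothetical sketch of execution-based task-success evaluation.
# A real executor would evaluate full DataFlow (Lispress) program graphs;
# this toy version only handles lookups like "find_event(standup)".

TOY_DATABASE = {
    "standup": {"start": "09:00", "attendees": ["alice", "bob"]},
    "review":  {"start": "14:00", "attendees": ["alice", "carol"]},
}

def execute(program: str, database: dict):
    """Toy executor: resolve a program against the backend database."""
    if program.startswith("find_event(") and program.endswith(")"):
        return database.get(program[len("find_event("):-1])
    return None  # unexecutable or unsupported program

def task_success_rate(pairs, database):
    """Fraction of (gold, predicted) program pairs whose executions agree."""
    if not pairs:
        return 0.0
    hits = 0
    for gold, pred in pairs:
        gold_result = execute(gold, database)
        pred_result = execute(pred, database)
        # A prediction succeeds if it denotes the same result as the gold
        # program; failed executions (None) count as failures.
        if gold_result is not None and pred_result == gold_result:
            hits += 1
    return hits / len(pairs)

pairs = [
    ("find_event(standup)", "find_event(standup)"),  # exact match
    ("find_event(review)",  "find_event(review)"),   # same denotation
    ("find_event(standup)", "find_event(retro)"),    # execution mismatch
]
print(task_success_rate(pairs, TOY_DATABASE))  # 0.666...
```

Unlike exact-match parsing accuracy, this style of metric credits any parse whose execution accomplishes the task, which is what the abstract's "task success evaluation" is measuring.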