Paper Title
Dynamic Speech Endpoint Detection with Regression Targets
Authors
Abstract
Interactive voice assistants are widely used as input interfaces in various scenarios, e.g. on smart home devices, wearables, and AR devices. Detecting the end of a speech query, i.e. speech end-pointing, is an important task for voice assistants interacting with users. Traditionally, speech end-pointing has relied on pure classification methods with arbitrary binary targets. In this paper, we propose a novel regression-based speech end-pointing model, which enables an end-pointer to adjust its detection behavior based on the context of user queries. Specifically, we present a pause modeling method and show its effectiveness for dynamic end-pointing. In our experiments with vendor-collected smartphone and wearable speech queries, our strategy shows a better trade-off between end-pointing latency and accuracy than the traditional classification-based method. We further discuss the benefits of this model and the generalization of the framework in the paper.