Dynamic Speech Endpoint Detection with Regression Targets

Paper Title

Dynamic Speech Endpoint Detection with Regression Targets

Authors

Dawei Liang, Hang Su, Tarun Singh, Jay Mahadeokar, Shanil Puri, Jiedan Zhu, Edison Thomaz, Mike Seltzer

Abstract

Interactive voice assistants are widely used as input interfaces in various scenarios, e.g. on smart home devices, wearables, and AR devices. Detecting the end of a speech query, i.e. speech end-pointing, is an important task for voice assistants interacting with users. Traditionally, speech end-pointing has relied on pure classification methods with arbitrary binary targets. In this paper, we propose a novel regression-based speech end-pointing model, which enables an end-pointer to adjust its detection behavior based on the context of user queries. Specifically, we present a pause modeling method and show its effectiveness for dynamic end-pointing. In our experiments with vendor-collected smartphone and wearable speech queries, our strategy shows a better trade-off between end-pointing latency and accuracy than the traditional classification-based method. We further discuss the benefits of this model and the generalization of the framework in the paper.
