Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics

Paper title

Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics

Authors

Okko Räsänen, María Andrea Cruz Blandón

Abstract

Unsupervised spoken term discovery (UTD) aims at finding recurring segments of speech in a corpus of acoustic speech data. One potential approach to this problem is to use dynamic time warping (DTW) to find well-aligning patterns in the speech data. However, automatically selecting initial candidate segments for DTW alignment, and then detecting "sufficiently good" alignments among them, requires some type of pre-defined criteria, often operationalized as threshold parameters for pairwise distance metrics between signal representations. In existing UTD systems, the optimal hyperparameters may differ across datasets, which limits their applicability to new corpora and truly low-resource scenarios. In this paper, we propose a novel probabilistic approach to DTW-based UTD named PDTW. In PDTW, distributional characteristics of the processed corpus are used for adaptive evaluation of alignment quality, thereby enabling systematic discovery of pattern pairs whose similarity exceeds what would be expected by coincidence. We test PDTW on the Zero Resource Speech Challenge 2017 datasets as part of the 2020 implementation of the challenge. The results show that the system performs consistently on all five tested languages using fixed hyperparameters, clearly outperforming the earlier DTW-based system in terms of coverage of the detected patterns.
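The abstract does not spell out how PDTW evaluates alignments, so the following is only an illustrative sketch of the general idea it describes, not the authors' implementation. It computes a length-normalized DTW cost between two feature sequences and then, instead of comparing that cost to a fixed threshold, asks how often an equally low cost occurs among alignments of arbitrary segment pairs from the same corpus. The function names `dtw_cost` and `chance_probability`, the Euclidean frame distance, and the use of an empirical distribution of background costs are all assumptions made for this example.

```python
import numpy as np


def dtw_cost(a, b):
    """Length-normalized DTW alignment cost between two feature
    sequences of shape (frames, dims). Hypothetical helper."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between all frames of a and b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Accumulated cost matrix with an extra border row and column.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)


def chance_probability(cost, background_costs):
    """Fraction of background alignment costs that are at or below
    `cost`. A small value means the alignment is better than what
    arbitrary pairs from the corpus usually achieve (assumed criterion)."""
    background = np.asarray(background_costs, dtype=float)
    return float(np.mean(background <= cost))


# Toy usage with synthetic features standing in for speech frames.
rng = np.random.default_rng(0)
segments = [rng.normal(size=(20, 13)) for _ in range(30)]
# Background distribution: costs of arbitrary (adjacent-index) segment pairs.
background = [dtw_cost(segments[i], segments[i + 1]) for i in range(29)]
# A candidate pair: one segment and a slightly perturbed copy of it.
candidate = dtw_cost(segments[0], segments[0] + 0.05 * rng.normal(size=(20, 13)))
p = chance_probability(candidate, background)
print(f"candidate cost={candidate:.3f}, chance probability={p:.3f}")
```

Because the decision is made on a probability relative to the corpus's own cost distribution, the same cutoff (for example, accepting pairs with a chance probability below some small value) can in principle be reused across datasets whose raw distance scales differ, which is the motivation the abstract gives for avoiding dataset-specific distance thresholds.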
