Paper title
Unsupervised Spoken Term Discovery on Untranscribed Speech
Paper authors
Paper abstract
(Part of the abstract) In this thesis, we investigate the use of unsupervised spoken term discovery in tackling this problem. Unsupervised spoken term discovery aims to discover topic-related terms in speech without prior knowledge of the phonetic properties of the language or of the content. It can be divided into two parts: acoustic segment modelling (ASM) and unsupervised pattern discovery. ASM learns the phonetic structure of audio in a zero-resource language with no phonetic knowledge available, generating self-derived "phonemes". The audio is labelled with these "phonemes" to obtain "phoneme" sequences. Unsupervised pattern discovery then searches for repeated patterns in the "phoneme" sequences. The discovered patterns can be grouped to determine the keywords of the audio. A multilingual neural network with a bottleneck layer is used for feature extraction. Experiments show that bottleneck features facilitate the training of ASM compared with conventional features such as MFCC. The unsupervised spoken term discovery system is evaluated on online lectures covering different topics by different speakers. The results show that the system learns the phonetic information of the language and can discover frequent spoken terms that align with the text transcription. Using information retrieval techniques such as word embedding and TF-IDF, the discovered keywords are shown to be further usable for topic comparison.
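The last two stages described in the abstract, pattern discovery over "phoneme" sequences and TF-IDF-based topic comparison, can be illustrated with a toy sketch. This is not the thesis's implementation: exact n-gram counting stands in for whatever pattern discovery method the thesis uses, the integer labels stand in for ASM output, and the function names, the smoothed IDF formula, and the three "lectures" are all assumptions made up for illustration.

```python
import math
from collections import Counter


def repeated_patterns(sequence, min_len=3, max_len=5, min_count=2):
    """Return n-grams of pseudo-phoneme labels that occur at least min_count times.

    Exact n-gram matching is a simplifying assumption; real ASM label
    sequences are noisy and would likely need approximate matching.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(sequence) - n + 1):
            counts[tuple(sequence[i:i + n])] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


def tfidf_vectors(docs):
    """Turn per-recording pattern counts into sparse TF-IDF vectors (dicts).

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, chosen here for
    illustration so that patterns found in every recording keep a nonzero weight.
    """
    n_docs = len(docs)
    doc_freq = Counter(pattern for doc in docs for pattern in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        vectors.append({
            pattern: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[pattern])) + 1)
            for pattern, count in doc.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical label sequences: lectures A and B share the recurring
# pattern (1, 2, 3); lecture C repeats a different pattern, (5, 6, 7).
lecture_a = [1, 2, 3, 7, 1, 2, 3, 9, 1, 2, 3]
lecture_b = [4, 1, 2, 3, 5, 1, 2, 3, 6]
lecture_c = [5, 6, 7, 8, 5, 6, 7, 0, 5, 6, 7]

patterns = [repeated_patterns(s) for s in (lecture_a, lecture_b, lecture_c)]
vec_a, vec_b, vec_c = tfidf_vectors(patterns)

sim_ab = cosine(vec_a, vec_b)  # recordings sharing a recurring pattern
sim_ac = cosine(vec_a, vec_c)  # recordings with no pattern in common
print(sim_ab > sim_ac)
```

In this toy setup the two recordings that share a recurring pattern score higher than the pair that shares none, which is the intuition behind using discovered keywords for topic comparison.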