Separating Content from Speaker Identity in Speech for the Assessment of Cognitive Impairments

Paper Title

Separating Content from Speaker Identity in Speech for the Assessment of Cognitive Impairments

Authors

Heo, Dongseok; Park, Cheul Young; Cheun, Jaemin; Ko, Myung Jin

Abstract

Deep speaker embeddings have been shown to be effective for assessing cognitive impairments, beyond their original purpose of speaker verification. However, research has found that speaker embeddings encode not only speaker identity but also an array of other information, including speaker demographics such as sex and age and, to an extent, speech content, which are known confounders in the assessment of cognitive impairments. In this paper, we hypothesize that content information separated from speaker identity using a voice-conversion framework is more effective for assessing cognitive impairments, and we train simple classifiers for a comparative analysis on the DementiaBank Pitt Corpus. Our results show that content embeddings have an advantage over speaker embeddings for the defined problem; however, further experiments show that their effectiveness depends on the information encoded in the speaker embeddings, owing to the inherent design of the architecture used to extract content.
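The comparative analysis described above (training the same simple classifier on speaker embeddings and on content embeddings, then comparing accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the nearest-centroid classifier, the interleaved 5-fold split, and all function names (`compare_embeddings`, `kfold_accuracy`, `nearest_centroid_predict`) are hypothetical choices, and the toy vectors stand in for embeddings that would come from a speaker encoder and a voice-conversion content encoder. The toy data are constructed so that only the "content" vectors carry the label, so the printed numbers show the mechanics of the comparison and are not the paper's results.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Mean vector of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_predict(
    train_x: Sequence[Vector], train_y: Sequence[int], x: Vector
) -> int:
    """Hypothetical simple classifier: predict the label whose training centroid is closest."""
    groups: Dict[int, List[Vector]] = {}
    for vec, label in zip(train_x, train_y):
        groups.setdefault(label, []).append(vec)
    centroids = {label: centroid(vecs) for label, vecs in groups.items()}
    return min(centroids, key=lambda label: squared_distance(centroids[label], x))


def kfold_accuracy(xs: Sequence[Vector], ys: Sequence[int], k: int = 5) -> float:
    """Interleaved k-fold cross-validated accuracy (sample i goes to fold i % k)."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in test:
            if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
                correct += 1
    return correct / len(xs)


def compare_embeddings(
    embeddings: Dict[str, Sequence[Vector]], labels: Sequence[int]
) -> Dict[str, float]:
    """Train the same simple classifier on each embedding type and report its accuracy."""
    return {name: kfold_accuracy(vectors, labels) for name, vectors in embeddings.items()}


# Toy data (hypothetical, not from DementiaBank): label 1 = impaired, 0 = control.
labels = [i % 2 for i in range(20)]
# "content" vectors are built to carry the label; "speaker" vectors only carry an
# attribute unrelated to the label (a stand-in for a demographic such as sex).
content = [[2.0 * y + 0.1 * (i % 3), 0.5] for i, y in enumerate(labels)]
speaker = [[float((i // 2) % 2), 1.0 - float((i // 2) % 2)] for i in range(20)]

results = compare_embeddings({"content": content, "speaker": speaker}, labels)
print(results)  # {'content': 1.0, 'speaker': 0.5}
```

The nearest-centroid classifier is used here only because it is dependency-free and easy to read; the abstract does not say which simple classifiers the authors used. In a real evaluation on a corpus with several recordings per participant, folds would normally be split by participant rather than by sample index, so that the same speaker never appears in both training and test data.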
