Paper Title
Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones
Paper Authors
Paper Abstract
Speaker identification in noisy audio recordings, particularly recordings from collaborative learning environments, can be extremely challenging. The task is to distinguish individual students speaking within a small group from other students speaking at the same time. To address this problem, we assume a single microphone per student group and no access to large existing datasets for training. This dissertation proposes a speaker identification method that uses cross-correlation patterns associated with an array of virtual microphones centered around the physical microphone. The virtual microphones are simulated using the approximate speaker geometry observed in a video recording. The patterns are constructed from estimates of the room impulse responses for each virtual microphone. The correlation patterns are then used to identify the speakers. The proposed method is validated on classroom audio recordings and is shown to substantially outperform the diarization services provided by Google Cloud and Amazon AWS.
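The abstract describes the pipeline only at a high level. As a rough illustration of the idea that different speaker positions produce distinct cross-correlation patterns across a ring of virtual microphones, the Python sketch below makes several simplifying assumptions that do not come from the dissertation: a 2-D geometry, a direct-path-only room impulse response (a pure integer-sample delay with no reflections), four virtual microphones on a 0.15 m circle, speaker positions entered by hand in place of video-derived geometry, and nearest-template matching on peak-lag vectors. All function names are hypothetical. Because every channel is synthesised from a known white-noise source, the toy shows only how lag patterns separate speaker positions, not how virtual-microphone signals are obtained from a single real recording.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s
FS = 16000              # assumed sample rate in Hz


def virtual_mic_ring(center, radius, count):
    """Hypothetical layout: `count` virtual mics evenly spaced on a circle around the physical mic."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / count),
             cy + radius * math.sin(2 * math.pi * k / count))
            for k in range(count)]


def direct_path_delay(src, mic):
    """Propagation delay in whole samples, assuming a direct-path-only RIR (no reflections)."""
    return round(math.dist(src, mic) / SPEED_OF_SOUND * FS)


def template_pattern(speaker, phys_mic, virtual_mics):
    """Expected lag of each virtual mic relative to the physical mic for one speaker position."""
    d0 = direct_path_delay(speaker, phys_mic)
    return [direct_path_delay(speaker, vm) - d0 for vm in virtual_mics]


def delayed(signal, d):
    """Delay a signal by d >= 0 samples, zero-padding the start and keeping its length."""
    return [0.0] * d + signal[:len(signal) - d]


def xcorr_peak_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] that maximises sum_n x[n] * y[n + lag]."""
    n = len(x)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x[i] * y[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def observed_pattern(source, speaker, phys_mic, virtual_mics, max_lag=10):
    """Simulate physical and virtual mic signals for `speaker`, then measure the peak-lag pattern."""
    delays = [direct_path_delay(speaker, m) for m in [phys_mic] + virtual_mics]
    base = min(delays)  # remove the common delay so only relative shifts remain
    phys = delayed(source, delays[0] - base)
    return [xcorr_peak_lag(phys, delayed(source, d - base), max_lag) for d in delays[1:]]


def identify(pattern, templates):
    """Nearest-template match (sum of squared lag differences) over the candidate speakers."""
    return min(templates,
               key=lambda name: sum((a - b) ** 2 for a, b in zip(pattern, templates[name])))


if __name__ == "__main__":
    random.seed(0)
    phys = (0.0, 0.0)
    vmics = virtual_mic_ring(phys, radius=0.15, count=4)
    # Hypothetical seat positions in metres, standing in for geometry read off a video frame.
    speakers = {"A": (1.5, 0.0), "B": (0.0, 1.5), "C": (-1.5, 0.0)}
    templates = {name: template_pattern(pos, phys, vmics) for name, pos in speakers.items()}
    source = [random.gauss(0.0, 1.0) for _ in range(800)]  # white-noise stand-in for speech
    for name, pos in speakers.items():
        print(name, "->", identify(observed_pattern(source, pos, phys, vmics), templates))
```

With estimated room impulse responses that include reflections, the correlation patterns would be richer than a single peak lag per virtual microphone; nearest-template matching is just one simple classifier choice for comparing them.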