Paper Title
MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech
Paper Authors
Paper Abstract
We address a challenging and practical task of labeling questions in speech in real time during telephone calls to emergency medical services in English, which is embedded within a broader decision support system for emergency call-takers. We propose a novel multimodal approach to real-time sequence labeling in speech. Our model treats speech and its own textual representation as two separate modalities or views, as it jointly learns from streamed audio and its noisy transcription into text via automatic speech recognition. Our results show significant gains from jointly learning from the two modalities when compared to text or audio only, under adverse noise and limited volume of training data. The results generalize to medical symptom detection, where we observe a similar pattern of improvements with multimodal learning.
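The core idea of the abstract, treating audio and its ASR transcription as two views that are fused for per-frame sequence labeling, can be sketched with a minimal late-fusion model. This is an illustrative sketch, not the paper's architecture: all dimensions, weight matrices, and label names below are hypothetical, and the random weights stand in for trained encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frame-level inputs for the two modalities
# (sizes are illustrative, not taken from the paper).
T = 5            # number of aligned time frames
D_AUDIO = 8      # audio feature size per frame
D_TEXT = 6       # ASR text embedding size per frame
N_LABELS = 3     # e.g. question / non-question / other

audio = rng.normal(size=(T, D_AUDIO))
text = rng.normal(size=(T, D_TEXT))

# Encode each modality separately, then concatenate the per-frame
# representations before a shared softmax classifier (late fusion).
W_a = rng.normal(size=(D_AUDIO, 4))
W_t = rng.normal(size=(D_TEXT, 4))
W_out = rng.normal(size=(8, N_LABELS))

h = np.concatenate([np.tanh(audio @ W_a), np.tanh(text @ W_t)], axis=1)
logits = h @ W_out
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = probs.argmax(axis=1)  # one label prediction per time frame
```

In a streaming setting, each new frame's audio features and the current ASR hypothesis for that frame would be fed through the same fusion step, yielding a label per frame in real time.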