Paper Title
On the Linguistic and Computational Requirements for Creating Face-to-Face Multimodal Human-Machine Interaction
Paper Authors
Abstract
In this study, conversations between humans and avatars are analyzed linguistically, organizationally, and structurally, focusing on what is necessary for creating face-to-face multimodal interfaces for machines. We video-recorded thirty-four human-avatar interactions, performed a complete linguistic microanalysis of video excerpts, and marked all occurrences of multimodal actions and events. Statistical inference was applied to the data, allowing us to determine not only how often multimodal actions occur but also how multimodal events are distributed between the speaker (emitter) and the listener (recipient). We also observed the distribution of multimodal occurrences for each modality. The data show evidence that double-loop feedback is established during a face-to-face conversation. This led us to propose that knowledge from Conversation Analysis (CA), cognitive science, and Theory of Mind (ToM), among others, should be incorporated into the knowledge used to describe human-machine multimodal interactions. Face-to-face interfaces require a control layer in addition to the multimodal fusion layer. This layer has to organize the flow of conversation, integrate the social context into the interaction, and make plans concerning 'what' and 'how' to progress in the interaction. This higher level is best understood if we incorporate insights from CA and ToM into the interface system.