Title
Emotion Recognition Using Speaker Cues
Authors
Abstract
This research aims at identifying an unknown emotion using speaker cues. In this study, we identify the unknown emotion using a two-stage framework. The first stage focuses on identifying the speaker who uttered the unknown emotion, while the next stage focuses on identifying the unknown emotion uttered by the speaker recognized in the prior stage. The proposed framework has been evaluated on an Arabic Emirati-accented speech database uttered by fifteen speakers per gender. Mel-Frequency Cepstral Coefficients (MFCCs) have been used as the extracted features, and a Hidden Markov Model (HMM) has been utilized as the classifier in this work. Our findings demonstrate that emotion recognition accuracy based on the two-stage framework is greater than that based on the one-stage approach and on state-of-the-art classifiers and models such as the Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector Quantization (VQ). The average emotion recognition accuracy based on the two-stage approach is 67.5%, while the accuracy reaches 61.4%, 63.3%, 64.5%, and 61.5% based on the one-stage approach, GMM, SVM, and VQ, respectively. The results achieved with the two-stage framework are very close to those attained in subjective assessment by human listeners.
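The abstract describes a cascaded decision: first select the most likely speaker, then select the emotion among that speaker's emotion-specific models. The following is a minimal, hypothetical sketch of that decision rule only; all names and score values are illustrative, and in the actual system the scores would be HMM log-likelihoods computed from MFCC feature vectors, which are not reproduced here.

```python
# Hypothetical sketch of the two-stage decision rule outlined in the
# abstract. The scores below are made-up stand-ins for HMM
# log-likelihoods; the real system would compute them from MFCCs.

def two_stage_recognize(speaker_scores, emotion_scores):
    """Stage 1: pick the most likely speaker from speaker-level scores.
    Stage 2: pick the most likely emotion among that speaker's
    emotion-specific model scores."""
    speaker = max(speaker_scores, key=speaker_scores.get)
    per_speaker = emotion_scores[speaker]
    emotion = max(per_speaker, key=per_speaker.get)
    return speaker, emotion

# Illustrative (fabricated) log-likelihoods for one test utterance:
speaker_scores = {"spk01": -1200.5, "spk02": -1150.2, "spk03": -1310.8}
emotion_scores = {
    "spk01": {"neutral": -900.0, "angry": -880.0},
    "spk02": {"neutral": -870.4, "angry": -845.9},
    "spk03": {"neutral": -910.1, "angry": -905.3},
}

print(two_stage_recognize(speaker_scores, emotion_scores))
# -> ('spk02', 'angry')
```

The cascade explains the accuracy gain reported in the abstract: conditioning the emotion models on the recognized speaker removes speaker variability that a one-stage classifier must absorb.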