Paper Title

A Classification Model Utilizing Facial Landmark Tracking to Determine Sentence Types for American Sign Language Recognition

Paper Authors

Janice Nguyen, Y. Curtis Wang

Paper Abstract

The deaf and hard-of-hearing community relies on American Sign Language (ASL) as its primary mode of communication, but communicating with others who do not know ASL can be difficult, especially during emergencies when no interpreter is available. In an effort to alleviate this problem, research into real-time, computer vision-based ASL interpreting models is ongoing. However, most of these models are hand-shape (gesture) based and lack the integration of facial cues, which are crucial in ASL for conveying tone and distinguishing sentence types. Thus, integrating facial cues into computer vision-based ASL interpreting models has the potential to improve their performance and reliability. In this paper, we introduce a simple, computationally efficient, facial-expression-based classification model that can be used to improve ASL interpreting models. The model applies principal component analysis to the relative angles of facial landmarks and uses a Random Forest classifier to label frames taken from videos of ASL users signing complete sentences. The model classifies the frames as statements or assertions, and it achieved an accuracy of 86.5%.
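To make the described pipeline concrete, below is a minimal, illustrative sketch (not the authors' code) in Python: it derives relative-angle features from facial landmark coordinates, reduces them with principal component analysis, and trains a Random Forest classifier. The landmark indexing, the angle triplets, the synthetic stand-in data, and all hyperparameters are assumptions for demonstration only; the abstract does not specify them.

```python
"""Illustrative sketch of the abstract's pipeline: relative angles of
facial landmarks -> PCA -> Random Forest. All details here (68-point
landmarks, triplet choices, hyperparameters) are assumptions, not the
authors' implementation."""
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def landmark_angles(landmarks, triplets):
    """Angle at vertex b (in radians) for each (a, b, c) landmark triplet."""
    angles = []
    for a, b, c in triplets:
        v1 = landmarks[a] - landmarks[b]
        v2 = landmarks[c] - landmarks[b]
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(angles)

# Hypothetical setup: 68-point (x, y) landmarks per frame and a few
# illustrative triplets around the brows and mouth.
TRIPLETS = [(17, 19, 21), (22, 24, 26), (48, 51, 54), (48, 57, 54)]

rng = np.random.default_rng(0)
n_frames = 500
frames = rng.normal(size=(n_frames, 68, 2))   # stand-in landmark data
labels = rng.integers(0, 2, size=n_frames)    # 0 = statement, 1 = assertion
X = np.stack([landmark_angles(f, TRIPLETS) for f in frames])

X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=0)
model = make_pipeline(PCA(n_components=3), RandomForestClassifier(random_state=0))
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

In the paper's setting, each frame of a signed sentence would supply real landmark coordinates from a face-tracking front end in place of the synthetic arrays above; using angles between landmarks rather than raw coordinates makes the features invariant to the signer's position and scale in the frame.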
