Real-Time Cattle Interaction Recognition via Triple-stream Network

Paper Title

Real-Time Cattle Interaction Recognition via Triple-stream Network

Authors

Yang, Yang; Komatsu, Mizuka; Oyama, Kenji; Ohkawa, Takenao

Abstract

In beef cattle stockbreeding, computer vision-based approaches have been widely employed to monitor cattle conditions (e.g., physical, physiological, and health status). To this end, accurate and effective recognition of cattle actions is a prerequisite. Most existing models are confined to individual behavior: they use video-based methods to extract spatio-temporal features and recognize the actions of each animal separately. However, cattle are social animals, and their interactions often reflect important conditions such as estrus; video-based methods also neglect the real-time capability of the model. We therefore tackle the challenging task of recognizing interactions between cattle in real time from a single frame. The pipeline of our method comprises two main modules: a Cattle Localization Network and an Interaction Recognition Network. At every moment, the cattle localization network outputs high-quality interaction proposals from every detected animal and feeds them into the interaction recognition network, which has a triple-stream architecture. This triple-stream network allows us to fuse different features relevant to recognizing interactions. Specifically, the three features are a visual feature that extracts the appearance representation of interaction proposals, a geometric feature that reflects the spatial relationship between cattle, and a semantic feature that captures our prior knowledge of the relationship between the individual actions and the interactions of cattle. In addition, to address the shortage of labeled data, we pre-train the model with self-supervised learning. Qualitative and quantitative evaluations demonstrate the performance of our framework as an effective method for recognizing cattle interactions in real time.
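
The abstract does not specify how interaction proposals or the geometric feature are computed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a proposal is the union box of each pair of detected cattle, that the geometric stream encodes the normalized center offset, log size ratios, and IoU of the two boxes, and that the three streams are fused by simple concatenation. All function names (`union_box`, `geometric_feature`, `interaction_proposals`, `fuse`) are hypothetical.

```python
from itertools import combinations
from math import log
from typing import List, Sequence, Tuple

# A bounding box as (x1, y1, x2, y2) in pixels; this layout is an assumption.
Box = Tuple[float, float, float, float]


def union_box(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b (assumed proposal region)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def geometric_feature(a: Box, b: Box) -> List[float]:
    """Hypothetical pairwise geometry: center offset of b relative to a
    (scaled by a's width and height), log width/height ratios, and IoU."""
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    dx = ((b[0] + b[2]) - (a[0] + a[2])) / 2.0 / wa
    dy = ((b[1] + b[3]) - (a[1] + a[3])) / 2.0 / ha
    return [dx, dy, log(wb / wa), log(hb / ha), iou(a, b)]


def interaction_proposals(boxes: Sequence[Box]) -> List[Tuple[int, int, Box]]:
    """One proposal per unordered pair of detections: (i, j, union box)."""
    return [
        (i, j, union_box(boxes[i], boxes[j]))
        for i, j in combinations(range(len(boxes)), 2)
    ]


def fuse(visual: Sequence[float], geometric: Sequence[float],
         semantic: Sequence[float]) -> List[float]:
    """Late fusion by concatenation (an assumed fusion strategy)."""
    return list(visual) + list(geometric) + list(semantic)
```

In such a sketch, the visual vector would come from an image backbone applied to the proposal region and the semantic vector from per-animal action predictions; both are left as plain sequences here because the abstract gives no detail on them.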
