Paper Title
FisheyeHDK: Hyperbolic Deformable Kernel Learning for Ultra-Wide Field-of-View Image Recognition
Paper Authors
Paper Abstract
Conventional convolutional neural networks (CNNs) trained on narrow field-of-view (FoV) images are the state-of-the-art approach for object recognition tasks. Some methods adapt CNNs to ultra-wide FoV images by learning deformable kernels. However, these methods are limited by Euclidean geometry, and their accuracy degrades under the strong distortions caused by fisheye projections. In this work, we demonstrate that learning the shape of convolution kernels in non-Euclidean spaces outperforms existing deformable kernel methods. In particular, we propose a new approach that learns deformable kernel parameters (positions) in hyperbolic space. FisheyeHDK is a hybrid CNN architecture that combines hyperbolic and Euclidean convolution layers for position and feature learning. First, we provide an intuition for the use of hyperbolic space with wide FoV images. Using synthetic distortion profiles, we demonstrate the effectiveness of our approach. We select two datasets of perspective images, Cityscapes and BDD100K 2020, which we transform to fisheye equivalents at different scaling factors (analogous to focal lengths). Finally, we provide an experiment on data collected by a real fisheye camera. Validations and experiments show that our approach improves on existing deformable kernel methods for CNN adaptation to fisheye images.
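To make the idea of representing kernel positions in hyperbolic space more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state which model of hyperbolic space is used. The sketch assumes the Poincaré ball with curvature parameter `c`, and uses the standard exponential and logarithmic maps at the origin to move 2D kernel offsets between Euclidean coordinates and the ball. The function names `exp_map_origin` and `log_map_origin` and the 3x3 offset grid are hypothetical and chosen only for illustration.

```python
import numpy as np


def exp_map_origin(v, c=1.0, eps=1e-9):
    """Map Euclidean (tangent) vectors at the origin onto the Poincare ball.

    Assumes a ball of curvature -c, so every output has norm below 1/sqrt(c).
    """
    v = np.asarray(v, dtype=np.float64)
    sqrt_c = np.sqrt(c)
    # Guard against division by zero for the zero vector.
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def log_map_origin(y, c=1.0, eps=1e-9):
    """Inverse of exp_map_origin: map points of the ball back to Euclidean vectors."""
    y = np.asarray(y, dtype=np.float64)
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), eps)
    # Clip so arctanh stays finite for points numerically on the boundary.
    scaled = np.clip(sqrt_c * norm, 0.0, 1.0 - 1e-12)
    return np.arctanh(scaled) * y / (sqrt_c * norm)


# Hypothetical example: the nine (dy, dx) sampling offsets of a regular 3x3 kernel.
offsets = np.array(
    [[dy, dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)], dtype=np.float64
)

ball_points = exp_map_origin(offsets)   # positions expressed inside the unit ball
recovered = log_map_origin(ball_points)  # back to Euclidean offsets for sampling
```

In a hybrid design of the kind the abstract describes, one plausible arrangement is that positions are learned in the ball and then mapped back to Euclidean offsets before features are sampled. That arrangement is an assumption of this sketch, not a detail confirmed by the abstract.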