Paper Title
Investigating Neural Architectures by Synthetic Dataset Design
Paper Authors
Paper Abstract
Recent years have seen the emergence of many new neural network structures (architectures and layers). To solve a given task, a network needs a certain set of abilities, which must be reflected in its structure, and the required abilities differ from task to task. So far, there has been no systematic study of the actual capacities of the proposed neural structures. The question of what each structure can and cannot achieve is only partially answered by its performance on common benchmarks. Indeed, natural data contain complex, unknown statistical cues, so it is impossible to know which cues a given neural structure exploits in such data. In this work, we sketch a methodology for measuring the effect of each structure on a network's abilities by designing ad hoc synthetic datasets. Each dataset is tailored to assess a given ability and is reduced to its simplest form: each input contains exactly the amount of information needed to solve the task. We illustrate our methodology by building three datasets, one for each of the following network properties: a) the ability to link local cues to distant inferences, b) translation covariance, and c) the ability to group pixels with the same characteristics and share information among them. Using a first, simplified depth estimation dataset, we pinpoint a serious nonlocal deficit of the U-Net. We then evaluate how to resolve this limitation by embedding nonlocal layers in its structure, which allow computing complex features with long-range dependencies. Using a second dataset, we compare different positional encoding methods and use the results to further improve the U-Net on the depth estimation task. The third dataset serves to demonstrate the need for self-attention-like mechanisms to solve more realistic depth estimation tasks.
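To make the idea of a nonlocal layer concrete, here is a minimal NumPy sketch of a dot-product nonlocal (self-attention) operation over the pixels of a feature map. The abstract does not specify which nonlocal layers were used, so the dot-product form, the residual connection, and every name here (`nonlocal_layer`, `w_q`, `w_k`, `w_v`, the 8x8 toy input) are assumptions made for illustration, not the authors' implementation. The check at the end perturbs one corner pixel and observes that the opposite corner of the output changes, which a single small convolution could not do.

```python
import numpy as np


def nonlocal_layer(x, w_q, w_k, w_v):
    """Dot-product nonlocal (self-attention) operation over all pixels.

    x: feature map of shape (H, W, C).
    w_q, w_k: projection matrices of shape (C, D).
    w_v: projection matrix of shape (C, C).
    Returns a feature map of shape (H, W, C), including a residual connection.
    """
    h, w, c = x.shape
    flat = x.reshape(h * w, c)                     # one row per pixel
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v   # queries, keys, values
    scores = (q @ k.T) / np.sqrt(q.shape[1])       # (HW, HW) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over all pixels
    out = weights @ v                              # each pixel aggregates from every pixel
    return x + out.reshape(h, w, c)


# Hypothetical toy input: an 8x8 feature map with 4 channels.
rng = np.random.default_rng(0)
H, W, C, D = 8, 8, 4, 4
x = rng.normal(size=(H, W, C))
w_q, w_k, w_v = (rng.normal(size=s) for s in [(C, D), (C, D), (C, C)])

y = nonlocal_layer(x, w_q, w_k, w_v)

# Perturb the top-left pixel and measure the effect on the bottom-right output.
x_perturbed = x.copy()
x_perturbed[0, 0] += 1.0
y_perturbed = nonlocal_layer(x_perturbed, w_q, w_k, w_v)
far_change = np.abs(y_perturbed[-1, -1] - y[-1, -1]).max()
print(y.shape, far_change > 0)
```

Because the attention weights span all H×W positions, one such layer gives every output pixel a receptive field covering the whole input, at the cost of an (HW)×(HW) affinity matrix.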