Paper Title
Eudoxus: Characterizing and Accelerating Localization in Autonomous Machines
Authors
Abstract
We develop and commercialize autonomous machines, such as logistics robots and self-driving cars, around the globe. A critical challenge for our autonomous machines, and for any autonomous machine, is accurate and efficient localization under resource constraints, a challenge that has recently fueled specialized localization accelerators. Prior acceleration efforts are point solutions in that each specializes in a specific localization algorithm. In real-world commercial deployments, however, autonomous machines routinely operate in different environments, and no single localization algorithm fits all of them. Simply stacking point solutions together not only overruns cost and power budgets but also results in an overly complicated software stack. This paper presents our new software-hardware co-designed framework for autonomous machine localization, which adapts to different operating scenarios by fusing fundamental algorithmic primitives. By characterizing the software framework, we identify ideal acceleration candidates that contribute significantly to the end-to-end latency and/or latency variation. We show how to co-design a hardware accelerator that systematically exploits the parallelism, locality, and common building blocks inherent in the localization framework. We build, deploy, and evaluate an FPGA prototype on our next-generation self-driving cars. To demonstrate the flexibility of our framework, we also instantiate a second FPGA prototype targeting drones, which represent mobile autonomous machines. We achieve about 2x speedup and 4x energy reduction compared to widely deployed, optimized implementations on general-purpose platforms.
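The characterization step described above, which singles out kernels that dominate either end-to-end latency or its variation, can be illustrated with a minimal Python sketch. It is not the authors' tooling: the kernel names, per-frame timings, and the 20% threshold are hypothetical, and it assumes a frame's end-to-end latency is simply the sum of its kernels' latencies. Because Var(T) = Σ_k Cov(X_k, T) for T = Σ_k X_k, each kernel's covariance with the total gives its share of the latency variance.

```python
from statistics import mean


def characterize(samples: dict[str, list[float]], threshold: float = 0.2):
    """Return {kernel: (latency_share, variance_share, is_candidate)}.

    Assumption (hypothetical model): samples maps a kernel name to its
    per-frame latencies in ms, every list has one entry per frame, and a
    frame's end-to-end latency T is the sum of its kernels' latencies.
    """
    names = list(samples)
    n = len(samples[names[0]])
    totals = [sum(samples[k][i] for k in names) for i in range(n)]
    mean_total = mean(totals)
    # Population variance of the end-to-end latency T.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    report = {}
    for k in names:
        xs = samples[k]
        mx = mean(xs)
        # Var(T) = sum_k Cov(X_k, T), so Cov(X_k, T) / Var(T) is k's variance share.
        cov = sum((x - mx) * (t - mean_total) for x, t in zip(xs, totals)) / n
        lat_share = mx / mean_total
        var_share = cov / var_total if var_total > 0 else 0.0
        # Flag a kernel if it dominates either mean latency or latency variation.
        is_candidate = lat_share >= threshold or var_share >= threshold
        report[k] = (lat_share, var_share, is_candidate)
    return report


# Hypothetical per-frame timings (ms) for three hypothetical kernels.
samples = {
    "feature_extraction": [10.0, 10.0, 10.0, 10.0],
    "tracking": [2.0, 2.0, 2.0, 2.0],
    "optimization": [3.0, 9.0, 3.0, 9.0],
}
report = characterize(samples)
print([k for k, (_, _, flagged) in report.items() if flagged])
# prints ['feature_extraction', 'optimization']
```

In this toy data, `feature_extraction` is flagged for its large, steady share of mean latency, while `optimization` is flagged because it accounts for all of the frame-to-frame variation; `tracking` contributes to neither and is left on the general-purpose path.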