Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

Paper Title

Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

Authors

Lv, Chengfei; Niu, Chaoyue; Gu, Renjie; Jiang, Xiaotang; Wang, Zhaode; Liu, Bin; Wu, Ziqi; Yao, Qiulin; Huang, Congyu; Huang, Panos; Huang, Tao; Shu, Hui; Song, Jinde; Zou, Bin; Lan, Peng; Xu, Guohuan; Wu, Fei; Tang, Shaojie; Wu, Fan; Chen, Guihai

Abstract

To break the bottlenecks of the mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a deployment platform, which distributes ML tasks to billion-scale devices in time; a data pipeline, which efficiently prepares task input; and a compute container, which provides a cross-platform and high-performance execution environment while facilitating daily task iteration. Specifically, the compute container is based on Mobile Neural Network (MNN), a tensor compute engine, along with data processing and model execution libraries, which are exposed through a refined Python thread-level virtual machine (VM) to support diverse ML tasks and concurrent task execution. The core of MNN lies in the novel mechanisms of operator decomposition and semi-auto search, which sharply reduce the workload of manually optimizing hundreds of operators for tens of hardware backends and further quickly identify the best backend, with runtime optimization, for a computation graph. The data pipeline introduces an on-device stream processing framework to enable processing user behavior data at source. The deployment platform releases ML tasks with an efficient push-then-pull method and supports multi-granularity deployment policies. We evaluate Walle in practical e-commerce application scenarios to demonstrate its effectiveness, efficiency, and scalability. Extensive micro-benchmarks also highlight the superior performance of MNN and the Python thread-level VM. Walle has been in large-scale production use at Alibaba, while MNN has been open-sourced and has had a broad impact on the community.