Paper Title
RLFlow: Optimising Neural Network Subgraph Transformation with World Models
Paper Authors
Paper Abstract
Training deep learning models requires long execution times and consumes large amounts of computing resources. At the same time, recent research has proposed systems and compilers that promise to reduce deep learning model runtimes. An effective optimisation methodology for data processing is desirable, and reducing the compute requirements of deep learning models is the focus of extensive research. In this paper, we address neural network subgraph transformation by exploring reinforcement learning (RL) agents to achieve performance improvements. Our proposed approach, RLFlow, learns to perform neural network subgraph transformations without the need for expertly designed heuristics to achieve a high level of performance. Recent work has applied RL to computer systems with some success, especially using model-free RL techniques. Model-based reinforcement learning methods have seen increased research focus as they can be used to learn the transition dynamics of the environment; this can be leveraged to train an agent in a hallucinated environment such as a World Model (WM), thereby increasing sample efficiency compared to model-free approaches. A WM uses a variational auto-encoder to build a model of the system, which the agent can then explore inexpensively. In RLFlow, we propose a design for a model-based agent with a WM which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime. We show that our approach matches state-of-the-art performance on common convolutional networks and outperforms it by up to 5% on transformer-style architectures.
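To make the idea of subgraph transformation as an RL problem concrete, the following is a minimal, hypothetical sketch: the state is a toy computation graph (a list of ops), actions are graph rewrites (here, fusing a conv followed by a batch-norm into a single op), and the reward is the estimated runtime saved. The names (ToyGraphEnv, the cost table, the fusion rule) are illustrative assumptions and do not come from RLFlow, which trains a model-based agent with a learned world model rather than the random policy used here.

```python
# Hypothetical illustration only: a toy "subgraph transformation" environment.
# RLFlow's actual environment, cost model, and agent are not reproduced here.
import random
from dataclasses import dataclass, field


@dataclass
class ToyGraphEnv:
    """State is a linear list of op names; actions are fusion rewrites."""
    ops: list = field(default_factory=lambda: ["conv", "bn", "relu", "conv", "bn", "add"])

    # Assumed per-op costs standing in for a real runtime cost model.
    COSTS = {"conv": 5.0, "bn": 1.0, "relu": 0.5, "conv_bn": 5.2, "add": 0.3}

    def runtime(self, ops):
        return sum(self.COSTS[o] for o in ops)

    def actions(self):
        """Indices where a (conv, bn) pair can be fused into one op."""
        return [i for i in range(len(self.ops) - 1)
                if self.ops[i] == "conv" and self.ops[i + 1] == "bn"]

    def step(self, i):
        """Apply the fusion rewrite at index i; return the change in runtime."""
        before = self.runtime(self.ops)
        self.ops[i:i + 2] = ["conv_bn"]
        return self.runtime(self.ops) - before  # negative => runtime reduced


env = ToyGraphEnv()
saved = 0.0
# Random policy over available rewrites, purely for illustration; a model-based
# agent would instead plan these choices inside a learned world model.
while env.actions():
    saved -= env.step(random.choice(env.actions()))
print("ops after rewrites:", env.ops, "| estimated runtime saved:", round(saved, 2))
```

Running the sketch fuses both (conv, bn) pairs and reports the runtime reduction predicted by the toy cost model; in the paper's setting, the agent must instead learn which sequence of transformations reduces the measured runtime of real networks.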