Paper title
Distributed Machine Learning for Computational Engineering using MPI
Paper authors
Abstract
We propose a framework for training neural networks that are coupled with partial differential equations (PDEs) in a parallel computing environment. Unlike most distributed computing frameworks for deep neural networks, our focus is to parallelize both numerical solvers and deep neural networks in forward and adjoint computations. Our parallel computing model views data communication as a node in the computational graph for numerical simulations. The advantage of our model is that data communication and computing are cleanly separated and thus provide better flexibility, modularity, and testability. We demonstrate using various large-scale problems that we can achieve substantial acceleration by using parallel solvers for PDEs in training deep neural networks that are coupled with PDEs.
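To illustrate what treating data communication as a node in the computational graph can mean, the following is a minimal sketch and not the framework's implementation. It assumes a 1D domain split across ranks and simulates those ranks inside one process with plain Python lists instead of real MPI calls. The names `halo_forward`, `halo_adjoint`, and `dot` are hypothetical. The forward node pads each rank's chunk with its neighbours' boundary values (a halo exchange), and the adjoint node routes the gradients of those ghost cells back to the ranks that own the values.

```python
def halo_forward(chunks):
    """Forward node: pad each rank's chunk with its neighbours' boundary values.

    chunks[r] is the non-empty local vector owned by simulated rank r.
    Ghost cells at the two ends of the global domain are set to 0.0.
    """
    p = len(chunks)
    out = []
    for r, u in enumerate(chunks):
        left = chunks[r - 1][-1] if r > 0 else 0.0      # received from rank r-1
        right = chunks[r + 1][0] if r < p - 1 else 0.0  # received from rank r+1
        out.append([left] + list(u) + [right])
    return out


def halo_adjoint(grads):
    """Adjoint node: send ghost-cell gradients back to the owning ranks.

    grads[r] has the shape of halo_forward's output on rank r.
    """
    p = len(grads)
    out = []
    for r, g in enumerate(grads):
        gu = list(g[1:-1])                # gradient of the locally owned entries
        if r > 0:
            gu[0] += grads[r - 1][-1]     # u[0] was rank r-1's right ghost
        if r < p - 1:
            gu[-1] += grads[r + 1][0]     # u[-1] was rank r+1's left ghost
        out.append(gu)
    return out


def dot(a, b):
    """Global inner product over all simulated ranks."""
    return sum(x * y for ca, cb in zip(a, b) for x, y in zip(ca, cb))


u = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # three simulated ranks
g = [[1.0, 1.0, 1.0, 1.0] for _ in u]      # upstream gradient per rank

padded = halo_forward(u)
grad_u = halo_adjoint(g)

# Dot-product test: <forward(u), g> must equal <u, adjoint(g)>.
assert dot(padded, g) == dot(u, grad_u)
```

Because the communication step is an ordinary linear operator with its own adjoint, it can be checked in isolation with the dot-product test above, independently of any numerical solver or neural network placed around it. In a real MPI setting the list indexing across ranks would be replaced by point-to-point sends and receives.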