Paper Title
Dynamic GPU Energy Optimization for Machine Learning Training Workloads
Paper Authors
Paper Abstract
GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models grow ever larger, they take longer to train, leading to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload behavior, GPOEO utilizes GPU performance counters. To reduce the performance-counter profiling overhead, it uses an analytical model to detect changes between training iterations and collects performance counter data only when an iteration shift is detected. GPOEO employs multi-objective models based on gradient boosting, together with a local search algorithm, to find a trade-off between execution time and energy consumption. We evaluate GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites running on an NVIDIA RTX 3080Ti GPU. Compared with the default NVIDIA scheduling strategy, GPOEO delivers a mean energy saving of 16.2% at a modest average execution-time increase of 5.1%.
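To make the trade-off search in the abstract concrete, the sketch below shows a greedy local search over candidate GPU core-clock configurations, scoring each with predicted energy and execution time. This is an illustration under stated assumptions, not the authors' implementation: the closed-form time and power models stand in for GPOEO's gradient-boosting predictors, and all constants (`BASE_FREQ`, `P_STATIC`, `C_DYN`, the clock steps) are made-up values.

```python
# Hypothetical stand-in constants -- not taken from the paper.
BASE_FREQ = 1800   # assumed default SM clock (MHz)
BASE_TIME = 1.0    # normalized execution time at BASE_FREQ
P_STATIC = 80.0    # assumed static power (W)
C_DYN = 41.15      # assumed dynamic-power coefficient (W per GHz^3)

def predicted_time(freq_mhz):
    """Toy model: execution time scales inversely with the core clock."""
    return BASE_TIME * BASE_FREQ / freq_mhz

def predicted_energy(freq_mhz):
    """Toy model: power = static + dynamic (~ f^3); energy = power * time."""
    ghz = freq_mhz / 1000.0
    power = P_STATIC + C_DYN * ghz ** 3
    return power * predicted_time(freq_mhz)

def local_search(candidates):
    """Hill-climb over neighbouring clock steps, minimizing the
    energy-delay product energy(f) * time(f)."""
    cost = lambda f: predicted_energy(f) * predicted_time(f)
    i = len(candidates) // 2                       # start mid-range
    while True:
        around = [j for j in (i - 1, i, i + 1) if 0 <= j < len(candidates)]
        best = min(around, key=lambda j: cost(candidates[j]))
        if best == i:
            return candidates[i]
        i = best

freqs = list(range(1000, 2001, 100))  # hypothetical clock steps (MHz)
print(local_search(freqs))            # -> 1600 under these toy models
```

Under these stand-in models the search settles on a clock below the assumed default, mirroring the paper's overall finding that a small slowdown can buy a sizeable energy saving; the real framework instead learns its time and energy predictors online from GPU performance counters.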