Paper Title

AIBench Training: Balanced Industry-Standard AI Training Benchmarking

Paper Authors

Fei Tang, Wanling Gao, Jianfeng Zhan, Chuanxin Lan, Xu Wen, Lei Wang, Chunjie Luo, Jiahui Dai, Zheng Cao, Xingwang Xiong, Zihan Jiang, Tianshu Hao, Fanda Fan, Fan Zhang, Yunyou Huang, Jianan Chen, Mengjia Du, Rui Ren, Chen Zheng, Daoyi Zheng, Haoning Tang, Kunlin Zhan, Biao Wang, Defei Kong, Minghe Yu, Chongkang Tan, Huan Li, Xinhui Tian, Yatao Li, Junchao Shao, Zhenyu Wang, Xiaoyu Wang, Hainan Ye

Paper Abstract

Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks. Only using a few AI component benchmarks like MLPerf alone in the other stages may lead to misleading conclusions. Moreover, the learning dynamics are not well understood, and the benchmarks' shelf-life is short. This paper proposes a balanced benchmarking methodology. We use real-world benchmarks to cover the factor space that impacts the learning dynamics to the greatest extent. After performing an exhaustive survey of Internet service AI domains, we identify and implement nineteen representative AI tasks with state-of-the-art models. For repeatable performance ranking (the RPR subset) and workload characterization (the WC subset), we keep the two subsets to a minimum for affordability. We contribute by far the most comprehensive AI training benchmark suite. The evaluations show: (1) AIBench Training (v1.1) outperforms MLPerf Training (v0.7) in terms of the diversity and representativeness of model complexity, computational cost, convergence rate, computation and memory access patterns, and hotspot functions; (2) against the AIBench full benchmarks, the RPR subset shortens the benchmarking cost by 64% while maintaining the primary workload characteristics; (3) the performance ranking shows that a single-purpose AI accelerator like the TPU with the optimized TensorFlow framework outperforms GPUs, while losing the latter's general support for various AI models. The specification, source code, and performance numbers are available from the AIBench homepage: https://www.benchcouncil.org/aibench-training/index.html.
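To illustrate how a small representative subset might be derived from a full benchmark suite, the sketch below clusters workloads on a few of the characterization metrics the abstract names (model complexity, computational cost, convergence rate, memory-access behavior) and keeps the workload nearest each cluster center. The workload names, metric values, and the use of k-means here are illustrative assumptions, not the paper's actual selection procedure or measured data.

```python
# Minimal sketch (assumed clustering-based selection, not the paper's method):
# pick a representative subset of AI training workloads by clustering them on
# illustrative characterization metrics.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Each row is one workload; columns are hypothetical metrics:
# [params (M), FLOPs (G), epochs to converge, memory-bound ratio]
workloads = ["image_classification", "object_detection", "speech_recognition",
             "text_to_text", "recommendation", "image_generation"]
metrics = np.array([
    [25.6,  4.1,  90, 0.62],
    [41.3,  7.8, 120, 0.55],
    [120.0, 11.2, 60, 0.48],
    [213.0, 18.5, 30, 0.51],
    [9.8,   0.9,  15, 0.81],
    [54.0,  6.3, 200, 0.44],
])

# Normalize so no single metric dominates the distance computation.
X = StandardScaler().fit_transform(metrics)

# Group workloads with similar behavior, then keep the member closest to each
# cluster center as the "representative" workload of the subset.
k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
subset = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    subset.append(workloads[members[np.argmin(dists)]])

print("Representative subset:", subset)
```

Running fewer, representative workloads in this spirit is one way a subset can retain the primary workload characteristics while cutting benchmarking cost, as the RPR subset result in the abstract reports.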
