Paper Title

An In-Depth Analysis of the Slingshot Interconnect

Authors

De Sensi, Daniele; Di Girolamo, Salvatore; McMahon, Kim H.; Roweth, Duncan; Hoefler, Torsten

Abstract

The interconnect is one of the most critical components in large-scale computing systems, and its impact on application performance grows with the system size. In this paper, we describe Slingshot, an interconnection network for large-scale computing systems. Slingshot is based on high-radix switches, which allow building exascale and hyperscale datacenter networks with at most three switch-to-switch hops. Moreover, Slingshot provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes. Slingshot uses an optimized Ethernet protocol, which makes it interoperable with standard Ethernet devices while providing high performance to HPC applications. We analyze the extent to which Slingshot provides these features, evaluating it on microbenchmarks and on several applications from the datacenter and AI worlds, as well as on HPC applications. We find that applications running on Slingshot are less affected by congestion than on previous-generation networks.
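The claim that high-radix switches keep every route within three switch-to-switch hops can be illustrated with a small worked model. The sketch below assumes a canonical dragonfly-style topology: switches are fully connected inside a group, and every pair of groups shares exactly one global link. The class `Dragonfly`, its parameters `p`, `a`, `h`, the global-link wiring rule, and the 16/32/17 port split of a 64-port switch are all illustrative assumptions, not details taken from the abstract.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Dragonfly:
    """Canonical dragonfly model (hypothetical parameters, not a vendor configuration)."""
    p: int  # endpoint ports per switch
    a: int  # switches per group, all-to-all connected inside the group
    h: int  # global ports per switch, leaving the group

    @property
    def radix(self) -> int:
        # Each switch uses p endpoint ports, a - 1 local ports and h global ports.
        return self.p + (self.a - 1) + self.h

    @property
    def groups(self) -> int:
        # A group has a*h global ports; one link to every other group gives a*h + 1 groups.
        return self.a * self.h + 1

    @property
    def endpoints(self) -> int:
        return self.p * self.a * self.groups

    def _port_owner(self, src_group: int, dst_group: int) -> int:
        # Assumed wiring: global port j of group g connects to group (g + j + 1) mod G,
        # and port j belongs to switch j // h inside that group.
        j = (dst_group - src_group - 1) % self.groups
        return j // self.h

    def minimal_hops(self, sg: int, ss: int, dg: int, ds: int) -> int:
        """Switch-to-switch hops on the minimal route from switch (sg, ss) to (dg, ds)."""
        if sg == dg:
            return 0 if ss == ds else 1
        out_sw = self._port_owner(sg, dg)  # switch in group sg holding the link to dg
        in_sw = self._port_owner(dg, sg)   # switch in group dg where that link lands
        # optional local hop + one global hop + optional local hop
        return int(out_sw != ss) + 1 + int(in_sw != ds)

    def max_minimal_hops(self) -> int:
        # Brute force over all switch pairs; only practical for small instances.
        switches = list(product(range(self.groups), range(self.a)))
        return max(self.minimal_hops(sg, ss, dg, ds)
                   for (sg, ss), (dg, ds) in product(switches, repeat=2))


if __name__ == "__main__":
    big = Dragonfly(p=16, a=32, h=17)  # hypothetical split of a 64-port switch
    print(big.radix, big.groups, big.endpoints)  # 64 545 279040
    small = Dragonfly(p=2, a=4, h=2)  # small enough to check every switch pair
    print(small.max_minimal_hops())  # 3
```

A minimal route crosses at most one global link, so the diameter stays at three switch hops however many groups the radix allows; a higher radix only increases how many groups, and therefore endpoints, fit in the network. Only minimal routes are modeled here.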