Paper Title
RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation
Paper Authors
Abstract
We propose RecShard, a fine-grained embedding table (EMB) partitioning and placement technique for deep learning recommendation models (DLRMs). RecShard is designed based on two key observations. First, not all EMBs are equal, nor are all rows within an EMB equal in terms of access patterns. EMBs exhibit distinct memory characteristics, providing performance optimization opportunities for intelligent EMB partitioning and placement across a tiered memory hierarchy. Second, in modern DLRMs, EMBs function as hash tables. As a result, EMBs display interesting phenomena, such as the birthday paradox, leaving EMBs severely under-utilized. RecShard determines an optimal EMB sharding strategy for a set of EMBs based on training data distributions and model characteristics, along with the bandwidth characteristics of the underlying tiered memory hierarchy. In doing so, RecShard achieves over 6 times higher EMB training throughput on average for capacity-constrained DLRMs. The throughput increase comes from improving EMB load balance by over 12 times and from reducing accesses to the slower memory by over 87 times.
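The under-utilization that the abstract attributes to the birthday paradox can be illustrated with a small calculation. The sketch below is not RecShard's method. It assumes, purely for illustration, that categorical IDs hash uniformly at random into EMB rows, in which case the expected fraction of rows touched by n distinct IDs in a table of m rows is 1 - (1 - 1/m)^n. The function names `expected_used_fraction` and `simulated_used_fraction` are hypothetical.

```python
import random


def expected_used_fraction(num_ids: int, num_rows: int) -> float:
    """Expected fraction of rows hit at least once when num_ids distinct IDs
    are hashed uniformly at random into num_rows rows (an assumption)."""
    return 1.0 - (1.0 - 1.0 / num_rows) ** num_ids


def simulated_used_fraction(num_ids: int, num_rows: int, seed: int = 0) -> float:
    """Monte Carlo check: model the hash as a uniform random row index
    and count how many distinct rows are actually touched."""
    rng = random.Random(seed)
    used_rows = {rng.randrange(num_rows) for _ in range(num_ids)}
    return len(used_rows) / num_rows


if __name__ == "__main__":
    rows = 10_000
    # Vary the number of distinct IDs relative to the table size.
    for ids in (rows // 10, rows, 2 * rows):
        print(
            f"ids={ids:>6} rows={rows} "
            f"expected={expected_used_fraction(ids, rows):.3f} "
            f"simulated={simulated_used_fraction(ids, rows):.3f}"
        )
```

Under this uniform-hashing assumption, a table with as many rows as distinct IDs still leaves roughly a third of its rows unused, because collisions concentrate several IDs on the same rows. Real ID distributions are skewed, so actual utilization depends on the training data.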