Paper Title

Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation

Authors

Jeffrey Willette, Seanie Lee, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

Abstract

Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions. However, existing constraints on MBC architectures lead to models with limited expressive power. Additionally, prior work has not addressed how to deal with large sets during training when the full set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions which can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings. Furthermore, we propose an efficient MBC training algorithm which gives an unbiased approximation of the full set gradient and has a constant memory overhead for any set size for both train- and test-time. We conduct extensive experiments including image completion, text classification, unsupervised clustering, and cancer detection on high-resolution images to verify the efficiency and efficacy of our scalable set encoding framework. Our code is available at github.com/jeffwillette/umbc
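
To make the mini-batch consistency property concrete, the following is a minimal sketch and not the UMBC architecture from the paper. It assumes a simple sum-pooling set encoder, which is one well-known way to obtain the same output for every partition of a set. The names `phi`, `rho`, `encode_in_chunks`, and `random_partition` are hypothetical and chosen only for illustration.

```python
import math
import random


def phi(x):
    """Per-element feature map (hypothetical choice for illustration)."""
    return (x, x * x, math.tanh(x))


def rho(pooled, count):
    """Readout applied once after pooling (hypothetical choice)."""
    mean = pooled[0] / count
    second_moment = pooled[1] / count
    return (mean, second_moment - mean * mean, pooled[2])


def encode_in_chunks(chunks):
    """Stream chunks one at a time, keeping only a fixed-size running state."""
    pooled = [0.0, 0.0, 0.0]
    count = 0
    for chunk in chunks:
        for x in chunk:
            for i, feature in enumerate(phi(x)):
                pooled[i] += feature
            count += 1
    return rho(pooled, count)


def random_partition(items, num_chunks, rng):
    """Shuffle the set and split it into num_chunks (possibly uneven) chunks."""
    items = list(items)
    rng.shuffle(items)
    return [items[i::num_chunks] for i in range(num_chunks)]


rng = random.Random(0)
full_set = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

# Encode the whole set at once, then again as 7 randomly formed chunks.
full_output = encode_in_chunks([full_set])
chunked_output = encode_in_chunks(random_partition(full_set, 7, rng))

# The two outputs agree up to floating-point summation order.
max_gap = max(abs(a - b) for a, b in zip(full_output, chunked_output))
print(max_gap < 1e-9)
```

The running state here has a fixed size regardless of how many elements are streamed, which is the sense in which chunked processing keeps memory constant at inference time. The sketch does not cover the paper's training algorithm or its unbiased full set gradient approximation.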
