论文标题
广义减少:使任何分层聚类公平,平衡,成本低廉
Generalized Reductions: Making any Hierarchical Clustering Fair and Balanced with Low Cost
论文作者
论文摘要
聚类是现代统计分析管道的基本组成部分。近年来,公平的聚类吸引了机器学习社区的关注。在Ahmadian等人的结果之后,我们是在等级聚类的背景下研究公平性的一些人。从2020年的Neurips来看。我们使用Dasgupta的成本函数评估了结果,这也许是用于分层聚类评估的最普遍的理论指标之一。我们的工作大大改善了前一个$ o(n^{5/6} poly \ log(n))$公平的近似值,成本近似于接近polygarithmic $ o(n^Δpoly\ log \ log(n))$公平近似值的任何常数$δ\ in(0,1)$。该结果建立了成本上的权衡,并且比以前的工作更加广泛地限制了更广泛的公平限制。我们还展示了如何更改现有的层次结构聚类,以确保层次结构中任何级别的公平性和聚类平衡。
Clustering is a fundamental building block of modern statistical analysis pipelines. Fair clustering has seen much attention from the machine learning community in recent years. We are some of the first to study fairness in the context of hierarchical clustering, after the results of Ahmadian et al. from NeurIPS in 2020. We evaluate our results using Dasgupta's cost function, perhaps one of the most prevalent theoretical metrics for hierarchical clustering evaluation. Our work vastly improves the previous $O(n^{5/6}poly\log(n))$ fair approximation for cost to a near polylogarithmic $O(n^δpoly\log(n))$ fair approximation for any constant $δ\in(0,1)$. This result establishes a cost-fairness tradeoff and extends to broader fairness constraints than the previous work. We also show how to alter existing hierarchical clusterings to guarantee fairness and cluster balance across any level in the hierarchy.