论文标题

广义减少:使任何分层聚类公平,平衡,成本低廉

Generalized Reductions: Making any Hierarchical Clustering Fair and Balanced with Low Cost

论文作者

Knittel, Marina, Springer, Max, Dickerson, John P., Hajiaghayi, MohammadTaghi

论文摘要

聚类是现代统计分析管道的基本组成部分。近年来,公平的聚类吸引了机器学习社区的关注。在Ahmadian等人的结果之后,我们是在等级聚类的背景下研究公平性的一些人。从2020年的Neurips来看。我们使用Dasgupta的成本函数评估了结果,这也许是用于分层聚类评估的最普遍的理论指标之一。我们的工作大大改善了前一个$ o(n^{5/6} poly \ log(n))$公平的近似值,成本近似于接近polygarithmic $ o(n^Δpoly\ log \ log(n))$公平近似值的任何常数$δ\ in(0,1)$。该结果建立了成本上的权衡,并且比以前的工作更加广泛地限制了更广泛的公平限制。我们还展示了如何更改现有的层次结构聚类,以确保层次结构中任何级别的公平性和聚类平衡。

Clustering is a fundamental building block of modern statistical analysis pipelines. Fair clustering has seen much attention from the machine learning community in recent years. We are some of the first to study fairness in the context of hierarchical clustering, after the results of Ahmadian et al. from NeurIPS in 2020. We evaluate our results using Dasgupta's cost function, perhaps one of the most prevalent theoretical metrics for hierarchical clustering evaluation. Our work vastly improves the previous $O(n^{5/6}poly\log(n))$ fair approximation for cost to a near polylogarithmic $O(n^δpoly\log(n))$ fair approximation for any constant $δ\in(0,1)$. This result establishes a cost-fairness tradeoff and extends to broader fairness constraints than the previous work. We also show how to alter existing hierarchical clusterings to guarantee fairness and cluster balance across any level in the hierarchy.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源