Paper title
Clustering Without Knowing How To: Application and Evaluation
Paper authors
Paper abstract
Crowdsourcing allows running simple human intelligence tasks on a large crowd of workers, making it possible to solve problems for which it is difficult to formulate an algorithm or train a machine learning model in reasonable time. One such problem is data clustering by an under-specified criterion that is simple for humans but difficult for machines. In this demonstration paper, we build a crowdsourced system for image clustering and release its code under a free license at https://github.com/Toloka/crowdclustering. Our experiments on two different image datasets, dresses from Zalando's FEIDEGGER and shoes from the Toloka Shoes Dataset, confirm that meaningful clusters can be obtained purely through crowdsourcing, without any machine learning algorithms.