Paper Title
GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization
Paper Authors
Paper Abstract
Recent research has demonstrated the capability of behavior signals captured by smartphones and wearables for longitudinal behavior modeling. However, there is a lack of a comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population within a short period, without measuring the cross-dataset generalizability of these algorithms. We present the first multi-year passive sensing datasets, containing over 700 user-years and 497 unique users' data collected from mobile and wearable sensors, together with a wide range of well-being metrics. Our datasets can support multiple cross-dataset evaluations of behavior modeling algorithms' generalizability across different users and years. As a starting point, we provide the benchmark results of 18 algorithms on the task of depression detection. Our results indicate that both prior depression detection algorithms and domain generalization techniques show potential but need further research to achieve adequate cross-dataset generalizability. We envision our multi-year datasets can support the ML community in developing generalizable longitudinal behavior modeling algorithms.