论文标题
汇集:数据收集概念,数据集和基准,用于野外独立社交互动的机器分析
ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild
论文作者
论文摘要
由于几个因素之间的微妙权衡:参与者的隐私,生态有效性,数据保真度和后勤开销,记录野外无脚本人类相互作用的动态。为了解决这些问题,按照社区精神为社区的“数据集”,我们提出了会议生活实验室(Conflab):一种新的概念,用于多模式多传感器数据集合的野外独立社交对话。对于此处描述的汇集的首次实例化,我们在一次大型国际会议上组织了现实生活中的专业网络活动。该数据集涉及48个会议参与者,捕获了地位,熟人和网络动机的各种组合。 Our capture setup improves upon the data fidelity of prior in-the-wild datasets while retaining privacy sensitivity: 8 videos (1920x1080, 60 fps) from a non-invasive overhead view, and custom wearable sensors with onboard recording of body motion (full 9-axis IMU), privacy-preserving low-frequency audio (1250 Hz), and Bluetooth-based proximity.此外,我们开发了用于分布式硬件同步的自定义解决方案,并以高采样速率对人体关键点和动作进行时间效率的连续注释。我们的基准测试显示了一些与野外隐私的社交数据分析相关的开放研究任务:从高架摄像头视图,基于骨架的NO-Audio扬声器检测和F-Formation检测中检测到关键点的检测。
Recording the dynamics of unscripted human interactions in the wild is challenging due to the delicate trade-offs between several factors: participant privacy, ecological validity, data fidelity, and logistical overheads. To address these, following a 'datasets for the community by the community' ethos, we propose the Conference Living Lab (ConfLab): a new concept for multimodal multisensor data collection of in-the-wild free-standing social conversations. For the first instantiation of ConfLab described here, we organized a real-life professional networking event at a major international conference. Involving 48 conference attendees, the dataset captures a diverse mix of status, acquaintance, and networking motivations. Our capture setup improves upon the data fidelity of prior in-the-wild datasets while retaining privacy sensitivity: 8 videos (1920x1080, 60 fps) from a non-invasive overhead view, and custom wearable sensors with onboard recording of body motion (full 9-axis IMU), privacy-preserving low-frequency audio (1250 Hz), and Bluetooth-based proximity. Additionally, we developed custom solutions for distributed hardware synchronization at acquisition and time-efficient continuous annotation of body keypoints and actions at high sampling rates. Our benchmarks showcase some of the open research tasks related to in-the-wild privacy-preserving social data analysis: keypoints detection from overhead camera views, skeleton-based no-audio speaker detection, and F-formation detection.