Paper title
Biased Bytes: On the Validity of Estimating Food Consumption from Digital Traces
Paper authors
Paper abstract
Because measuring food consumption at population scale is challenging, researchers have begun to explore digital traces (e.g., from social media or from food-tracking applications) as potential proxies. However, it remains unclear to what extent digital traces reflect real food consumption. The present study aims to bridge this gap by quantifying the link between dietary behaviors as captured via social media (Twitter) versus a food-tracking application (MyFoodRepo). We focus on the case of Switzerland and contrast images of foods collected through the two platforms, designing and deploying a novel crowdsourcing framework for estimating biases with respect to nutritional properties and appearance. We find that the food type distributions in social media and in food tracking diverge; e.g., bread is 2.5 times more frequent among consumed and tracked foods than on Twitter, whereas cake is 12 times more frequent on Twitter than among consumed and tracked foods. Controlling for the different food type distributions, we contrast consumed and tracked foods of a given type with foods of the same type shared on Twitter. Across food types, food posted on Twitter is perceived as tastier, more caloric, less healthy, less likely to have been consumed at home, more complex, and larger-portioned than consumed and tracked food. The divergence between food consumption as measured via the two platforms implies that at least one of the two is not a faithful representation of the true food consumption in the general Swiss population. Thus, researchers should be attentive and aim to establish evidence of validity before using digital traces as a proxy for the true food consumption of a general population. We conclude by discussing the potential sources of these biases and their implications, outlining pitfalls and threats to validity, and proposing actionable ways of overcoming them.
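The comparison of food type distributions described in the abstract can be illustrated with a minimal Python sketch. It computes, for each food type, the ratio between its relative frequency in one collection and its relative frequency in another. The function names (`relative_frequencies`, `frequency_ratios`) and the toy label lists are assumptions made for illustration only; they are not the authors' code, and the counts are not data from the study.

```python
from collections import Counter


def relative_frequencies(labels):
    """Map each food type to its share of all labels in one collection."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {food: n / total for food, n in counts.items()}


def frequency_ratios(tracked_labels, twitter_labels):
    """Ratio of relative frequencies (tracked / Twitter) per shared food type.

    A ratio above 1 means the type is relatively more frequent among tracked
    foods; a ratio below 1 means it is relatively more frequent on Twitter.
    """
    tracked = relative_frequencies(tracked_labels)
    twitter = relative_frequencies(twitter_labels)
    shared = sorted(set(tracked) & set(twitter))
    return {food: tracked[food] / twitter[food] for food in shared}


# Hypothetical toy labels, invented for this sketch (NOT the paper's data).
tracked_labels = ["bread"] * 6 + ["cake"] * 1 + ["salad"] * 3
twitter_labels = ["bread"] * 3 + ["cake"] * 4 + ["salad"] * 3

ratios = frequency_ratios(tracked_labels, twitter_labels)
for food, ratio in ratios.items():
    print(f"{food}: tracked/Twitter frequency ratio = {ratio:.2f}")
```

Working with relative rather than absolute frequencies keeps the ratio meaningful when the two collections differ in size. Food types that appear in only one collection are skipped here, which is a simplification of this sketch.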