Paper Title

Addressing machine learning concept drift reveals declining vaccine sentiment during the COVID-19 pandemic

Authors

Martin Müller, Marcel Salathé

Abstract

Social media analysis has become a common approach to assess public opinion on various topics, including those about health, in near real-time. The growing volume of social media posts has led to an increased usage of modern machine learning methods in natural language processing. While the rapid dynamics of social media can capture underlying trends quickly, it also poses a technical problem: algorithms trained on annotated data in the past may underperform when applied to contemporary data. This phenomenon, known as concept drift, can be particularly problematic when rapid shifts occur either in the topic of interest itself, or in the way the topic is discussed. Here, we explore the effect of machine learning concept drift by focussing on vaccine sentiments expressed on Twitter, a topic of central importance especially during the COVID-19 pandemic. We show that while vaccine sentiment has declined considerably during the COVID-19 pandemic in 2020, algorithms trained on pre-pandemic data would have largely missed this decline due to concept drift. Our results suggest that social media analysis systems must address concept drift in a continuous fashion in order to avoid the risk of systematic misclassification of data, which is particularly likely during a crisis when the underlying data can change suddenly and rapidly.
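To make the concept-drift effect described in the abstract concrete, the following is a minimal sketch on synthetic data, not the authors' method or data. It assumes a hypothetical one-dimensional feature per post and a simple threshold classifier; the names `make_period`, `fit_threshold`, and `compare` are illustrative only. A model trained once on the first period is compared with a model retrained on each later period, while the way the topic is expressed shifts over time.

```python
import random


def make_period(rng, n, shift):
    """Generate synthetic (feature, label) pairs for one time period.

    label 1 = positive sentiment, label 0 = negative sentiment.
    `shift` moves both class centres, mimicking a change in how
    the topic is discussed (an assumption made for illustration).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = (1.0 if label else -1.0) + shift
        data.append((rng.gauss(centre, 0.5), label))
    return data


def fit_threshold(data):
    """Train: place the decision threshold midway between the class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, data):
    """Fraction of samples where 'x above threshold' matches a positive label."""
    return sum((x > threshold) == (y == 1) for x, y in data) / len(data)


def compare(seed=0, n=2000, shifts=(0.0, 0.5, 1.0, 1.5)):
    """Return (shift, static accuracy, retrained accuracy) for each period."""
    rng = random.Random(seed)
    # Static model: trained once, on data from the first period only.
    static_threshold = fit_threshold(make_period(rng, n, shifts[0]))
    rows = []
    for shift in shifts:
        train = make_period(rng, n, shift)
        test = make_period(rng, n, shift)
        retrained_threshold = fit_threshold(train)
        rows.append((shift,
                     accuracy(static_threshold, test),
                     accuracy(retrained_threshold, test)))
    return rows


if __name__ == "__main__":
    for shift, static_acc, retrained_acc in compare():
        print(f"shift={shift:.1f}  static={static_acc:.3f}  retrained={retrained_acc:.3f}")
```

In this toy setup the static model increasingly labels negative samples as positive as the shift grows, so a real decline in sentiment would be underestimated, whereas the retrained model keeps its accuracy. Actual social media classifiers and drift patterns are far more complex; the sketch only illustrates why continuous retraining can matter.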
