Paper Title
Ensemble Forecasting of the Zika Space-Time Spread with Topological Data Analysis
Abstract
As per the records of the World Health Organization, the first formally reported incidence of Zika virus occurred in Brazil in May 2015. The disease then rapidly spread to other countries in the Americas and East Asia, affecting more than 1,000,000 people. Zika virus is primarily transmitted through bites of infected mosquitoes of the genus Aedes (Aedes aegypti and Aedes albopictus). Mosquitoes are abundant, and Zika virus infections are consequently prevalent, in areas with high precipitation, high temperature, and high population density. The nonlinear spatio-temporal dependence of such data and the lack of historical public health records make predicting the spread of the virus particularly challenging. In this article, we enhance Zika forecasting by introducing concepts of topological data analysis, specifically the persistent homology of atmospheric variables, into virus spread modeling. The topological summaries capture higher-order dependencies among atmospheric variables that might otherwise be unassessable with conventional spatio-temporal modeling approaches based on geographical proximity measured by Euclidean distance. We introduce a new concept of cumulative Betti numbers and integrate them as topological descriptors into three predictive machine learning models: random forest, generalized boosted regression, and a deep neural network. Furthermore, to better quantify the various sources of uncertainty, we combine the resulting individual model forecasts into an ensemble of Zika spread predictions using Bayesian model averaging. The proposed methodology is illustrated by forecasting the Zika space-time spread in Brazil in 2018.
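The abstract does not say how the Bayesian model averaging weights are estimated. The Python sketch below shows one simplified variant under stated assumptions. It treats each model's validation errors as zero-mean Gaussian with a per-model variance set to its maximum-likelihood estimate. It gives all models equal prior probability. Each model's weight is then its normalized validation likelihood. The model names `rf`, `gbm`, and `dnn`, the helper functions, and all numbers are hypothetical toy inputs, not the paper's data or results. BMA as usually applied to forecast ensembles (for example, with EM-estimated weights and predictive densities) is more involved.

```python
import math
from typing import Dict, List

# Simplified Bayesian model averaging (BMA) sketch, NOT the paper's procedure.
# Assumption: each model's validation errors are i.i.d. N(0, sigma_k^2), with
# sigma_k^2 set to its maximum-likelihood estimate, and all models share an
# equal prior probability, so posterior weights are proportional to likelihoods.


def gaussian_log_likelihood(errors: List[float]) -> float:
    """Maximised Gaussian log-likelihood of zero-mean errors."""
    n = len(errors)
    sigma2 = max(sum(e * e for e in errors) / n, 1e-12)  # guard against zero variance
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def bma_weights(observed: List[float],
                validation_preds: Dict[str, List[float]]) -> Dict[str, float]:
    """Posterior model weights computed from validation-period predictions."""
    log_liks = {
        name: gaussian_log_likelihood([y - p for y, p in zip(observed, preds)])
        for name, preds in validation_preds.items()
    }
    # Subtract the maximum before exponentiating (log-sum-exp trick) for stability.
    top = max(log_liks.values())
    unnormalised = {name: math.exp(ll - top) for name, ll in log_liks.items()}
    total = sum(unnormalised.values())
    return {name: w / total for name, w in unnormalised.items()}


def bma_forecast(weights: Dict[str, float],
                 forecasts: Dict[str, List[float]]) -> List[float]:
    """Ensemble point forecast: weighted average of the individual forecasts."""
    horizon = len(next(iter(forecasts.values())))
    return [sum(weights[name] * forecasts[name][t] for name in weights)
            for t in range(horizon)]


# Hypothetical toy inputs (e.g. weekly case counts for one region).
observed = [10.0, 12.0, 15.0, 11.0]
validation_preds = {
    "rf": [9.0, 12.5, 14.0, 11.5],
    "gbm": [11.0, 13.0, 16.5, 10.0],
    "dnn": [8.0, 10.0, 17.0, 13.0],
}
future = {
    "rf": [12.0, 13.0],
    "gbm": [14.0, 12.0],
    "dnn": [9.0, 16.0],
}

weights = bma_weights(observed, validation_preds)
ensemble = bma_forecast(weights, future)
print(weights)
print(ensemble)
```

The weights sum to one, so each ensemble value is a convex combination of the individual forecasts. A model that fits the validation window poorly is down-weighted rather than discarded.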