Paper title
Infinite Hidden Markov Models for Multiple Multivariate Time Series with Missing Data
Paper authors
Paper abstract
Exposure to air pollution is associated with increased morbidity and mortality. Recent technological advancements permit the collection of time-resolved personal exposure data. Such data are often incomplete, with missing observations and exposures below the limit of detection, which limits their use in health effects studies. In this paper, we develop an infinite hidden Markov model for multiple asynchronous multivariate time series with missing data. Our model is designed to include covariates that can inform transitions among hidden states. We implement beam sampling, a combination of slice sampling and dynamic programming, to sample the hidden states, and a Bayesian multiple imputation algorithm to impute missing data. In simulation studies, our model excels in estimating hidden states and state-specific means and in imputing observations that are missing at random or below the limit of detection. We validate our imputation approach on data from the Fort Collins Commuter Study. We show that the estimated hidden states improve imputations for data that are missing at random compared to existing approaches. In a case study of the Fort Collins Commuter Study, we describe the inferential gains obtained from our model, including improved imputation of missing data and the ability to identify shared patterns in activity and exposure among repeated sampling days for individuals and among distinct individuals.
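
To make the beam sampling step named in the abstract more concrete, the following is a minimal sketch of one sweep, not the authors' implementation. It assumes a finite truncation to K states (rather than the infinite model), a single univariate series, Gaussian emissions with a shared standard deviation, fixed transition probabilities with no covariates, and missing observations that simply contribute a likelihood of 1. The function name `beam_sample_states` and all parameter names are hypothetical.

```python
import numpy as np

def beam_sample_states(y, states, pi0, pi, means, sd, rng):
    """One beam-sampling sweep over the hidden states of a K-state HMM.

    Illustrative assumptions: finite K, univariate Gaussian emissions with a
    shared standard deviation sd, and NaN marking a missing observation.
    """
    T, K = len(y), len(pi0)

    # Emission likelihoods up to a constant (the Gaussian normalizer cancels
    # because sd is shared). A missing observation contributes a factor of 1.
    lik = np.exp(-0.5 * ((y[:, None] - means[None, :]) / sd) ** 2)
    lik[np.isnan(y), :] = 1.0

    # Slice step: draw u_t uniformly below the probability of the transition
    # that the current state path uses at time t.
    used = np.empty(T)
    used[0] = pi0[states[0]]
    used[1:] = pi[states[:-1], states[1:]]
    u = rng.uniform(0.0, used)

    # Forward pass (dynamic programming): only transitions whose probability
    # exceeds u_t are kept, and each kept transition has weight 1.
    fwd = np.zeros((T, K))
    fwd[0] = (pi0 > u[0]) * lik[0]
    fwd[0] /= fwd[0].sum()
    for t in range(1, T):
        allowed = pi > u[t]  # K x K boolean mask of permitted transitions
        fwd[t] = lik[t] * (fwd[t - 1] @ allowed)
        fwd[t] /= fwd[t].sum()

    # Backward pass: sample the last state, then each earlier state
    # conditional on its already-sampled successor.
    new_states = np.empty(T, dtype=int)
    new_states[-1] = rng.choice(K, p=fwd[-1])
    for t in range(T - 2, -1, -1):
        w = fwd[t] * (pi[:, new_states[t + 1]] > u[t + 1])
        new_states[t] = rng.choice(K, p=w / w.sum())
    return new_states

# Usage with made-up values: two states and one missing observation.
rng = np.random.default_rng(0)
y = np.array([-1.2, np.nan, 0.9, 1.1, -0.8])
pi0 = np.array([0.5, 0.5])
pi = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([-1.0, 1.0])
states = np.zeros(len(y), dtype=int)
for _ in range(20):
    states = beam_sample_states(y, states, pi0, pi, means, 1.0, rng)
```

In this sketch the slice variables limit each forward step to a finite set of candidate transitions. This is what makes dynamic programming feasible when the number of states is unbounded, although here the truncation to K states already guarantees it.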