Automatic sleep stage classification with deep residual networks in a mixed-cohort setting

Paper title

Automatic sleep stage classification with deep residual networks in a mixed-cohort setting

Authors

Olesen, Alexander Neergaard; Jennum, Poul; Mignot, Emmanuel; Sorensen, Helge B. D.

Abstract

Study Objectives: Sleep stage scoring is performed manually by sleep experts and is prone to subjective interpretation of scoring rules with low intra- and interscorer reliability. Many automatic systems rely on few small-scale databases for developing models, and generalizability to new datasets is thus unknown. We investigated a novel deep neural network to assess the generalizability of several large-scale cohorts. Methods: A deep neural network model was developed using 15,684 polysomnography studies from five different cohorts. We applied four different scenarios: 1) impact of varying time-scales in the model; 2) performance of a single cohort on other cohorts of smaller, greater or equal size relative to the performance of other cohorts on a single cohort; 3) varying the fraction of mixed-cohort training data compared to using single-origin data; and 4) comparing models trained on combinations of data from 2, 3, and 4 cohorts. Results: Overall classification accuracy improved with increasing fractions of training data (0.25%: 0.782 ± 0.097, 95% CI [0.777-0.787]; 100%: 0.869 ± 0.064, 95% CI [0.864-0.872]), and with increasing number of data sources (2: 0.788 ± 0.102, 95% CI [0.787-0.790]; 3: 0.808 ± 0.092, 95% CI [0.807-0.810]; 4: 0.821 ± 0.085, 95% CI [0.819-0.823]). Different cohorts show varying levels of generalization to other cohorts. Conclusions: Automatic sleep stage scoring systems based on deep learning algorithms should consider as much data as possible from as many sources available to ensure proper generalization. Public datasets for benchmarking should be made available for future research.
