Paper Title
Machine learning discrimination of Parkinson's Disease stages from walker-mounted sensors data
Paper Authors
Paper Abstract
Clinical methods that assess gait in Parkinson's Disease (PD) are mostly qualitative. Quantitative methods necessitate costly instrumentation or cumbersome wearable devices, which limits their usability. Only a few of these methods can discriminate between different stages of PD progression. This study applies machine learning methods to discriminate six stages of PD. The data was acquired by low-cost walker-mounted sensors in an experiment at a movement disorders clinic, and the PD stages were clinically labeled. A large set of features, some unique to this study, was extracted, and three feature selection methods were compared using a multi-class Random Forest (RF) classifier. The feature subset selected by the Analysis of Variance (ANOVA) method provided performance similar to the full feature set (93% accuracy) with significantly shorter computation time. Compared to PCA, this method also enabled clinical interpretability of the selected features, an essential attribute for healthcare applications. All selected feature sets are dominated by information-theoretic and statistical features and offer insights into the characteristics of gait deterioration in PD. The results indicate the feasibility of machine learning to accurately classify PD severity stages from kinematic signals acquired by low-cost, walker-mounted sensors and imply a potential to aid medical practitioners in the quantitative assessment of PD progression. The study presents a solution to the small and noisy data problem, which is common in most sensor-based healthcare assessments.
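The abstract describes a pipeline of ANOVA-based feature selection feeding a multi-class Random Forest classifier. A minimal sketch of that idea in scikit-learn is shown below; the data shapes, feature count `k=10`, and all hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Hedged sketch: ANOVA (F-test) feature selection + multi-class Random Forest,
# in the spirit of the pipeline the abstract describes. Synthetic data stands
# in for the walker-mounted sensor features; all parameters are assumptions.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 50))     # placeholder for extracted gait features
y = rng.integers(0, 6, size=120)   # six PD stage labels (0..5), synthetic

pipe = Pipeline([
    # Keep the 10 features with the highest ANOVA F-statistic vs. the labels
    ("anova", SelectKBest(score_func=f_classif, k=10)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Unlike PCA, `SelectKBest` retains original feature columns rather than projecting onto linear combinations, which is what makes the selected features clinically interpretable, as the abstract notes.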