Applying a random projection algorithm to optimize machine learning model for breast lesion classification

Paper Title

Applying a random projection algorithm to optimize machine learning model for breast lesion classification

Authors

Heidari, Morteza; Lakshmivarahan, Sivaramakrishnan; Mirniaharikandehei, Seyedehnafiseh; Danala, Gopichandh; Maryada, Sai Kiran R.; Liu, Hong; Zheng, Bin

Abstract

Machine learning is widely used in developing computer-aided diagnosis (CAD) schemes for medical images. However, CAD usually computes a large number of image features from the targeted regions, which makes it challenging to identify a small, optimal feature vector for building robust machine learning models. In this study, we investigate the feasibility of applying a random projection algorithm to build an optimal feature vector from the initially CAD-generated large feature pool and to improve the performance of the machine learning model. We assemble a retrospective dataset of 1,487 mammography cases, of which 644 have confirmed malignant mass lesions and 843 have benign lesions. A CAD scheme is first applied to segment mass regions and initially compute 181 features. Then, support vector machine (SVM) models embedded with several feature dimensionality reduction methods are built to predict the likelihood of lesions being malignant. All SVM models are trained and tested using a leave-one-case-out cross-validation method. The SVM generates a likelihood score for each segmented mass region depicted on a one-view mammogram. By fusing the two scores of the same mass depicted on two-view mammograms, a case-based likelihood score is also evaluated. Compared with principal component analysis, nonnegative matrix factorization, and Chi-squared methods, the SVM embedded with the random projection algorithm yielded a significantly higher case-based lesion classification performance, with an area under the ROC curve of 0.84±0.01 (p<0.02). The study demonstrates that the random projection algorithm is a promising method for generating optimal feature vectors to help improve the performance of machine learning models for medical images.
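The pipeline described in the abstract can be sketched roughly as follows. This is only an illustrative sketch on synthetic data, not the authors' implementation: the abstract does not specify the projection type, the target dimensionality, the SVM kernel, or the fusion rule, so the Gaussian random projection to 20 components, the RBF kernel, the use of the SVM decision value as the score, and fusion by averaging the two view scores are all assumptions made here, using scikit-learn.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.random_projection import GaussianRandomProjection
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's dataset): 40 cases, 2 views each,
# 181 features per segmented mass region, as in the abstract's feature count.
n_cases, n_views, n_features = 40, 2, 181
case_labels = np.array([0, 1] * (n_cases // 2))   # 0 = benign, 1 = malignant
groups = np.repeat(np.arange(n_cases), n_views)   # case id of each region
y = np.repeat(case_labels, n_views)               # region-level labels
X = rng.normal(size=(n_cases * n_views, n_features))
X[:, :30] += 1.0 * y[:, None]                     # inject a weak class signal


def make_model():
    # Assumed settings: 20 projected dimensions and an RBF kernel.
    return make_pipeline(
        StandardScaler(),
        GaussianRandomProjection(n_components=20, random_state=0),
        SVC(kernel="rbf", gamma="scale"),
    )


# Leave-one-case-out: both views of the held-out case are excluded from training.
region_scores = np.zeros(len(y))
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    model = make_model().fit(X[train_idx], y[train_idx])
    # The SVM decision value stands in for the likelihood score here.
    region_scores[test_idx] = model.decision_function(X[test_idx])

# Case-based score: average of the two view scores (assumed fusion rule).
case_scores = np.array(
    [region_scores[groups == c].mean() for c in range(n_cases)]
)

region_auc = roc_auc_score(y, region_scores)
case_auc = roc_auc_score(case_labels, case_scores)
print(f"region-based AUC: {region_auc:.2f}, case-based AUC: {case_auc:.2f}")
```

Fitting the scaler and the projection inside each cross-validation fold keeps the held-out case from influencing the feature reduction step. Swapping `GaussianRandomProjection` for another reducer in `make_model` gives the kind of comparison the abstract describes.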
