Paper title

Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment

Authors

Sean Kent, Menggang Yu

Abstract

The surroundings of a cancerous tumor affect how it grows and develops in humans. New data from early breast cancer patients contain information on the collagen fibers surrounding the tumorous tissue. These data offer hope of finding additional biomarkers for diagnosis and prognosis, but they pose two challenges for typical analysis: each image section contains information on hundreds of fibers, and each tissue sample has multiple image sections that contribute to a single prediction of tumor vs. non-tumor. This nested structure, with fibers within image spots within tissue samples, requires a specialized analysis approach. We devise a novel support vector machine (SVM)-based predictive algorithm for this data structure. By treating each collection of fibers as a probability distribution, we can measure similarities between collections through a flexible kernel approach. Because we assume a relationship between the tumor status of the image sections and that of their tissue sample, the resulting SVM problem is non-convex, and traditional algorithms cannot be applied. We propose two algorithms that trade off accuracy against computational efficiency, so that data sets of all sizes can be handled. The predictive performance of both algorithms is evaluated on the collagen fiber data set and in additional simulation scenarios. Reproducible implementations of both algorithms are available in the R package mildsvm.
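The abstract describes treating the fibers of each image section as a sample from a probability distribution and comparing sections through a kernel. One common way to do this is the kernel mean embedding, which averages a base kernel over all pairs of fibers from two sections. That choice is assumed here and is not taken from the paper or from mildsvm. The NumPy sketch below rests on several assumptions: the Gaussian RBF base kernel, the `gamma` value, and all function names are hypothetical and do not reflect the mildsvm API. The `sample_score` rule is also hypothetical. It applies the standard multiple-instance assumption (a sample counts as tumor if at least one of its sections does), under which a sample's score is the maximum of its section scores. That maximum is one way a non-convex SVM problem can arise.

```python
import numpy as np


def rbf_gram(X, Y, gamma):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) for all row pairs."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mean_embedding_kernel(X, Y, gamma=1.0):
    """Similarity between two fiber collections viewed as empirical distributions.

    X: (n, d) array, one row of features per fiber in one image section.
    Y: (m, d) array, one row of features per fiber in another image section.
    Returns the inner product of the two empirical kernel mean embeddings,
    i.e. the average of k(x_i, y_j) over all n * m fiber pairs.
    """
    return float(rbf_gram(X, Y, gamma).mean())


def section_gram(sections, gamma=1.0):
    """Symmetric Gram matrix over image sections (usable as a precomputed SVM kernel)."""
    n = len(sections)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = mean_embedding_kernel(sections[i], sections[j], gamma)
    return K


def sample_score(section_scores):
    """Hypothetical sample-level rule under the standard multiple-instance
    assumption: a tissue sample is scored by its most tumor-like section."""
    return max(section_scores)


# Usage: three simulated sections with different fiber counts, 2 features per fiber.
rng = np.random.default_rng(0)
sections = [rng.normal(size=(n, 2)) for n in (120, 80, 150)]
K = section_gram(sections, gamma=0.5)
print(K.shape)  # (3, 3)
```

The kernel accepts sections with different numbers of fibers because it averages over fiber pairs instead of comparing fixed-length vectors. A matrix such as `K` could then be passed to any SVM solver that accepts a precomputed kernel.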
