Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis

Paper title

Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis

Authors

Khader, Firas; Mueller-Franzes, Gustav; Wang, Tianci; Han, Tianyu; Arasteh, Soroosh Tayebi; Haarburger, Christoph; Stegmaier, Johannes; Bressem, Keno; Kuhl, Christiane; Nebelung, Sven; Kather, Jakob Nikolas; Truhn, Daniel

Abstract

Multimodal deep learning has been used to predict clinical endpoints and diagnoses from clinical routine data. However, these models suffer from scaling issues: they have to learn pairwise interactions between each piece of information in each data type, thereby escalating model complexity beyond manageable scales. This has so far precluded a widespread use of multimodal deep learning. Here, we present a new technical approach of "learnable synergies", in which the model only selects relevant interactions between data modalities and keeps an "internal memory" of relevant data. Our approach is easily scalable and naturally adapts to multimodal data inputs from clinical routine. We demonstrate this approach on three large multimodal datasets from radiology and ophthalmology and show that it outperforms state-of-the-art models in clinically relevant diagnosis tasks. Our new approach is transferable and will allow the application of multimodal deep learning to a broad set of clinically relevant problems.
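The scaling issue described in the abstract can be illustrated with simple arithmetic. The sketch below is not the authors' architecture, which the abstract does not specify; it only compares the number of pairwise interactions when every piece of input interacts with every other piece against the number when each piece interacts only with a small fixed-size memory. The token counts, the `memory_slots` value, and both function names are hypothetical and chosen purely for illustration.

```python
def full_pairwise_interactions(tokens_per_modality):
    """Count interactions when every token interacts with every token
    across all modalities (quadratic in the total token count)."""
    total = sum(tokens_per_modality)
    return total * total


def memory_interactions(tokens_per_modality, memory_slots):
    """Count interactions when each token only interacts with a
    fixed-size memory (linear in the total token count).
    The fixed-size memory is an assumption made for this sketch."""
    total = sum(tokens_per_modality)
    return total * memory_slots


# Hypothetical token counts for three modalities,
# e.g. image patches, laboratory values, and free-text tokens.
tokens = [1024, 64, 256]

full = full_pairwise_interactions(tokens)
compact = memory_interactions(tokens, memory_slots=64)

print(f"all-pairs interactions:   {full}")
print(f"memory-based interactions: {compact}")
print(f"reduction factor:          {full / compact:.1f}")
```

Under these assumed numbers, adding a further modality increases the all-pairs count quadratically with the total number of tokens, while the memory-based count grows only linearly, which is the kind of saving the abstract attributes to selecting relevant interactions.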
