Paper Title


Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation

Authors

Mingjie Li, Wenjia Cai, Karin Verspoor, Shirui Pan, Xiaodan Liang, Xiaojun Chang

Abstract


Automatic generation of ophthalmic reports using data-driven neural networks has great potential in clinical practice. When writing a report, ophthalmologists make inferences with prior clinical knowledge. This knowledge has been neglected in prior medical report generation methods. To endow models with the capability of incorporating expert knowledge, we propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG), in which clinical relation triples are injected into the visual features as prior knowledge to drive the decoding procedure. However, two major common Knowledge Noise (KN) issues may affect the model's effectiveness. 1) Existing general biomedical knowledge bases such as the UMLS may not align meaningfully with the specific context and language of the reports, limiting their utility for knowledge injection. 2) Incorporating too much knowledge may divert the visual features from their correct meaning. To overcome these limitations, we design an automatic information extraction scheme based on natural language processing to obtain clinical entities and relations directly from in-domain training reports. Given a set of ophthalmic images, our CGT first restores a sub-graph from the clinical graph and injects the restored triples into the visual features. A visible matrix is then employed during the encoding procedure to limit the impact of the injected knowledge. Finally, reports are predicted from the encoded cross-modal features via a Transformer decoder. Extensive experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT outperforms previous benchmark methods and achieves state-of-the-art performance.
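The core mechanism the abstract describes can be illustrated with a minimal sketch: clinical triples are appended to the visual tokens as extra embedding tokens, and a binary visible matrix masks self-attention so that each triple token influences only itself and the visual token it is attached to, limiting knowledge noise. This is a simplified NumPy illustration under assumed shapes and an assumed one-to-one triple-to-visual-token pairing, not the authors' implementation.

```python
import numpy as np

def masked_self_attention(x, visible):
    """Single-head self-attention restricted by a 0/1 visible matrix.

    x:       (n, d) token features (visual tokens followed by triple tokens)
    visible: (n, n) matrix; visible[i, j] = 1 iff token i may attend to token j
    """
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(visible.astype(bool), scores, -1e9)  # mask invisible pairs
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
n_vis, n_tri, d = 4, 2, 8                      # assumed toy sizes
visual = rng.normal(size=(n_vis, d))           # stand-in for extracted image features
triples = rng.normal(size=(n_tri, d))          # stand-in for embedded clinical triples

# Inject triples by concatenating them to the visual token sequence.
x = np.concatenate([visual, triples], axis=0)

# Visible matrix: visual tokens see everything; each triple token sees only
# itself and its anchor visual token (hypothetical pairing triple i <-> visual i).
n = n_vis + n_tri
visible = np.ones((n, n))
visible[n_vis:, :] = 0
for i in range(n_tri):
    visible[n_vis + i, n_vis + i] = 1  # the triple token itself
    visible[n_vis + i, i] = 1          # its anchor visual token

out = masked_self_attention(x, visible)
print(out.shape)  # encoded cross-modal features, shape (6, 8)
```

In a full model this masked encoder would be stacked, and the resulting cross-modal features would feed a Transformer decoder that generates the report token by token; the sketch only shows how the visible matrix confines the influence of injected knowledge.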
