Title
Fine-Tuning BERT for Automatic ADME Semantic Labeling in FDA Drug Labeling to Enhance Product-Specific Guidance Assessment
Authors
Abstract
Product-specific guidances (PSGs) recommended by the United States Food and Drug Administration (FDA) are instrumental in promoting and guiding generic drug product development. To assess a PSG, FDA assessors must spend extensive time and effort manually retrieving supportive drug information on absorption, distribution, metabolism, and excretion (ADME) from the reference listed drug labeling. In this work, we leveraged state-of-the-art pre-trained language models to automatically label ADME paragraphs in the pharmacokinetics section of FDA-approved drug labeling to facilitate PSG assessment. We applied a transfer learning approach, fine-tuning the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to develop a novel application of ADME semantic labeling that automatically retrieves ADME paragraphs from drug labeling, replacing manual work. We demonstrated that fine-tuning the pre-trained BERT model outperforms conventional machine learning techniques, achieving up to an 11.6% absolute F1 improvement. To our knowledge, we are the first to successfully apply BERT to the ADME semantic labeling task. We further assessed the relative contributions of pre-training and fine-tuning to the overall performance of the BERT model on this task using a series of analysis methods, such as attention similarity and layer-based ablations. Our analysis revealed that the information learned via fine-tuning is concentrated as task-specific knowledge in the top layers of BERT, whereas the benefit of the pre-trained BERT model comes from the bottom layers.
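To illustrate the kind of transfer-learning setup the abstract describes, the following is a minimal sketch (not the authors' code) of fine-tuning a pre-trained BERT model to classify drug-labeling paragraphs into ADME categories, using the Hugging Face Transformers and PyTorch libraries. The checkpoint name, label set, hyperparameters, and data-loading details are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: fine-tuning BERT for ADME paragraph classification.
# The label set, checkpoint, and hyperparameters below are assumptions.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizerFast, BertForSequenceClassification

# Assumed label set: one class per ADME category plus a catch-all.
ADME_LABELS = ["absorption", "distribution", "metabolism", "excretion", "other"]

class ParagraphDataset(Dataset):
    """Wraps (paragraph, label) pairs drawn from pharmacokinetics sections."""
    def __init__(self, paragraphs, labels, tokenizer, max_len=512):
        self.enc = tokenizer(paragraphs, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.enc.items()}
        item["labels"] = self.labels[idx]
        return item

def fine_tune(train_texts, train_labels, epochs=3, lr=2e-5):
    """Fine-tune a pre-trained BERT checkpoint on labeled ADME paragraphs."""
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(ADME_LABELS))
    loader = DataLoader(ParagraphDataset(train_texts, train_labels, tokenizer),
                        batch_size=16, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = model(**batch).loss  # cross-entropy over ADME classes
            loss.backward()
            optimizer.step()
    return tokenizer, model
```

In this setup, all BERT layers are updated during fine-tuning; the paper's layer-based ablations could be approximated by freezing or re-initializing selected encoder layers before training and comparing the resulting F1 scores.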