Paper Title
Nested Named Entity Recognition with Partially-Observed TreeCRFs
Paper Authors
Paper Abstract
Named entity recognition (NER) is a well-studied task in natural language processing. However, the widely used sequence labeling framework has difficulty detecting entities with nested structures. In this work, we view nested NER as constituency parsing with partially-observed trees and model it with partially-observed TreeCRFs. Specifically, we view all labeled entity spans as observed nodes in a constituency tree, and all other spans as latent nodes. With the TreeCRF, we achieve a uniform way to jointly model the observed and the latent nodes. To compute the probability of partial trees with partial marginalization, we propose a variant of the Inside algorithm, the \textsc{Masked Inside} algorithm, which supports different inference operations for different nodes (evaluation for the observed, marginalization for the latent, and rejection for nodes incompatible with the observed) and admits an efficient parallelized implementation, thus significantly speeding up training and inference. Experiments show that our approach achieves state-of-the-art (SOTA) F1 scores on the ACE2004 and ACE2005 datasets, and performance comparable to SOTA models on the GENIA dataset. Our approach is implemented at: \url{https://github.com/FranxYao/Partially-Observed-TreeCRFs}.
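The three node-level operations named in the abstract (evaluation, marginalization, rejection) can be illustrated with a small, unbatched sketch. This is not the authors' implementation: the function name `masked_inside`, the score layout `scores[i][j][label]`, the restriction to binary trees, and the choice to marginalize latent nodes over every label are all assumptions made for illustration. The released code is described as parallelized, whereas this version uses plain Python loops for readability.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    values = list(values)
    m = max(values, default=NEG_INF)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def crosses(span, other):
    """True if two inclusive spans overlap without one containing the other."""
    (i, j), (a, b) = span, other
    return (i < a <= j < b) or (a < i <= b < j)


def masked_inside(scores, observed=None):
    """Log-sum of the scores of all labeled binary trees compatible with `observed`.

    scores[i][j][l] is the (hypothetical) log-potential of span (i, j),
    inclusive, carrying label l. `observed` maps spans to their gold label
    and is assumed to contain no mutually crossing spans.
    """
    observed = observed or {}
    n = len(scores)

    def node_score(i, j):
        if (i, j) in observed:
            # Evaluation: an observed node keeps only its gold label.
            return scores[i][j][observed[(i, j)]]
        if any(crosses((i, j), o) for o in observed):
            # Rejection: a span that crosses an observed span is masked out.
            return NEG_INF
        # Marginalization: a latent node sums over all labels.
        return logsumexp(scores[i][j])

    inside = [[NEG_INF] * n for _ in range(n)]
    for i in range(n):
        inside[i][i] = node_score(i, i)
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            s = node_score(i, j)
            if s == NEG_INF:
                continue
            # Sum over every split point of the span into two children.
            inside[i][j] = s + logsumexp(
                inside[i][k] + inside[k + 1][j] for k in range(i, j)
            )
    return inside[0][n - 1]


# Toy example: 3 tokens, 2 labels, all log-potentials zero.
scores = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
log_z = masked_inside(scores)                       # all trees, all labels
log_partial = masked_inside(scores, {(0, 1): 0})    # span (0, 1) observed
log_likelihood = log_partial - log_z
```

Under these assumptions, calling the function with no observed spans gives the full log-partition, and calling it with the labeled entity spans gives the partially marginalized score, so their difference is the log-likelihood of the partial tree. In the binary, single-root setting used here, rejecting every span that crosses an observed span is enough to force that span into each surviving tree.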