Paper Title

E2EET: From Pipeline to End-to-end Entity Typing via Transformer-Based Embeddings

Authors

Michael Stewart, Wei Liu

Abstract

Entity Typing (ET) is the process of identifying the semantic types of every entity within a corpus. In contrast to Named Entity Recognition, where each token in a sentence is labelled with zero or one class label, ET involves labelling each entity mention with one or more class labels. Existing entity typing models, which operate at the mention level, are limited by two key factors: they do not make use of recently proposed context-dependent embeddings, and they are trained on fixed context windows. They are therefore sensitive to window size selection and are unable to incorporate the context of the entire document. In light of these drawbacks, we propose to incorporate context using transformer-based embeddings for a mention-level model, and an end-to-end model using a Bi-GRU to remove the dependency on window size. An extensive ablative study demonstrates the effectiveness of contextualised embeddings for mention-level models and the competitiveness of our end-to-end model for entity typing.
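
The contrast the abstract draws between NER labelling (zero or one label per token) and ET labelling (one or more labels per mention) can be sketched in a few lines of Python. This is only an illustration of the two label formats, not the paper's data or model: the example sentence, the type names, and the helper functions `mention_text` and `to_multi_hot` are all assumptions made for this sketch.

```python
# Illustrative data only: the sentence and all type names are invented here.
tokens = ["Barack", "Obama", "visited", "Perth"]

# NER-style labelling: each token carries zero or one class label
# (None stands for "no label").
ner_labels = ["PERSON", "PERSON", None, "LOCATION"]

# ET-style labelling: each mention (a token span) carries one or more types.
# Spans are (start, end) token indices, with end exclusive.
et_labels = {
    (0, 2): {"person", "person/politician"},
    (3, 4): {"location", "location/city"},
}


def mention_text(tokens, span):
    """Return the surface text of a mention given its token span."""
    start, end = span
    return " ".join(tokens[start:end])


def to_multi_hot(types, type_vocab):
    """Encode a set of types as a multi-hot vector over a fixed vocabulary."""
    return [1 if t in types else 0 for t in type_vocab]


# Fixed, sorted vocabulary of every type seen in the ET annotations.
type_vocab = sorted({t for types in et_labels.values() for t in types})

for span, types in et_labels.items():
    print(mention_text(tokens, span), to_multi_hot(types, type_vocab))
```

A multi-hot target of this kind is one common way to represent multi-label annotations; whether the paper encodes its labels this way is not stated in the abstract.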
