Paper title
HuSpaCy: an industrial-strength Hungarian natural language processing toolkit
Authors
Abstract
Although several open-source language processing pipelines are available for Hungarian, none of them satisfies the requirements of today's NLP applications. A language processing pipeline should provide close to state-of-the-art lemmatization, morphosyntactic analysis, entity recognition, and word embeddings. Industrial text processing applications must also satisfy non-functional software quality requirements, and frameworks that support multiple languages are increasingly favored. This paper introduces HuSpaCy, an industry-ready Hungarian language processing toolkit. The tool provides components for the most important basic linguistic analysis tasks. It is open source and available under a permissive license. The system is built on spaCy's NLP components, resulting in an easily usable, fast yet accurate application. Experiments confirm that HuSpaCy achieves high accuracy while maintaining resource-efficient prediction capabilities.