Paper title
Bootstrapping NLP tools across low-resourced African languages: an overview and prospects
Paper authors
Abstract
Computing and Internet access are substantially growing markets in Southern Africa, which brings with it increasing demand for local content and tools in indigenous African languages. Since most of those languages are low-resourced, efforts have gone into bootstrapping tools for one African language from another. This paper provides an overview of these efforts for Niger-Congo B ("Bantu") languages. Bootstrapping grammars even for geographically distant languages has been shown to yield positive outcomes for morphology and for rule- or grammar-based natural language generation. Bootstrapping with data-driven approaches to NLP tasks is difficult to use meaningfully, regardless of geographic proximity, largely because of lexical diversity arising from both orthography and vocabulary. Cladistic approaches in comparative linguistics may inform bootstrapping strategies, and similarity measures might serve as a proxy for bootstrapping potential; both are fertile ground for further research.
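The closing suggestion is that a similarity measure could act as a proxy for bootstrapping potential. A minimal sketch of one such measure follows. It assumes normalized Levenshtein similarity averaged over a shared concept list, a common lexicostatistical choice. This is not necessarily the measure the paper discusses. The function names and the concept-to-word dictionaries are hypothetical. Because the measure works on characters, it is sensitive to both orthographic and vocabulary differences, the two sources of lexical diversity named above.

```python
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer word; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def lexical_similarity(list_a: dict[str, str], list_b: dict[str, str]) -> float:
    """Hypothetical bootstrapping proxy: mean word similarity over shared concepts.

    Each argument maps a concept label (e.g. "person") to a word in one language.
    This is an illustrative assumption, not the paper's method.
    """
    shared = sorted(set(list_a) & set(list_b))
    if not shared:
        raise ValueError("the two word lists share no concepts")
    return mean(normalized_similarity(list_a[c], list_b[c]) for c in shared)
```

For example, `lexical_similarity({"person": "umuntu"}, {"person": "mtu"})` compares isiZulu *umuntu* with Swahili *mtu* ("person"). A realistic comparison would use a much longer concept list and might normalize orthography first. For instance, conjunctive and disjunctive writing conventions could be normalized before comparing words.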