Joint Transformer/RNN Architecture for Gesture Typing in Indic Languages

Paper Title

Joint Transformer/RNN Architecture for Gesture Typing in Indic Languages

Authors

Biju, Emil; Sriram, Anirudh; Khapra, Mitesh M.; Kumar, Pratyush

Abstract

Gesture typing is a method of typing words on a touch-based keyboard by creating a continuous trace passing through the relevant keys. This work is aimed at developing a keyboard that supports gesture typing in Indic languages. We begin by noting that when dealing with Indic languages, one needs to cater to two different sets of users: (i) users who prefer to type in the native Indic script (Devanagari, Bengali, etc.) and (ii) users who prefer to type in the English script but want the output transliterated into the native script. In both cases, we need a model that takes a trace as input and maps it to the intended word. To enable the development of these models, we create and release two datasets. First, we create a dataset containing keyboard traces for 193,658 words from 7 Indic languages. Second, we curate 104,412 English-Indic transliteration pairs from Wikidata across these languages. Using these datasets, we build a model that performs path decoding, transliteration, and transliteration correction. Unlike prior approaches, our proposed model does not make co-character independence assumptions during decoding. The overall accuracy of our model across the 7 languages ranges from 70% to 95%.
