Paper Title


Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition

Paper Authors

Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang

Paper Abstract


Features such as punctuation, capitalization, and formatting of entities are important for readability, understanding, and natural language processing tasks. However, Automatic Speech Recognition (ASR) systems produce spoken-form text devoid of formatting, and tagging approaches to formatting address just one or two features at a time. In this paper, we unify spoken-to-written text conversion via a two-stage process: First, we use a single transformer tagging model to jointly produce token-level tags for inverse text normalization (ITN), punctuation, capitalization, and disfluencies. Then, we apply the tags to generate written-form text and use weighted finite state transducer (WFST) grammars to format tagged ITN entity spans. Despite joining four models into one, our unified tagging approach matches or outperforms task-specific models across all four tasks on benchmark test sets across several domains.
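To make the two-stage process more concrete, the sketch below tags spoken-form tokens and then applies the tags to produce written-form text. It is a minimal illustration only: the `TokenTags` fields, the BIO-style ITN span tags, and the toy `format_itn_span` lookup are assumptions for demonstration, not the paper's actual tag schema, transformer model, or WFST grammars.

```python
# Illustrative sketch of the two-stage spoken-to-written conversion.
# Stage 1 (the transformer tagger) is assumed to have already produced
# per-token tags; stage 2 applies them and formats ITN spans.

from dataclasses import dataclass
from typing import List


@dataclass
class TokenTags:
    token: str        # spoken-form token from the ASR output
    itn: str          # hypothetical ITN span tag: "B", "I", or "O"
    punct: str        # punctuation to append: "", ",", ".", "?"
    cap: bool         # capitalize the first letter?
    disfluent: bool   # drop this token as a disfluency?


def format_itn_span(tokens: List[str]) -> str:
    """Stand-in for a WFST grammar; here just a toy lookup table."""
    toy_grammar = {"twenty five": "25", "january first": "January 1"}
    key = " ".join(tokens)
    return toy_grammar.get(key, key)


def render(tagged: List[TokenTags]) -> str:
    out, span = [], []
    for t in tagged:
        if t.disfluent:
            continue                      # remove disfluent tokens
        if t.itn in ("B", "I"):
            span.append(t.token)          # collect an ITN entity span
            continue
        if span:                          # close a pending ITN span
            out.append(format_itn_span(span))
            span = []
        word = t.token.capitalize() if t.cap else t.token
        out.append(word + t.punct)        # apply capitalization + punctuation
    if span:
        out.append(format_itn_span(span))
    return " ".join(out)


# Example: "um i have twenty five dollars" -> "I have 25 dollars."
tagged = [
    TokenTags("um", "O", "", False, True),
    TokenTags("i", "O", "", True, False),
    TokenTags("have", "O", "", False, False),
    TokenTags("twenty", "B", "", False, False),
    TokenTags("five", "I", "", False, False),
    TokenTags("dollars", "O", ".", False, False),
]
print(render(tagged))
```

In the paper's pipeline, the per-token tags would come from the single joint transformer model and the ITN spans would be rewritten by WFST grammars rather than a lookup table; the sketch only shows how the four kinds of tags can be applied together in one pass.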
