Paper title
Unsupervised Parsing via Constituency Tests
Paper authors
Paper abstract
We propose a method for unsupervised parsing based on the linguistic notion of a constituency test. One type of constituency test involves modifying the sentence via some transformation (e.g. replacing the span with a pronoun) and then judging the result (e.g. checking if it is grammatical). Motivated by this idea, we design an unsupervised parser by specifying a set of transformations and using an unsupervised neural acceptability model to make grammaticality decisions. To produce a tree given a sentence, we score each span by aggregating its constituency test judgments, and we choose the binary tree with the highest total score. While this approach already achieves performance in the range of current methods, we further improve accuracy by fine-tuning the grammaticality model through a refinement procedure, where we alternate between improving the estimated trees and improving the grammaticality model. The refined model achieves 62.8 F1 on the Penn Treebank test set, an absolute improvement of 7.6 points over the previous best published result.
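The decoding step described in the abstract (score each span by aggregating test judgments, then pick the binary tree with the highest total score) can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the two transformations in `transformations`, the averaging in `score_spans`, and the `acceptability` callable standing in for the neural grammaticality model are hypothetical, not the paper's actual test set or model. The tree search is a standard CKY-style dynamic program over span scores.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # half-open word interval [i, j)


def transformations(words: List[str], i: int, j: int) -> List[List[str]]:
    """Illustrative constituency tests (assumed, not the paper's exact set)."""
    before, span, after = words[:i], words[i:j], words[j:]
    return [
        before + ["it"] + after,  # replace the span with a pronoun
        span + before + after,    # move the span to the front
    ]


def score_spans(
    words: List[str], acceptability: Callable[[List[str]], float]
) -> Dict[Span, float]:
    """Score every span of length >= 2 by averaging its test judgments."""
    n = len(words)
    scores: Dict[Span, float] = {}
    for i in range(n):
        for j in range(i + 2, n + 1):
            judged = [acceptability(s) for s in transformations(words, i, j)]
            scores[(i, j)] = sum(judged) / len(judged)
    return scores


def best_binary_tree(words: List[str], scores: Dict[Span, float]):
    """CKY-style search for the binary tree maximizing the sum of span scores."""
    n = len(words)
    # best[(i, j)] = (best total score for the span, split point or None for a leaf)
    best = {(i, i + 1): (0.0, None) for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            k = max(range(i + 1, j), key=lambda m: best[(i, m)][0] + best[(m, j)][0])
            best[(i, j)] = (scores[(i, j)] + best[(i, k)][0] + best[(k, j)][0], k)

    def build(i: int, j: int):
        split = best[(i, j)][1]
        if split is None:
            return words[i]
        return (build(i, split), build(split, j))

    return best[(0, n)][0], build(0, n)


# Toy stand-in for the acceptability model: a lookup of "grammatical" strings.
ACCEPTABLE = {"it sat", "the cat sat"}


def toy_acceptability(sentence: List[str]) -> float:
    return 1.0 if " ".join(sentence) in ACCEPTABLE else 0.0


words = ["the", "cat", "sat"]
span_scores = score_spans(words, toy_acceptability)
total, tree = best_binary_tree(words, span_scores)
print(total, tree)
```

With the toy judge, the span "the cat" passes both tests while "cat sat" passes neither, so the search brackets "the cat" first. In the method the abstract describes, the judge would be a learned model, and the refinement procedure would alternate between re-estimating trees like this and fine-tuning that model; that loop is not sketched here.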