Paper Title

Character Entropy in Modern and Historical Texts: Comparison Metrics for an Undeciphered Manuscript

Authors

Luke Lindemann, Claire Bowern

Abstract

This paper outlines the creation of three corpora for multilingual comparison and analysis of the Voynich manuscript: a corpus of Voynich texts partitioned by Currier language, scribal hand, and transcription system, a corpus of 294 language samples compiled from Wikipedia, and a corpus of eighteen transcribed historical texts in eight languages. These corpora will be utilized in subsequent work by the Voynich Working Group at Yale University. We demonstrate the utility of these corpora for studying characteristics of the Voynich script and language, with an analysis of conditional character entropy in Voynichese. We discuss the interaction between character entropy and language, script size and type, glyph compositionality, scribal conventions and abbreviations, positional character variants, and bigram frequency. This analysis characterizes the interaction between script compositionality, character size, and predictability. We show that substantial manipulations of glyph composition are not sufficient to align conditional entropy levels with natural languages. The unusually predictable nature of the Voynichese script is not attributable to a particular script or transcription system, underlying language, or substitution cipher. Voynichese is distinct from every comparison text in our corpora because character placement is highly constrained within the word, and this may indicate the loss of phonemic distinctions from the underlying language.
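
As a rough illustration of the quantity the abstract analyzes, conditional character entropy can be estimated from bigram counts as the average uncertainty, in bits, about a character given the one before it. The sketch below is a minimal plug-in estimate and is not taken from the paper: the function name `conditional_entropy` is hypothetical, and decisions such as whether to count word spaces as characters or how to handle transcription alphabets are assumptions left to the caller.

```python
import math
from collections import Counter


def conditional_entropy(text: str) -> float:
    """Estimate H(next char | previous char) in bits from bigram counts.

    Illustrative plug-in estimate only; not the authors' exact procedure.
    """
    # Every adjacent pair of characters in the input, in order.
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return 0.0  # fewer than two characters: nothing to predict

    pair_counts = Counter(pairs)
    # How often each character occurs as the first member of a pair.
    first_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)

    entropy = 0.0
    for (prev, _nxt), n in pair_counts.items():
        p_pair = n / total               # joint probability of the bigram
        p_cond = n / first_counts[prev]  # probability of next given prev
        entropy -= p_pair * math.log2(p_cond)
    return entropy
```

A fully predictable sequence such as "abababab" yields zero under this estimate, while text in which each character is followed by several equally likely successors yields a higher value. Lower values therefore correspond to the more constrained character placement the abstract describes.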
