Title

Comparative Computational Analysis of Global Structure in Canonical, Non-Canonical and Non-Literary Texts

Authors

Mahdi Mohseni, Volker Gast, Christoph Redies

Abstract

This study investigates global properties of literary and non-literary texts. Within the literary texts, a distinction is made between canonical and non-canonical works. The central hypothesis of the study is that the three text types (non-literary, literary/canonical and literary/non-canonical) exhibit systematic differences with respect to structural design features as correlates of aesthetic responses in readers. To investigate these differences, we compiled a corpus containing texts of the three categories of interest, the Jena Textual Aesthetics Corpus. Two aspects of global structure are investigated, variability and self-similar (fractal) patterns, which reflect long-range correlations along texts. We use four types of basic observations: (i) the frequency of POS-tags per sentence, (ii) sentence length, (iii) lexical diversity in chunks of text, and (iv) the distribution of topic probabilities in chunks of text. These basic observations are grouped into two more general categories: (a) the low-level properties (i) and (ii), which are observed at the level of the sentence (reflecting linguistic decoding), and (b) the high-level properties (iii) and (iv), which are observed at the textual level (reflecting comprehension). The basic observations are transformed into time series, and these time series are subjected to multifractal detrended fluctuation analysis (MFDFA). Our results show that, for the three text types under analysis, low-level properties of texts are better discriminators than high-level properties. Canonical literary texts differ from non-canonical ones primarily in terms of variability. Fractality seems to be a universal feature of text, more pronounced in non-literary than in literary texts. Beyond the specific results of the study, we intend to open up new perspectives on the experimental study of textual aesthetics.
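To make the MFDFA step more concrete, here is a minimal sketch of the standard MFDFA procedure in Python with NumPy. It is not the authors' implementation. The function name `mfdfa`, the choice of window scales, the q values and the detrending order are all illustrative assumptions. The input would be one of the time series described above, for example the sequence of sentence lengths (tokens per sentence) along a text. The function returns the generalized Hurst exponents h(q).

```python
import numpy as np


def mfdfa(series, scales, qs, order=1):
    """Illustrative MFDFA sketch (hypothetical helper, not the paper's code).

    Returns the generalized Hurst exponent h(q) for each q in `qs`.
    `scales` are window sizes in observations (e.g. sentences); `order`
    is the degree of the local polynomial trend that is removed.
    """
    x = np.asarray(series, dtype=float)
    # Step 1: the profile is the cumulative sum of the mean-centred series.
    profile = np.cumsum(x - x.mean())
    n = len(profile)
    log_f = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        n_seg = n // s
        # Step 2: cut the profile into non-overlapping windows of length s,
        # once from the start and once from the end, so no data is dropped.
        segments = [profile[i * s:(i + 1) * s] for i in range(n_seg)]
        segments += [profile[n - (i + 1) * s:n - i * s] for i in range(n_seg)]
        t = np.arange(s)
        # Step 3: variance of each window around its least-squares polynomial trend.
        variances = np.array([
            np.mean((seg - np.polyval(np.polyfit(t, seg, order), t)) ** 2)
            for seg in segments
        ])
        # Step 4: q-th order fluctuation function (log form);
        # q = 0 needs the logarithmic average instead of a power mean.
        for i, q in enumerate(qs):
            if q == 0:
                log_f[i, j] = 0.5 * np.mean(np.log(variances))
            else:
                log_f[i, j] = np.log(np.mean(variances ** (q / 2.0))) / q
    # Step 5: h(q) is the slope of log F_q(s) against log s.
    log_s = np.log(np.asarray(scales, dtype=float))
    return np.array([np.polyfit(log_s, log_f[i], 1)[0] for i in range(len(qs))])


if __name__ == "__main__":
    # Sanity check on uncorrelated noise standing in for a sentence-length series.
    rng = np.random.default_rng(0)
    h = mfdfa(rng.normal(size=4096), scales=[16, 32, 64, 128, 256], qs=[-2, 0, 2])
    print(h)  # all values should lie close to 0.5 for uncorrelated noise
```

In this reading, h(2) close to 0.5 indicates no long-range correlation, h(2) above 0.5 indicates persistent long-range correlation, and a spread of h(q) across different q values is the usual signature of multifractality. Uncorrelated noise is used in the usage example only because its expected exponent is known. It does not reproduce any data from the study.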
