Title
A Comparison of Two Fluctuation Analyses for Natural Language Clustering Phenomena: Taylor and Ebeling & Neiman Methods
Authors
Abstract
This article examines the fluctuation analysis methods of Taylor and of Ebeling & Neiman. Although both have been applied to a variety of phenomena in statistical mechanics, their similarities and differences have not been clarified. After examining the analytical aspects of both methods, this article presents a large-scale application of them to text. Both methods are found to distinguish real text from independent and identically distributed (i.i.d.) sequences. Furthermore, the Taylor exponents obtained from words can roughly distinguish text categories; the same holds for the Ebeling & Neiman exponents, but to a lesser extent. Additionally, both methods show some potential for capturing the kind of script in which a text is written.
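The abstract does not spell out how a Taylor exponent is computed, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes the common window-based form of Taylor's law: split the token sequence into fixed-size windows, take each word's mean count μ and standard deviation σ across windows, and fit σ ∝ μ^α on a log-log scale. The function name `taylor_exponent`, the `window` parameter, and the synthetic Zipf-like vocabulary are all hypothetical choices made for illustration. For an i.i.d. sequence the per-window counts are binomial, so the fitted exponent should land near 0.5, which is the baseline that clustered text would be compared against.

```python
import math
import random
from collections import Counter


def taylor_exponent(tokens, window=1000):
    """Least-squares slope of log(sigma) against log(mu) over all words.

    Assumption: mu and sigma are each word's mean and (population)
    standard deviation of counts over non-overlapping windows.
    """
    n = len(tokens) // window  # number of complete windows
    if n < 2:
        raise ValueError("need at least two complete windows")

    # Per-word sums of counts and squared counts across windows.
    total, total_sq = Counter(), Counter()
    for i in range(n):
        for word, c in Counter(tokens[i * window:(i + 1) * window]).items():
            total[word] += c
            total_sq[word] += c * c

    xs, ys = [], []
    for word, s in total.items():
        # n^2 * variance, kept as an exact integer to avoid round-off.
        num = n * total_sq[word] - s * s
        if num > 0:  # skip words whose count never varies
            xs.append(math.log(s / n))                 # log(mu)
            ys.append(0.5 * math.log(num / (n * n)))   # log(sigma)

    # Ordinary least-squares slope of ys on xs.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic i.i.d. "text": 1000 word types with Zipf-like weights (assumed).
rng = random.Random(0)
vocab = list(range(1000))
weights = [1.0 / (rank + 1) for rank in vocab]
tokens = rng.choices(vocab, weights=weights, k=200_000)

alpha = taylor_exponent(tokens)
print(f"estimated Taylor exponent for the i.i.d. sequence: {alpha:.3f}")
```

Running the same function on a tokenized real text and on a shuffled copy of it would be one simple way to see the contrast the abstract describes; the window size is a free parameter here and may affect the estimate.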