Multiscale major factor selections for complex system data with structural dependency and heterogeneity

Paper title

Multiscale major factor selections for complex system data with structural dependency and heterogeneity

Authors

Fushing, Hsieh; Chou, Elizabeth; Chen, Ting-Li

Abstract

Based on structured data derived from large complex systems, we further develop and refine a computational major factor selection protocol that accommodates structural dependency and heterogeneity among many features in order to unravel the data's information content. Two operational concepts, "de-associating" and its counterpart "shadowing", play key roles in our protocol and are reasoned, explained, and carried out via contingency table platforms. Through its "de-associating" capability, the protocol manifests the data's information content by identifying which covariate feature-sets do or do not provide enough information beyond the first identified major factors to join the collection of major factors as secondary members. Our computational developments begin with globally characterizing a complex system by the structural dependency between multiple response (Re) features and many covariate (Co) features. We first apply our major factor selection protocol to a Behavioral Risk Factor Surveillance System (BRFSS) data set to demonstrate discoveries of localities where patients with heart disease become either majorities or further reduced minorities, in sharp contrast to the data's imbalanced nature. We then study a Major League Baseball (MLB) data set consisting of 12 pitchers across 3 seasons, reveal detailed multiscale information content regarding pitching dynamics, and provide nearly perfect resolutions to the Multiclass Classification (MCC) problem and to the difficult task of detecting idiosyncratic changes of any individual pitcher across multiple seasons. We conclude by postulating an intuitive conjecture: large complex systems related to inferential topics can only be efficiently resolved through discoveries of the data's multiscale information content, which reflects the system's authentic structural dependency and heterogeneity.
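
The abstract says the protocol is carried out on contingency table platforms but does not state the association measure used. As a minimal sketch, assuming conditional Shannon entropy as that measure (an assumption, not confirmed by the abstract), the following builds a contingency table of one response (Re) feature against one covariate (Co) feature and compares the marginal entropy H(Re) with the conditional entropy H(Re | Co). All function names and the toy data are hypothetical, and the code does not implement the paper's "de-associating" or "shadowing" steps.

```python
from collections import Counter
from math import log2


def contingency_table(response, covariate):
    """Count co-occurrences as table[co_value][re_value] -> count."""
    table = {}
    for re_value, co_value in zip(response, covariate):
        table.setdefault(co_value, Counter())[re_value] += 1
    return table


def entropy(counts):
    """Shannon entropy (in bits) of a collection of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def marginal_entropy(table):
    """H(Re): entropy of the response after summing over covariate rows."""
    marginal = Counter()
    for row in table.values():
        marginal.update(row)
    return entropy(marginal.values())


def conditional_entropy(table):
    """H(Re | Co): row entropies weighted by each row's share of the data."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return sum(
        (sum(row.values()) / grand_total) * entropy(row.values())
        for row in table.values()
    )


# Toy, made-up data: an imbalanced response and one categorical covariate.
response = ["yes", "no", "no", "no", "no", "no", "no", "no"]
covariate = ["a", "a", "b", "b", "b", "b", "b", "b"]

table = contingency_table(response, covariate)
h_re = marginal_entropy(table)
h_re_given_co = conditional_entropy(table)
print(f"H(Re) = {h_re:.4f}, H(Re|Co) = {h_re_given_co:.4f}")
```

In this toy table the rare "yes" response makes up half of the row for covariate value "a" and is absent from row "b", which loosely mirrors the kind of locality the abstract describes for the imbalanced BRFSS data. A large drop from H(Re) to H(Re | Co) would, under the stated assumption, mark the covariate as a candidate major factor.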
