Paper title
Discovering stochastic dynamical equations from biological time series data
Paper authors
Paper abstract
Theoretical studies have shown that stochasticity can affect the dynamics of ecosystems in counter-intuitive ways. However, without knowing the equations governing the dynamics of populations or ecosystems, it is difficult to ascertain the role of stochasticity in real datasets. The inverse problem of inferring the governing stochastic equations from data is therefore important. Here, we present an equation discovery methodology that takes time series data of state variables as input and outputs a stochastic differential equation. We achieve this by combining traditional approaches from stochastic calculus with equation-discovery techniques. We demonstrate the generality of the method via several applications. First, we deliberately choose various stochastic models that have fundamentally different governing equations yet produce nearly identical steady-state distributions. We show that, from the analysis of time series data alone, we can accurately recover the correct underlying equations and thus infer the structure of their stability. We also demonstrate our method on two real-world datasets, fish schooling and single-cell migration, which have vastly different spatiotemporal scales and dynamics. We illustrate various limitations and potential pitfalls of the method and how to overcome them via diagnostic measures. Finally, we provide our open-source code via a package named PyDaDDy (Python library for Data Driven Dynamics).
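To make the input/output relationship in the abstract concrete, the following is a minimal sketch of the general idea, not the authors' implementation and not the PyDaDDy API. It assumes a scalar state variable, a polynomial candidate library, and sequentially thresholded least squares as a stand-in sparse regression; the function names (`simulate_sde`, `sparse_poly_fit`, `discover_sde`) and all threshold values are hypothetical choices made for illustration. Drift is estimated from the conditional mean of the increments and squared diffusion from the conditional mean of the squared residuals.

```python
import numpy as np


def simulate_sde(drift, diffusion, x0, dt, n_steps, rng):
    """Euler-Maruyama simulation of dx = drift(x) dt + diffusion(x) dW."""
    x = np.empty(n_steps)
    x[0] = x0
    noise = rng.standard_normal(n_steps - 1) * np.sqrt(dt)
    for i in range(n_steps - 1):
        x[i + 1] = x[i] + drift(x[i]) * dt + diffusion(x[i]) * noise[i]
    return x


def sparse_poly_fit(x, y, degree, threshold, n_iter=10):
    """Fit y against the library 1, x, ..., x^degree, zeroing small terms.

    Sequentially thresholded least squares (an illustrative assumption):
    coefficients below `threshold` in magnitude are set to zero and the
    remaining ones are refit.
    """
    library = np.vander(x, degree + 1, increasing=True)
    coeffs = np.linalg.lstsq(library, y, rcond=None)[0]
    for _ in range(n_iter):
        keep = np.abs(coeffs) >= threshold
        coeffs[~keep] = 0.0
        if not keep.any():
            break
        coeffs[keep] = np.linalg.lstsq(library[:, keep], y, rcond=None)[0]
    return coeffs


def discover_sde(x, dt, degree=3, drift_threshold=0.2, diff_threshold=0.05):
    """Return polynomial coefficients (increasing order) for f(x) and g(x)^2."""
    dx = np.diff(x)
    states = x[:-1]
    # Drift: conditional mean of the increment per unit time.
    drift_coeffs = sparse_poly_fit(states, dx / dt, degree, drift_threshold)
    # Squared diffusion: conditional mean of the squared residual per unit time.
    fitted_drift = np.vander(states, degree + 1, increasing=True) @ drift_coeffs
    residual = dx - fitted_drift * dt
    diff_coeffs = sparse_poly_fit(states, residual**2 / dt, degree, diff_threshold)
    return drift_coeffs, diff_coeffs


# Synthetic test case: bistable drift f(x) = x - x^3, constant noise g(x) = 0.5,
# so the true squared diffusion is g(x)^2 = 0.25.
rng = np.random.default_rng(0)
series = simulate_sde(lambda v: v - v**3, lambda v: 0.5, 0.0, 0.01, 200_000, rng)
drift_coeffs, diff_coeffs = discover_sde(series, dt=0.01)
print("drift coefficients (1, x, x^2, x^3):", drift_coeffs)
print("squared diffusion coefficients (1, x, x^2, x^3):", diff_coeffs)
```

The thresholds here were picked by hand for this synthetic example; on real data they would need to be tuned and the fit checked with diagnostics of the kind the abstract mentions.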