Title
Applications of Machine Learning in Pharmacogenomics: Clustering Plasma Concentration-Time Curves
Abstract
Pharmaceutical researchers are continually searching for techniques to improve both drug development processes and patient outcomes. An area of recent interest is the potential for machine learning (ML) applications within pharmacology. One such application not yet given close study is the unsupervised clustering of plasma concentration-time curves, hereafter pharmacokinetic (PK) curves. In this paper, we present our findings on how to cluster PK curves by their similarity. Specifically, we find clustering to be effective at identifying similar-shaped PK curves and informative for understanding patterns within each cluster of PK curves. Because PK curves are time series data objects, our approach takes the extensive body of research on clustering time series data as its starting point. As such, we examine many dissimilarity measures between time series data objects to find those most suitable for PK curves. We identify Euclidean distance as generally the most appropriate measure for clustering PK curves, and we further show that dynamic time warping, Fréchet, and structure-based measures of dissimilarity such as correlation may produce unexpected results. As an illustration, we apply these methods in a case study with 250 PK curves used in a previous pharmacogenomic study. Our case study finds that an unsupervised ML clustering with Euclidean distance, without any subject genetic information, is able to independently reach the same conclusions as the reference pharmacogenomic study. To our knowledge, this is the first such demonstration. Further, the case study demonstrates how clustering PK curves may generate insights that could be difficult to perceive solely from population-level summary statistics of PK metrics.
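To make the contrast between dissimilarity measures concrete, the following is a minimal sketch, not the authors' code or data. It assumes synthetic PK curves generated from a one-compartment oral-dosing model with first-order absorption and elimination and complete bioavailability. The sampling schedule `TIMES`, the helper `one_compartment_oral`, and all parameter values are hypothetical choices for illustration. It compares Euclidean distance, correlation distance (1 minus Pearson r), and classic dynamic time warping (DTW). Fréchet distance is omitted for brevity. Only the Python standard library is used.

```python
import math

# Hypothetical sampling schedule in hours (not taken from the study).
TIMES = [0.5, 1, 2, 3, 4, 6, 8, 12, 24]


def one_compartment_oral(dose, ka, ke, vd, times=TIMES):
    """Hypothetical helper: concentrations for a one-compartment model with
    first-order absorption (ka) and elimination (ke), assuming F = 1."""
    if math.isclose(ka, ke):
        raise ValueError("ka and ke must differ for this closed-form solution")
    coef = dose * ka / (vd * (ka - ke))
    return [coef * (math.exp(-ke * t) - math.exp(-ka * t)) for t in times]


def euclidean(a, b):
    """Point-wise distance: sensitive to both magnitude and timing."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson r: compares shape only, so scaled copies look identical."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - cov / (sa * sb))


def dtw(a, b):
    """Classic dynamic time warping with squared-difference cost.
    Warping operates on sample indices, so it can partly absorb shifts in
    the time of peak concentration. For equal-length curves the result never
    exceeds the Euclidean distance, because the diagonal path is allowed."""
    n, m = len(a), len(b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return math.sqrt(acc[n][m])


# Hypothetical subjects with illustrative parameters.
reference = one_compartment_oral(dose=100, ka=1.5, ke=0.2, vd=50)
# Halving the volume of distribution doubles every concentration (same shape).
double_exposure = one_compartment_oral(dose=100, ka=1.5, ke=0.2, vd=25)
# Slower absorption delays and lowers the peak (different shape).
slow_absorption = one_compartment_oral(dose=100, ka=0.5, ke=0.2, vd=50)

for name, other in [("double_exposure", double_exposure),
                    ("slow_absorption", slow_absorption)]:
    print(f"reference vs {name}: "
          f"euclidean={euclidean(reference, other):.3f}, "
          f"correlation={correlation_distance(reference, other):.3f}, "
          f"dtw={dtw(reference, other):.3f}")
```

With these illustrative parameters, correlation distance rates `double_exposure` as essentially identical to `reference` even though every concentration is twice as high, whereas Euclidean distance separates the two. In a pharmacogenomic setting, such exposure differences (for example between metabolizer phenotypes) can be exactly the signal of interest. DTW, meanwhile, can only shrink or match the Euclidean gap for equal-length curves, which is one way timing differences may be discounted.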