Lag penalized weighted correlation for time series clustering.
BMC Bioinformatics (IF 3), Pub Date: 2020-07-17, DOI: 10.1186/s12859-019-3324-1
Thevaa Chandereng, Anthony Gitter

The similarity or distance measure used for clustering can generate intuitive and interpretable clusters when it is tailored to the unique characteristics of the data. In time series datasets generated with high-throughput biological assays, measurements such as gene expression levels or protein phosphorylation intensities are collected sequentially over time, and the similarity score should capture this special temporal structure. We propose a clustering similarity measure called Lag Penalized Weighted Correlation (LPWC) to group pairs of time series that exhibit closely related behaviors over time, even if the timing is not perfectly synchronized. LPWC aligns time series profiles to identify common temporal patterns. It down-weights aligned profiles based on the length of the temporal lags that are introduced. We demonstrate the advantages of LPWC versus existing time series and general clustering algorithms. In a simulated dataset based on the biologically motivated impulse model, LPWC is the only method to recover the true clusters for almost all simulated genes. LPWC also identifies clusters with distinct temporal patterns in our yeast osmotic stress response and axolotl limb regeneration case studies. LPWC achieves both of its time series clustering goals. It groups time series with correlated changes over time, even if those patterns occur earlier or later in some of the time series. In addition, it refrains from introducing large shifts in time when searching for temporal patterns by applying a lag penalty. The LPWC R package is available at https://github.com/gitter-lab/LPWC and on CRAN under an MIT license.
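
The idea of aligning two profiles and then down-weighting the alignment by the size of the lag can be illustrated with a small sketch. The Python below is not the LPWC package's implementation. It assumes evenly spaced timepoints, uses plain Pearson correlation instead of the weighted correlation named in the abstract, and uses a penalty of the form exp(-C * lag^2) chosen only for illustration. The names `pearson`, `lag_penalized_correlation`, `max_lag`, and `C` are hypothetical.

```python
import math


def pearson(a, b):
    """Plain Pearson correlation; returns None if either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


def lag_penalized_correlation(x, y, max_lag=2, C=0.1):
    """Return (best_score, best_lag) over lags in [-max_lag, max_lag].

    Assumption: score = exp(-C * lag**2) * Pearson correlation of the
    overlapping parts after shifting. This is an illustrative penalty,
    not necessarily the one used by LPWC.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    best_score, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        # A positive lag pairs x[t] with y[t + lag], i.e. y is delayed.
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        if len(xs) < 3:
            continue  # too little overlap to correlate
        r = pearson(xs, ys)
        if r is None:
            continue  # constant overlap, correlation undefined
        score = math.exp(-C * lag ** 2) * r
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag


# Two impulse-like profiles; y is x delayed by one timepoint.
x = [0, 0, 1, 3, 1, 0, 0, 0]
y = [0, 0, 0, 1, 3, 1, 0, 0]

weak_score, weak_lag = lag_penalized_correlation(x, y, max_lag=2, C=0.1)
strong_score, strong_lag = lag_penalized_correlation(x, y, max_lag=2, C=2.0)
print(weak_lag, strong_lag)
```

With the small penalty (`C=0.1`) the sketch selects a lag of 1, because the shifted profiles match exactly and the penalty costs little. With the large penalty (`C=2.0`) it keeps lag 0, which mirrors the abstract's point that the lag penalty discourages large shifts in time. For real analyses, the LPWC R package linked above is the reference implementation.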

Updated: 2020-07-17