Time series cluster kernels to exploit informative missingness and incomplete label information, Pattern Recognition


Pattern Recognition (IF 8), Pub Date: 2021-02-20, DOI: 10.1016/j.patcog.2021.107896
Karl Øyvind Mikalsen, Cristina Soguero-Ruiz, Filippo Maria Bianchi, Arthur Revhaug, Robert Jenssen

The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation, and the ensemble strategy ensures robustness to hyperparameters, which makes the kernel particularly well suited for unsupervised learning.
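To make the ensemble idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each base mixture model has already produced a posterior cluster-membership vector for every series, and that the kernel value for a pair of series is the sum, over base models, of the inner products of their posterior vectors. The function name `ensemble_kernel` and the toy numbers are hypothetical.

```python
from typing import List

Posterior = List[float]  # cluster-membership probabilities for one series


def ensemble_kernel(posteriors_per_model: List[List[Posterior]]) -> List[List[float]]:
    """Sum, over base models, the inner products of posterior vectors.

    posteriors_per_model[q][i] is the posterior of series i under base model q.
    Base models may use different numbers of clusters.
    """
    n = len(posteriors_per_model[0])
    kernel = [[0.0] * n for _ in range(n)]
    for posteriors in posteriors_per_model:
        for i in range(n):
            for j in range(n):
                kernel[i][j] += sum(
                    a * b for a, b in zip(posteriors[i], posteriors[j])
                )
    return kernel


# Made-up posteriors for 3 series from 2 hypothetical base models
# (the first with 2 clusters, the second with 3).
toy = [
    [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]],
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]],
]
K = ensemble_kernel(toy)
```

In this toy setup, series 0 and 1 tend to share clusters across both base models, so their kernel value comes out larger than that of series 0 and 2. Because each term is a Gram matrix of posterior vectors, the resulting matrix is symmetric and positive semidefinite.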

However, TCK assumes that data are missing at random and that the underlying missingness mechanism is ignorable, i.e. uninformative, an assumption that does not hold in many real-world applications, such as medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our approach, we create a representation of the missing pattern, which is incorporated into mixed mode mixture models in such a way that the information provided by the missing patterns is effectively exploited. Moreover, we also propose a semi-supervised kernel, capable of taking advantage of incomplete label information to learn more accurate similarities.
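One simple way to represent a missing pattern is a binary indicator per time step and variable. The sketch below assumes this encoding (1 for observed, 0 for missing) and that missing entries are stored as NaN; the abstract does not spell out the exact representation used, and the helper name `missingness_mask` and the toy values are hypothetical.

```python
import math
from typing import List


def missingness_mask(series: List[List[float]]) -> List[List[int]]:
    """Return 1 where a value is observed and 0 where it is missing (NaN).

    series[t][v] is the value of variable v at time step t.
    """
    return [[0 if math.isnan(x) else 1 for x in row] for row in series]


nan = float("nan")
# Made-up series with 3 time steps and 2 variables.
toy_series = [
    [36.8, nan],
    [nan, nan],
    [37.4, 120.0],
]
mask = missingness_mask(toy_series)

# Fraction of time steps at which each variable was observed.
observed_rate = [sum(column) / len(mask) for column in zip(*mask)]
```

A mask like this can be modelled alongside the observed values, so that two series with similar measurements but different measurement patterns are not treated as identical.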

Experiments on benchmark data, as well as a real-world case study of patients described by longitudinal electronic health record data who potentially suffer from hospital-acquired infections, demonstrate the effectiveness of the proposed methods.


Updated: 2021-03-02
