Identifying subpopulations of septic patients: A temporal data-driven approach

https://doi.org/10.1016/j.compbiomed.2020.104182

Highlights

  • Temporal vital signs data are informative in stratifying septic patients.

  • Cluster analysis can be used to identify subpopulations in the Sepsis-3 population.

  • Identifying subpopulations of septic patients may inform the customization of care.

Abstract

Sepsis is one of the deadliest diseases in North America, and despite the vast amount of research on this topic, there is still uncertainty in the outcome of sepsis treatments. This study aimed to investigate the informativeness of temporal electronic health records (EHR) in stratifying septic patients and to identify subpopulations of septic patients with similar trajectories and clinical needs. We performed hierarchical clustering and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) analyses using data from septic patients in the MIMIC III intensive care unit database. The t-Distributed Stochastic Neighbor Embedding (t-SNE) method was used to map patients to a two-dimensional space. We used the silhouette index and cluster-wise stability assessment by resampling to investigate the validity of the clusters. Hierarchical clustering with the Euclidean metric identified twelve clinically recognizable subgroups that demonstrated different characteristics in spite of sharing common conditions. Our results demonstrated that data-driven approaches can help in customizing care platforms for septic patients by identifying similar clinically relevant groups.

Introduction

In intensive care units (ICUs), it is important to monitor patient health status over time. This necessity results in multiple measurements of a particular clinical variable across a given patient's stay. Due to the time-varying nature of these measurements, they are usually referred to as temporal/longitudinal electronic health records (EHRs). Since temporal EHR data carry valuable information about the evolution of patient health status, researchers have been able to improve predictive modeling [1,2,3] and patient stratification [4] by using these additional data. However, analyzing longitudinal data is accompanied by multiple challenges, such as irregular sampling rates and varying lengths of available measurements, as well as the inherent correlation of repeated measurements of a variable over time for the same patient.

The goal of this research is two-fold. First, through an exploratory analysis, we investigated the informativeness of the temporal vital signs data in our septic patient stratification via functional data analysis [5]. Second, we added other clinical information about the patients to the temporal vital signs data and performed cluster analysis to identify subpopulations of septic patients with similar clinical needs and trajectories in our data. An exploratory analysis of the included variables was used to interpret the identified clusters and derive insights. This information may be used to design customized care platforms for septic patients who share similar needs.
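As a rough illustration of the cluster-analysis step described above, the following Python sketch runs hierarchical clustering under Euclidean and cosine metrics, runs DBSCAN, and scores the hierarchical partitions with the silhouette index. It is a minimal sketch and not the authors' pipeline: the synthetic two-dimensional features, the average linkage, the function names, and the DBSCAN parameters `eps` and `min_samples` are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score


def make_demo_features(n_per_group=50, seed=0):
    """Synthetic stand-in for patient features: three well-separated groups."""
    rng = np.random.default_rng(seed)
    centers = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, -5.0]])
    groups = [c + rng.normal(scale=0.3, size=(n_per_group, 2)) for c in centers]
    return np.vstack(groups)


def hierarchical_labels(features, n_clusters, metric):
    """Agglomerative clustering cut into at most n_clusters flat clusters.

    Average linkage is an assumption; the text does not state the linkage used.
    """
    tree = linkage(features, method="average", metric=metric)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def dbscan_labels(features, eps, min_samples, metric):
    """DBSCAN labels; the label -1 marks points treated as noise."""
    model = DBSCAN(eps=eps, min_samples=min_samples, metric=metric)
    return model.fit_predict(features)


def silhouette_by_metric(features, n_clusters=3):
    """Silhouette index of the hierarchical partition under each metric."""
    scores = {}
    for metric in ("euclidean", "cosine"):
        labels = hierarchical_labels(features, n_clusters, metric)
        scores[metric] = silhouette_score(features, labels, metric=metric)
    return scores
```

Calling `silhouette_by_metric(make_demo_features())` returns one silhouette value per metric, which gives a simple way to compare how well each similarity measure separates the groups. On real patient features the number of clusters and the DBSCAN parameters would need to be tuned rather than fixed as here.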

Section snippets

Longitudinal data analysis in ICU patients

ICUs have a heterogeneous population with different health status dynamics but similar needs for constant care [6,7]. The heterogeneity in ICUs adds to the importance of finding similar patients and detecting the underlying phenotypic groups. Recently, efforts have been made to employ the temporal information of heterogeneous EHR data to reveal subpopulations.

A large and growing body of literature has investigated vital sign trajectories to discover patient subpopulations in order to identify

Study sample

This study used a subset of data from patients admitted to the ICUs of the Beth Israel Deaconess Medical Center between 2008 and 2012 (MIMIC III database [23]), as provided in [18]; data extraction was done using the code provided by the authors [24]. Data from 2001 to 2007 were excluded to focus on the portion of the MIMIC population for which antibiotic prescriptions were recorded. Because the MIMIC database is de-identified and public, patient consent and research ethics approval were waived. From

FPC score extraction

Applying PACE to the vital signs with the fraction-of-variance-explained threshold set to 98% resulted in three eigenfunctions for HR and RR, four for MBP, SysBP and SpO2, and five for Temp. The scores for these eigenfunctions were extracted for each patient to be used as features for the clustering phase.
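The idea of keeping eigenfunctions until a fraction-of-variance-explained (FVE) threshold is reached can be sketched in Python as follows. This is a simplified illustration and not PACE itself: it assumes every curve is observed on a common, regular time grid and uses an ordinary SVD, whereas PACE is designed for sparse, irregularly sampled measurements. The synthetic curves and all function names are hypothetical.

```python
import numpy as np


def make_demo_curves(n_patients=200, n_timepoints=50, seed=0):
    """Synthetic curves built from three orthogonal modes plus small noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_timepoints) / n_timepoints
    modes = np.stack([
        np.sin(2 * np.pi * t),
        np.cos(2 * np.pi * t),
        np.sin(4 * np.pi * t),
    ])
    # Mode amplitudes with standard deviations 3, 2 and 1 (arbitrary choices).
    amplitudes = rng.normal(size=(n_patients, 3)) * np.array([3.0, 2.0, 1.0])
    noise = rng.normal(scale=0.05, size=(n_patients, n_timepoints))
    return 70.0 + amplitudes @ modes + noise


def fpc_scores_by_fve(curves, threshold=0.98):
    """Keep the fewest components whose cumulative FVE reaches the threshold.

    curves: (n_patients, n_timepoints) array on a shared regular grid.
    Returns the number of components kept, the per-patient scores, and the
    cumulative FVE for every component.
    """
    centered = curves - curves.mean(axis=0)  # subtract the mean curve
    # Squared singular values are proportional to the variance per component.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    fve = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First component index whose cumulative FVE is at or above the threshold.
    k = min(int(np.searchsorted(fve, threshold)) + 1, len(fve))
    scores = u[:, :k] * s[:k]  # projection of each curve on the kept components
    return k, scores, fve
```

With `k, scores, fve = fpc_scores_by_fve(make_demo_curves())`, the rows of `scores` play the role of the per-patient features passed on to clustering. Raising or lowering `threshold` trades a more compact feature set against how much of the curve variability is retained.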

Step 1: investigating the informativeness of temporal data over cross-sectional data

This section presents the results for hierarchical clustering with Euclidean metric (HE), hierarchical clustering with cosine metric (HC), DBSCAN with Euclidean metric (DE) and DBSCAN with cosine

Discussion

This study focused on leveraging temporal vital sign data along with other clinical characteristics in clustering septic patients. Summarizing originally longitudinal vital sign measurements by their average eliminates the temporal dynamics of changes in patient health status, which can be helpful in phenotyping septic patients. Our results from the exploratory analysis in Step 1 (Section 4.2.1) implied that including information about vital sign trends results in the emergence of subpopulations

Conclusions

In this paper, we deployed hierarchical clustering and DBSCAN to discover phenotypes in Sepsis-3 patients while including the temporal aspect of vital signs. We used t-SNE to construct a two-dimensional mapping of patients based on patient similarities derived from Euclidean and cosine metrics. In the first step, we demonstrated that including temporal characteristics of vital signs helps in stratifying septic patients, independent of clustering method and patient similarity metric. In the second

Declaration of competing interest

None declared.

Acknowledgement

This study was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants (RGPIN-2014-04743, RGPIN-2020-04382) and an Early Researcher Award from the Ontario Ministry of Research, Innovation and Science (ER15-11-188).

References (49)

  • Critical care statistics. URL...
  • M. Prin et al. International comparisons of intensive care: informing outcomes and improving standards. Curr. Opin. Crit. Care (2012)
  • M. Pimentel et al. Modelling patient time-series data from electronic health records using Gaussian processes
  • L.w.H. Lehman et al. Uncovering clinical significance of vital sign dynamics in critical care
  • L.W. Lehman et al. A physiological time series dynamics-based approach to patient monitoring and outcome prediction. IEEE J. Biomed. Health Inform. (2015)
  • L.w.H. Lehman et al. Hemodynamic monitoring using switching autoregressive dynamics of multivariate vital sign time series
  • V. Agarwal et al. Learning attributes of disease progression from trajectories of sparse lab values
  • M. Singer et al. The third international consensus definitions for sepsis and septic shock (sepsis-3). J. Am. Med. Assoc. (2016)
  • M. Wu et al. Understanding vasopressor intervention and weaning: risk prediction in a public heterogeneous clinical time series database. J. Am. Med. Inf. Assoc. (2017)
  • A.S. Fialho et al. Disease-based modeling to predict fluid response in intensive care units. Methods Inf. Med. (2013)
  • R.L. Kravitz et al. Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q. (2004)
  • A.E. Fohner et al. Assessing clinical heterogeneity in sepsis through treatment patterns and machine learning. J. Am. Med. Inf. Assoc. (2019)
  • A.E.W. Johnson et al. A comparative analysis of sepsis identification methods in an electronic database. Crit. Care Med. (2018)
  • F. Khoshnevisan et al. Recent temporal pattern mining for septic shock early prediction
