The detection of hospitalized patients at risk of testing positive to multi-drug resistant bacteria using MOCA-I, a rule-based “white-box” classification algorithm for medical data

https://doi.org/10.1016/j.ijmedinf.2020.104242

Abstract

Background

Multi-drug resistant (MDR) bacteria are a major health concern. In this retrospective study, a rule-based classification algorithm, MOCA-I (Multi-Objective Classification Algorithm for Imbalanced data), is used to identify hospitalized patients at risk of testing positive for MDR bacteria, including methicillin-resistant Staphylococcus aureus (MRSA), before or during their stay.

Methods

Applied to a data set of 48,945 hospital stays (including known cases of carriage) with up to 16,325 attributes per stay, MOCA-I generated alert rules for risk of carriage or infection. A risk score was then computed for each stay according to the rules it triggered. Recall and precision curves were plotted.

Results

By modulating the risk score cut-off level, the classification can be focused either on specifically detecting patients at high risk of a positive test or on identifying large numbers of at-risk patients. For a risk score above 0.85, recall (sensitivity) is 62 % with 69 % precision (confidence) for MDR bacteria, and 58 % with 88 % precision for MRSA. In addition, MOCA-I identifies 38 and 21 previously unknown cases of MDR and MRSA, respectively.

Conclusions

MOCA-I generates medically pertinent alert rules. This classification algorithm can be used to detect patients at high risk of testing positive for MDR bacteria (including MRSA). Classification can be modulated by setting the risk score cut-off level to favor either specific detection of small numbers of patients at very high risk or identification of large numbers of patients at risk. MOCA-I can thus contribute to better-adapted treatments and preventive measures from admission, depending on the clinical setting or management strategy.

Introduction

Multi-drug resistant (MDR) bacteria, including methicillin-resistant Staphylococcus aureus (MRSA), are a major health concern because MDR infections are very difficult to treat and can have significant medical impact, potentially leading to a fatal outcome if not treated appropriately. It is thus crucial to limit the spread of MDR bacteria. In hospital, this means identifying patients who are carriers of, or infected with, MDR bacteria so that precautionary measures can be instituted [1] from admission.

In hospital, the infection control (IC) team receives information about MDR status from many sources and is responsible for ensuring that colonized or infected patients receive adapted care. For instance, the hospital bacteriology laboratory may alert the IC team whenever a test sample is positive for MDR bacteria. This enables the team to identify large numbers of MDR patients but misses others because such tests are not always ordered. Patients who had a positive test before admission might also be missed. Care units complete initial screening for MDR and inform the IC team using alert systems that recall available data on current and prior history of MDR colonization or infection.

Different expert groups have proposed specific screening rules for hospital patients, e.g. the alert system described for MRSA by Evans et al. [2]. Data mining techniques can also be used to generate alert rules automatically. A large volume of work has been dedicated to medical data mining [3], including identification of patients at risk of contagious infection [4] and of risk factors for MDR or MRSA carriage [5–8]. In our case, we used MOCA-I (Multi-Objective Classification Algorithm for Imbalanced data), a rule-based classification algorithm adapted to the specificities of medical data [9]. First, MOCA-I is able to process binary or qualitative data (ordered or not) with more than 15,000 variables, whereas the previously presented approaches deal with a small number of variables (n≤50) [4,6–8]. This eliminates the need for data filtering and the risk of setting aside useful information. Secondly, MOCA-I is able to manage highly imbalanced data sets. According to the MDR 2014 report from RAISIN (French Nosocomial Infection Warning and Surveillance Investigation Network), the incidence density of MRSA is 0.27 per 1000 hospitalization days. Such imbalance explains why many classical data mining algorithms fail [10]. Finally, MOCA-I is a white-box classification algorithm, in contrast to state-of-the-art machine learning techniques such as Neural Networks and Random Forest. This is consistent with the November 2018 recommendations of the CCNE (French National Consultative Ethics Committee) on AI and health, which suggest using AI approaches that the care team can criticize or challenge [11]. However, approaches that make it possible to explain the decisions of black-box models have recently started to emerge [12] and could be very useful in the future.
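The effect of class imbalance can be illustrated with a short calculation. The sketch below is not part of MOCA-I; it only uses the counts reported for this data set (340 MDR-positive stays among 48,945) to show that a trivial classifier labelling every stay as negative reaches a very high accuracy while detecting no positive stay. The function name `all_negative_baseline` is hypothetical.

```python
def all_negative_baseline(n_total: int, n_positive: int) -> dict:
    """Metrics for a trivial classifier that labels every stay as negative."""
    true_negatives = n_total - n_positive
    accuracy = true_negatives / n_total  # all negatives are "correct"
    recall = 0.0                         # no positive stay is ever flagged
    return {"accuracy": accuracy, "recall": recall}


# Counts taken from the data set description: 340 MDR stays out of 48,945.
baseline = all_negative_baseline(n_total=48_945, n_positive=340)
print(f"accuracy = {baseline['accuracy']:.4f}, recall = {baseline['recall']:.2f}")
# accuracy = 0.9931, recall = 0.00
```

This is why recall and precision, rather than accuracy, are the relevant measures for such data.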

The approach proposed in the present work is also novel in that it assigns a risk score to each patient. This score can then be used to adapt the number of patient files to investigate as a function of available resources and the probability of detecting MDR carriage or infection.
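As a rough illustration of this use of the score, the sketch below ranks stays by risk score and keeps only as many as the team can review. It assumes the scores have already been computed; the function name `select_stays_for_review`, the stay identifiers, and the score values are hypothetical and do not come from the study.

```python
from typing import Dict, List


def select_stays_for_review(risk_scores: Dict[str, float], capacity: int) -> List[str]:
    """Return the identifiers of the `capacity` highest-scoring stays."""
    # Sort stays from highest to lowest risk score.
    ranked = sorted(risk_scores.items(), key=lambda item: item[1], reverse=True)
    # Keep only as many files as the team can investigate.
    return [stay_id for stay_id, _ in ranked[:max(capacity, 0)]]


# Hypothetical scores for four stays; the team can review two files.
example_scores = {"stay-A": 0.91, "stay-B": 0.40, "stay-C": 0.87, "stay-D": 0.12}
shortlist = select_stays_for_review(example_scores, capacity=2)
print(shortlist)  # ['stay-A', 'stay-C']
```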

The main purpose of the present work is to apply MOCA-I to a large-volume real-life data set in order to assess its capacity to identify patients at risk of testing positive for MDR bacteria. A secondary objective is to determine the medical pertinence of the alert rules and the ranking generated by the system. Two use cases are envisaged for the rules obtained: retrospectively, to identify coding errors or missed patients; and prospectively, to create a questionnaire with relevant questions to ask incoming patients.

Data set elaboration

The data used for this retrospective study was obtained from the annual activity records of the 750-bed Lille Catholic Hospitals (St-Philibert and St-Vincent-de-Paul hospitals, Lille, France) in 2013, which represents 48,945 hospital stays, all units combined. During this period, the IC (Infection Control) team identified 340 stays concerning patients who tested positive for MDR before or during their stay, including 128 for MRSA. Our preliminary work focused on MRSA, which has long been

Results

Fig. 2 presents recall and precision, respectively, as a function of the risk score on the test sets, for both MDR and MRSA. Recall (respectively precision) and the number of patients screened are plotted as a function of the risk score. For a given score, patients above the cut-off level are considered at risk for MDR or MRSA.
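The quantities plotted in such curves can be sketched as follows. This is a minimal illustration, not the evaluation code of the study: it assumes one risk score and one known label (1 for a positive test, 0 otherwise) per stay, treats "above the cut-off" as a strict comparison, and uses the hypothetical name `recall_precision_by_cutoff` with made-up toy data.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, Optional[float], Optional[float], int]


def recall_precision_by_cutoff(
    scores: Sequence[float], labels: Sequence[int], cutoffs: Sequence[float]
) -> List[Row]:
    """For each cut-off, return (cutoff, recall, precision, number screened)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positive = sum(labels)
    rows: List[Row] = []
    for cutoff in cutoffs:
        # Stays strictly above the cut-off are considered at risk (assumption).
        flagged = [label for score, label in zip(scores, labels) if score > cutoff]
        true_positive = sum(flagged)
        # Recall and precision are undefined when their denominator is zero.
        recall = true_positive / total_positive if total_positive else None
        precision = true_positive / len(flagged) if flagged else None
        rows.append((cutoff, recall, precision, len(flagged)))
    return rows


# Toy data only: six stays, three of which had a positive test.
toy_scores = [0.95, 0.90, 0.70, 0.60, 0.30, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0]
rows = recall_precision_by_cutoff(toy_scores, toy_labels, cutoffs=[0.5, 0.85])
for cutoff, recall, precision, screened in rows:
    print(cutoff, recall, precision, screened)
```

On the toy data, raising the cut-off from 0.5 to 0.85 halves the number of stays screened and raises precision at the expense of recall, which is the trade-off the cut-off level controls.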

Discussion

These results demonstrate that, with a cut-off of 0.85, MOCA-I recalls on average 62 % of patients who have a positive test for MDR and 58 % for MRSA. Moreover, it identifies 38 other patients suspected of MDR and 21 suspected of MRSA who were not known to the IC team. With a high cut-off (0.9), the screened patients are relevant, since 71 % (mean precision on the test sets in Table 3) of the identified MDR patients had a positive test, as did 88 % of the identified MRSA patients. Setting the cut-off score at 0.85 yields a new set

Conclusions

MOCA-I is a classification algorithm capable of detecting hospital patients at risk of testing positive for MDR or MRSA bacteria. Applied to the annual data from 48,945 hospital stays in our institution, MOCA-I identified a majority of known carriers or infected patients, plus 39 supplementary patients for MDR and 27 for MRSA. The screening rules generated by the system are medically pertinent. MOCA-I can be used at hospital admission to screen for additional patients at high risk of having a

Authors' contributions

CD, LJ, JJ, JT: evaluation protocol design

JJ, JT: data collection, processing

CD, LJ, JJ, JT: system elaboration and calibration

HMH, VL: review of patient files, medical expertise, qualitative analysis of generated rules

JJ: first draft, system implementation and evaluation, results collection and analysis

All authors: rereading, approval of final manuscript

Patient consent

This is a data-reuse study. During their stay, patients are informed that they give their implied consent for the re-use of their data for research and educational purposes. Patients may refuse this implied consent; in that case, the files of the patients concerned were removed from this study.

Summary table

What was already known on the topic

  • MDR and MRSA bacteria carriage or infection need adapted care.

  • MOCA-I is efficient for machine learning on medical data (imbalance, uncertainty, volumetry: high

Declaration of Competing Interest

Julien Taillard and David Delerue are employed by Alicante, which is the company that publishes MOCA-I. Julie Jacques is a former employee of Alicante.

Acknowledgements

Sabrina Meniaoui and Amélie Vasseur for reviewing the patient files

Justine Lemtiri-Florek for her medical expertise for the first versions of the system

Laurene Norberciak for advice on the evaluation protocol

Funding

This work was supported by internal funds from Lille Catholic hospitals, Lille Catholic University (GHICL - Groupement des Hôpitaux de l'Institut Catholique de Lille).

Part of this work was conducted within the framework of the CLINMINE ANR-13-TECS-0009 French project.

References (24)

  • S. Harbarth et al. Evaluating the probability of previously unknown carriage of MRSA at hospital admission. Am. J. Med. (2006)

  • C. Couderc et al. Fluoroquinolone use is a risk factor for methicillin-resistant Staphylococcus aureus acquisition in long-term care facilities: a nested case-case-control study. Clin. Infect. Dis. (2014)