Summary table
What was already known on the topic
- MDR and MRSA bacteria carriage or infection need adapted care.
- MOCA-I is efficient for machine learning on medical data (imbalance, uncertainty, volumetry: high
Multi-drug resistant (MDR) bacteria, including methicillin-resistant Staphylococcus aureus (MRSA), are a major health concern: MDR infections are very difficult to treat and can have significant medical consequences, potentially leading to a fatal outcome if not treated appropriately. It is thus crucial to limit the spread of MDR bacteria. In hospital, this means identifying, from admission, patients who carry or are infected with MDR bacteria so that precautionary measures can be instituted [1].
In hospital, the infection control (IC) team receives information about MDR status from many sources and is responsible for ensuring that colonized or infected patients receive adapted care. For instance, the hospital bacteriology laboratory may alert the IC team whenever a test sample is positive for MDR bacteria. This enables the team to identify many MDR patients, but others are missed because such tests are not always ordered. Patients who had a positive test before admission might also be missed. Care units perform initial MDR screening and inform the IC team, using alert systems that retrieve available data on the patient's current and prior history of MDR colonization or infection.
Different expert groups have proposed specific screening rules for hospital patients, e.g. the alert system described for MRSA by Evans et al. [2]. Data mining techniques can also be used to generate alert rules automatically. A large body of work has been dedicated to medical data mining [3], including the identification of patients at risk of contagious infection [4] or of risk factors for harbouring MDR bacteria or MRSA [[5], [6], [7], [8]]. In our case, we used MOCA-I (Multi-Objective Classification Algorithm for Imbalanced data), a rule-based classification algorithm adapted to the specificities of medical data [9]. First, MOCA-I can process binary or qualitative data (ordered or not) with more than 15,000 variables, whereas the previously cited approaches deal with a small number of variables (n ≤ 50) [4,[6], [7], [8]]. This eliminates the need for data filtering and the risk of setting aside useful information. Second, MOCA-I can manage highly imbalanced data sets. According to the 2014 MDR report from RAISIN (French Nosocomial Infection Warning and Surveillance Investigation Network), the incidence density of MRSA is 0.27 per 1000 hospitalization days. Positive cases are therefore vastly outnumbered by negative ones, which explains why many classical data mining algorithms fail [10]. Finally, MOCA-I is a white-box classification algorithm, in contrast to state-of-the-art machine learning techniques such as neural networks and random forests. This is consistent with the November 2018 recommendations of the CCNE (French National Consultative Ethics Committee) on AI and health, which advocate AI approaches that the care team can criticize or challenge [11]. However, new approaches that can explain the decisions of black-box models have recently emerged [12] and could be very useful in the future.
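To make the white-box property concrete, here is a minimal Python sketch of how a rule-based classifier of this kind can be represented and queried. It assumes, for illustration only, that a rule set is a disjunction of rules and each rule is a conjunction of tests on binary or qualitative variables; the classes `Rule` and `RuleSet` and all attribute names are hypothetical and reproduce neither MOCA-I's actual implementation nor any rule generated in this study.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Rule:
    """A conjunction of attribute tests (hypothetical representation)."""
    conditions: dict[str, Any]  # attribute name -> required value

    def matches(self, patient: dict[str, Any]) -> bool:
        # Every test must hold; a missing attribute never matches.
        return all(
            attr in patient and patient[attr] == value
            for attr, value in self.conditions.items()
        )


@dataclass
class RuleSet:
    """A disjunction of rules: the patient is flagged if any rule matches."""
    rules: list[Rule]

    def predict(self, patient: dict[str, Any]) -> bool:
        return any(rule.matches(patient) for rule in self.rules)

    def explain(self, patient: dict[str, Any]) -> list[Rule]:
        # The rules that fired are the human-readable justification.
        return [rule for rule in self.rules if rule.matches(patient)]


# Illustrative rules only (hypothetical variables, not rules from the study).
ruleset = RuleSet([
    Rule({"prior_mdr_alert": True}),
    Rule({"icu_stay": True, "age_band": "80+"}),
])
patient = {"prior_mdr_alert": False, "icu_stay": True, "age_band": "80+"}
print(ruleset.predict(patient))  # True
print(ruleset.explain(patient))  # only the second rule fired
```

Because `explain` returns the rules that fired, a member of the care team can see exactly why a patient was flagged and challenge that reasoning, which a black-box model does not offer directly.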
The approach proposed in the present work is also novel in that it assigns a risk score to each patient. This score can then be used to adapt the number of patient files to investigate as a function of available resources and the probability of detecting MDR carriage or infection.
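A minimal sketch of this resource-driven use of the score, assuming each stay already has a risk score and that the IC team can review a fixed number of files (`budget`); the function name and stay identifiers are hypothetical.

```python
import heapq


def select_files_to_review(scores: dict[str, float], budget: int) -> list[str]:
    """Return the IDs of the `budget` highest-scoring stays, highest score first."""
    if budget <= 0:
        return []
    # nlargest avoids sorting every stay when the budget is small.
    top = heapq.nlargest(budget, scores.items(), key=lambda item: item[1])
    return [stay_id for stay_id, _ in top]


scores = {"stay_01": 0.42, "stay_02": 0.91, "stay_03": 0.87, "stay_04": 0.10}
print(select_files_to_review(scores, budget=2))  # ['stay_02', 'stay_03']
```

Raising `budget` trades more file reviews for a higher chance of catching carriers; a fixed cut-off on the score is the alternative when resources are expressed as a threshold rather than a count.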
The main purpose of the present work is to apply MOCA-I to a large real-life data set in order to assess its capacity to identify patients at risk of testing positive for MDR. A secondary objective is to determine the medical pertinence of the alert rules and of the ranking generated by the system. Two use cases are envisaged for the resulting rules: retrospectively, to identify coding errors or missed patients; and prospectively, to create a questionnaire with relevant questions to ask incoming patients.
The data used for this retrospective study were obtained from the 2013 annual activity records of the 750-bed Lille Catholic Hospitals (St-Philibert and St-Vincent-de-Paul hospitals, Lille, France), representing 48,945 hospital stays, all units combined. During this period, the IC team identified 340 stays of patients who tested positive for MDR before or during their stay, including 128 for MRSA. Our preliminary work focused on MRSA, which has long been
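These counts make the class imbalance concrete. The short computation below uses only the figures given above (48,945 stays, 340 MDR-positive, 128 MRSA-positive) and shows that a trivial classifier labelling every stay negative would be over 99 % accurate while detecting no positive patient at all, which is why accuracy alone is a misleading objective here. Note that these are proportions of stays, not the incidence density per 1000 hospitalization days quoted from RAISIN; the function name is illustrative.

```python
def all_negative_baseline(n_stays: int, n_positive: int) -> dict[str, float]:
    """Metrics of a classifier that predicts 'negative' for every stay."""
    prevalence = n_positive / n_stays
    accuracy = (n_stays - n_positive) / n_stays  # every negative stay is "correct"
    recall = 0.0                                 # no positive stay is ever detected
    return {"prevalence": prevalence, "accuracy": accuracy, "recall": recall}


for label, positives in (("MDR", 340), ("MRSA", 128)):
    m = all_negative_baseline(48_945, positives)
    print(f"{label}: prevalence {m['prevalence']:.2%}, "
          f"accuracy {m['accuracy']:.2%}, recall {m['recall']:.0%}")
# MDR: prevalence 0.69%, accuracy 99.31%, recall 0%
# MRSA: prevalence 0.26%, accuracy 99.74%, recall 0%
```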
Fig. 2 presents the recall and the precision, together with the number of patients screened, as a function of the risk score for the test sets, for both MDR and MRSA. For a given score, patients above the cut-off level are considered at risk for MDR or MRSA.
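Curves of this kind can be computed from per-stay scores and test outcomes with a threshold sweep. The sketch below assumes a stay is flagged when its score is strictly above the cut-off and, by convention, reports a precision of 0 when nothing is flagged; the function name, scores and labels are toy values, not study data.

```python
def metrics_at_cutoff(scores: list[float], labels: list[bool],
                      cutoff: float) -> tuple[float, float, int]:
    """Precision, recall and number of stays screened for a given cut-off."""
    flagged = [label for score, label in zip(scores, labels) if score > cutoff]
    true_pos = sum(flagged)   # flagged stays with a positive test
    total_pos = sum(labels)   # all stays with a positive test
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall, len(flagged)


# Toy data: risk scores and whether the stay actually had a positive MDR test.
scores = [0.95, 0.90, 0.88, 0.60, 0.30, 0.92]
labels = [True, False, True, True, False, True]
for cutoff in (0.85, 0.90):
    p, r, n = metrics_at_cutoff(scores, labels, cutoff)
    print(f"cut-off {cutoff:.2f}: precision {p:.2f}, recall {r:.2f}, screened {n}")
# cut-off 0.85: precision 0.75, recall 0.75, screened 4
# cut-off 0.90: precision 1.00, recall 0.50, screened 2
```

Raising the cut-off screens fewer stays with higher precision but lower recall, the same trade-off the MDR and MRSA results exhibit between cut-offs of 0.85 and 0.9.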
These results show that with a cut-off of 0.85, MOCA-I recalls on average 62 % of patients who have a positive test for MDR and 58 % for MRSA. Moreover, it identifies 38 other patients suspected of MDR carriage or infection (21 for MRSA) who were not known to the IC team. With a higher cut-off (0.9), the screened patients are highly relevant: 71 % (mean precision on test sets in Table 3) of the identified MDR patients had a positive test, as did 88 % of the identified MRSA patients. Setting the cut-off score at 0.85 yields a new set
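Finding patients "not known to the IC team" amounts to a set difference between the stays flagged by the score and the IC team's list of known positive stays. A minimal sketch, with a hypothetical function name and toy identifiers, that returns the unknown flagged stays ordered by decreasing score for manual file review:

```python
def unknown_flagged_stays(scores: dict[str, float], known_positive: set[str],
                          cutoff: float) -> list[str]:
    """Stays above the cut-off that are absent from the IC list, highest score first."""
    candidates = [
        (stay_id, score)
        for stay_id, score in scores.items()
        if score > cutoff and stay_id not in known_positive
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [stay_id for stay_id, _ in candidates]


scores = {"stay_a": 0.95, "stay_b": 0.90, "stay_c": 0.50, "stay_d": 0.88}
print(unknown_flagged_stays(scores, known_positive={"stay_a"}, cutoff=0.85))
# ['stay_b', 'stay_d']
```

Each returned stay still needs a clinician's review of the patient file before it can be counted as a missed carrier or a coding error.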
MOCA-I is a classification algorithm capable of detecting hospital patients at risk of testing positive for MDR or MRSA bacteria. Applied to the annual data from 48,945 hospital stays in our institution, MOCA-I identified a majority of known carriers or infected patients, plus 39 additional patients for MDR and 27 for MRSA. The screening rules generated by the system are medically pertinent. MOCA-I can be used at hospital admission to screen for additional patients at high risk of having a
CD, LJ, JJ, JT: evaluation protocol design
JJ, JT: data collection, processing
CD, LJ, JJ, JT: system elaboration and calibration
HMH, VL: review of patient files, medical expertise, qualitative analysis of generated rules
JJ: first draft, system implementation and evaluation, results collection and analysis
All authors: rereading, approval of final manuscript
This is a data-reuse study. During their stay, patients are informed that they give their implied consent for the re-use of their data for research and educational purposes. Patients may refuse this implied consent; the files of patients who refused were removed from this study.
Julien Taillard and David Delerue are employed by Alicante, which is the company that publishes MOCA-I. Julie Jacques is a former employee of Alicante.
Sabrina Meniaoui and Amélie Vasseur for reviewing the patient files
Justine Lemtiri-Florek for her medical expertise for the first versions of the system
Laurene Norberciak for advice on the evaluation protocol
This work was supported by internal funds from Lille Catholic hospitals, Lille Catholic University (GHICL - Groupement des Hôpitaux de l'Institut Catholique de Lille).
Part of this work was conducted within the framework of the CLINMINE ANR-13-TECS-0009 French project.