Key Summary Points

An automated algorithm was developed and validated for identification of potential DILI cases in a real-time, real-world PV database.

The algorithm was designed and optimized to maximize inclusion of potential DILI cases.

The algorithm demonstrated a sensitivity of 97.8% and a specificity of 79.3%.

Compared to manual case review, application of the automated algorithm resulted in an estimated time saving of 42.2%.

Introduction

Drug-induced liver injury (DILI) is a potentially fatal adverse drug reaction that is the most frequent cause of acute liver failure in North America and Europe [1,2,3]. Concern for DILI is one of the main barriers to marketing authorization, one of the most frequent causes of post-marketing restrictions, and one of the major reasons for marketing withdrawal [4,5,6,7,8]. Despite significant downstream morbidity and mortality, DILI eludes early upstream detection because associated signs and symptoms are nonspecific and pathognomonic biomarkers remain unelucidated [3, 9,10,11]. As there is no standardized method to predict DILI, creation of a systematic approach for early detection is currently the best strategy for prevention and potential intervention [1, 12].

While several criteria have been proposed to detect potential DILI cases, there is significant variability in how these criteria can be applied in stopping rules for clinical trial settings, risk–benefit assessments in post-marketing settings, reporting to health authorities, and to physicians and patients [1, 12,13,14]. Further, since most criteria involve a combination of laboratory values and patient symptoms, manual review may be needed in addition to automated filtering by laboratory criteria [15, 16]. A recent meta-analysis of algorithms created to identify potential DILI cases found a low range of detection (1.0–40.2%) that varied with threshold criteria, case definitions, diagnostic codes, and study drugs [17]. Many of these algorithms were applied to retrospective data, and there remains a significant need to develop more efficient automated methods to consider cases prospectively as part of active, ongoing pharmacovigilance surveillance.

Given the lack of standardized methods to routinely monitor for DILI, we developed an automated algorithm to facilitate detection of potential DILI cases. The algorithm does not confirm diagnosis of DILI. The objective of this study was to evaluate the application of our potential DILI detection algorithm in activities related to routine pharmacovigilance. The logic of the algorithm can be further applied to identify other criteria-based pathologies.

Methods

Datasets and Case Selection

The datasets used in this study compiled post-marketing individual case safety reports (ICSRs) from a real-time, global PV database between 19 March 2017 and 18 June 2018. Thirteen of 15 datasets corresponding to monthlong reporting periods were included in the final analysis as algorithm assessment was not performed for two non-consecutive monthlong reporting periods.

To identify potential DILI cases among reported hepatic adverse events, dataset inclusion criteria consisted of (1) initial or follow-up ICSR reporting for a specific potential hepatotoxic agent during the respective monthlong period and (2) at least one lower level MedDRA (Medical Dictionary for Regulatory Activities) term contained within the following five hepatic Standardized MedDRA Queries (SMQs) [18]:

  1. Cholestasis and jaundice of hepatic origin

  2. Hepatic failure, fibrosis and cirrhosis and other liver damage-related conditions

  3. Hepatitis, non-infectious

  4. Liver-related investigations, signs, and symptoms

  5. Liver-related coagulation and bleeding disturbances

As the global PV database is maintained and updated in real time, the same ICSR could appear in successive datasets as additional follow-up information was received and case details and assessment evolved. While the study period spanned MedDRA versions 20.0, 20.1, and 21.0, SMQ parameters were maintained throughout versioning by the MedDRA Maintenance and Support Services Organization [18].

Case Assessment

In practice, pharmacovigilance activities related to potential hepatotoxic agents include identification of potential DILI cases that will then be further evaluated by a hepatic adjudication committee, consisting of an expert panel of hepatologists. The case reviews and analyses performed in this study occurred prior to submission to a hepatic adjudication committee, reflecting the process of deciding which cases describe potential DILI and merit hepatic adjudication. For the purposes of this study cases identified as potential DILI cases through manual review were considered to be “positive” and were sent for additional assessment by a hepatic adjudication committee, whereas cases that were not identified as potential DILI cases were considered to be “negative” and not sent for hepatic adjudication. Assessments performed in this study do not reflect the judgment of a hepatic adjudication committee, nor do they confirm diagnosis of DILI.

Conservatively expanding laboratory ranges from the US Food and Drug Administration (FDA) guidance for DILI evaluation to be inclusive, increases of serum alanine transaminase (ALT), serum aspartate transaminase (AST), serum total bilirubin (TB), or international normalized ratio of prothrombin time (INR) falling into any of the following ranges were considered sufficient to identify a potential DILI case [12]:

  • ALT or AST ≥ 7.5 × ULN*

  • ALT or AST ≥ 5.0 × ULN for more than 2 weeks

  • ALT or AST ≥ 3.0 × ULN and (TB ≥ 2.0 × ULN or INR ≥ 1.5)

  • ALT or AST ≥ 3.0 × ULN with the appearance of fatigue, nausea, vomiting, abdominal pain upper, fever, rash, and/or eosinophilia (> 5.0%)

*× ULN denotes the laboratory value expressed as a multiple of the upper limit of its normal reference range [19]

The FDA guidance also suggests surveillance of cases that meet the following [12]:

  • ALT or AST ≥ 2.0 × ULN or twofold increases above baseline values for subjects with elevated values before drug exposure

Cases meeting any of the aforementioned criteria were assessed as “positive” for potential DILI.
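The criteria listed above can be sketched as a small rule check. This is an illustrative Python sketch only, not the PL/SQL implementation used in the study; the `Case` fields and the function names are hypothetical, and the sketch assumes the laboratory values have already been converted to multiples of ULN. As written, the 2.0 × ULN surveillance criterion is the least restrictive, so the sketch reports every criterion met rather than only the first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    """Hypothetical case summary (field names are assumptions, not from the study)."""
    peak_aminotransferase: float              # higher of ALT or AST, in multiples of ULN
    tb_x_uln: Optional[float] = None          # total bilirubin, in multiples of ULN
    inr: Optional[float] = None               # international normalized ratio
    days_at_or_above_5x: int = 0              # days with ALT or AST >= 5.0 x ULN
    has_symptom: bool = False                 # any listed symptom reported
    eosinophils_pct: Optional[float] = None   # eosinophil percentage, if measured
    fold_over_elevated_baseline: Optional[float] = None  # set only if baseline was elevated


def criteria_met(case: Case) -> List[str]:
    """Return a label for each selection criterion the case satisfies."""
    met = []
    x = case.peak_aminotransferase
    if x >= 7.5:
        met.append("ALT/AST >= 7.5 x ULN")
    if x >= 5.0 and case.days_at_or_above_5x > 14:
        met.append("ALT/AST >= 5.0 x ULN for > 2 weeks")
    if x >= 3.0 and ((case.tb_x_uln or 0.0) >= 2.0 or (case.inr or 0.0) >= 1.5):
        met.append("ALT/AST >= 3.0 x ULN with TB >= 2.0 x ULN or INR >= 1.5")
    eosinophilia = case.eosinophils_pct is not None and case.eosinophils_pct > 5.0
    if x >= 3.0 and (case.has_symptom or eosinophilia):
        met.append("ALT/AST >= 3.0 x ULN with symptoms or eosinophilia")
    fold = case.fold_over_elevated_baseline
    if x >= 2.0 or (fold is not None and fold >= 2.0):
        met.append("ALT/AST >= 2.0 x ULN or twofold above elevated baseline")
    return met


def is_potential_dili(case: Case) -> bool:
    """A case is "positive" if it meets any one criterion."""
    return bool(criteria_met(case))
```

Returning the list of criteria, rather than a single flag, mirrors the kind of output that lets a reviewer see why a case was flagged.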

Manual Case Review

Once each dataset was generated, all cases in the dataset were manually reviewed and analyzed over a 2-week period. The manual reviewer was the same individual for all datasets to maintain consistency in case review, analysis, and assessment throughout the study.

Assessment of each case consisted of applying medical judgment to evaluate laboratory data as well as medical information in the case narrative. ALT, AST, TB, and INR values were expressed as multiples of the upper limit of the respective normal ranges provided. When normal ranges were not provided, 40 international units per liter (IU/L) was used as the ULN for ALT and AST, 1 mg per deciliter (mg/dL) was used as the ULN for TB, and 1 was used as the ULN for INR. Elevations from ULN as well as elevations from baseline were calculated. Cases that met the selection criteria described in the “Case Assessment” section were assessed as “positive” for potential DILI. Cases that did not meet the selection criteria were assessed as “negative” for potential DILI.
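The ULN calculation described above amounts to a simple ratio with a fallback limit. The following Python sketch illustrates it under the stated defaults; the function and dictionary names are hypothetical, and unit handling is assumed to have been done beforehand.

```python
from typing import Optional

# Fallback upper limits of normal used when a report gives no reference range
# (values from the text: ALT and AST in IU/L, TB in mg/dL, INR as a ratio).
DEFAULT_ULN = {"ALT": 40.0, "AST": 40.0, "TB": 1.0, "INR": 1.0}


def times_uln(analyte: str, value: float, uln: Optional[float] = None) -> float:
    """Express a laboratory value as a multiple of its upper limit of normal."""
    limit = uln if uln is not None else DEFAULT_ULN[analyte]
    return value / limit


def fold_over_baseline(value: float, baseline: float) -> float:
    """Express a laboratory value as a multiple of the patient's baseline value."""
    return value / baseline
```

For example, an ALT of 300 IU/L with no reported reference range would be treated as 300 / 40 = 7.5 × ULN.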

Algorithm Development and Case Review

The categories of laboratory value ranges used above for manual case assessment were used to design and develop an algorithm to identify potential DILI cases. The algorithm was programmed using procedural language for structured query language (PL/SQL) to analyze PV data stored in Oracle System Tables (Oracle Corporation, Redwood City, CA). From user-specified inputs of time period, drug, SMQs, and ICSR source (e.g., clinical trial, post-marketing), the algorithm identified a subset of the global PV database upon which to apply selection criteria. This initial process recreated the same datasets that were assessed manually; however, as the algorithm was applied to a real-time PV database, data were current up to the time of the algorithm run, which occurred after manual review.

Next, the algorithm applied the case selection criteria itemized in the “Case Assessment” section. Using case data entered into laboratory, event, narrative, and date fields, the algorithm simultaneously identified which criteria, if any, were satisfied by the cases. ALT, AST, TB, and INR values were compared to respective ULNs and narratives were scanned for keywords fatigue, nausea, vomiting, abdominal pain upper, fever, rash, and/or eosinophilia.

For assessment, the algorithm designated a case as “positive” for potential DILI if it determined that the case met any one of the selection criteria categories previously described. The algorithm assessed cases as “negative” for potential DILI if the selection criteria were not met. Throughout the course of the study, several practical optimizations and modifications were made to the algorithm in order to improve its accuracy and ability to detect specified selection criteria. These optimizations included improvements of the user-facing output to delineate exact criteria met for potential DILI identification and exact × ULN of laboratory value elevation.

Cases lacking laboratory values for ALT, AST, TB, and INR were unable to be assessed by the algorithm as these laboratory values are part of the selection criteria that the algorithm was programmed to identify. Such cases were assessed only by manual review.

Efficiency Evaluation

To evaluate the standard practice of manual case review, a sample of “positive” and “negative” potential DILI cases was assessed by the manual reviewer, and the time required for review and analysis was recorded for each assessment. This was used to determine the average amount of time required to manually review “positive” and “negative” potential DILI cases, respectively.

To evaluate the effect on time savings of prescreening the dataset with the algorithm, a sample of “positive” and “negative” potential DILI cases as assessed by the algorithm was then assessed manually, with the manual reviewer being aware of the algorithm outcome beforehand. This was used to determine the average amount of time required to review prescreened “positive” and “negative” potential DILI cases, respectively.

The difference in time required for standard manual review as compared with manual review following prescreening with algorithm review was determined as a measure of time difference. Time difference as a proportion of time required for standard manual review was used to evaluate efficiency as in Eq. (1):

$$ \text{Efficiency} = \frac{\text{Time}_{\text{Manual review}} - \text{Time}_{\text{Prescreened manual review}}}{\text{Time}_{\text{Manual review}}} \tag{1} $$

Statistical Analyses

To calculate the sensitivity and specificity of the algorithm’s case assessments, the manual case assessment was used as the comparator gold standard since it is the only standard method for performing this pharmacovigilance surveillance activity. The positive likelihood ratio (LR+) and negative likelihood ratio (LR−) were then derived from calculated sensitivity and specificity. Calculations for true positive rate (TPR) and false positive rate (FPR) were used to generate a receiver operating characteristic (ROC) curve. The trapezoidal rule was used to calculate the area under the ROC curve (AUROC) [20].
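As an illustration of the trapezoidal rule applied to ROC points, a minimal Python sketch follows. It is not the spreadsheet calculation used in the study. Anchoring the curve at (0, 0) and (1, 1) is an assumption made here, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def auroc_trapezoid(points: Iterable[Tuple[float, float]]) -> float:
    """Area under an ROC curve from (FPR, TPR) pairs using the trapezoidal rule.

    The end points (0, 0) and (1, 1) are added as an assumption so that the
    curve spans the full FPR range.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Each segment contributes width times the mean of its two heights.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With no interior points the function returns 0.5, the area under the chance diagonal, which is a quick sanity check on the implementation.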

For the population sampled in this study, the prevalence of potential DILI cases was determined from manual case review as the proportion of cases assessed that were positive for potential DILI. Using this as the population prevalence of potential DILI, the positive predictive value (PPV) and negative predictive value (NPV) of the algorithm were calculated per standard practice [21]. Accuracy of algorithm performance was calculated in terms of overall percentage agreement with manual review [22]. All statistical analyses were performed using Microsoft Excel, v.16.0 (Microsoft Corporation, Redmond, WA).

Ethics Compliance

This study is based on previously reported data and does not contain any new studies with human participants or animals performed by any of the authors. Permission to access and analyze deidentified data was granted by Otsuka Pharmaceutical Development & Commercialization, Inc. (Princeton, NJ).

Results

Patient Demographics

A total of 1456 cases were manually reviewed during 13 monthlong periods for detection of potential DILI in patients receiving the same potential hepatotoxic agent. These cases represent 719 unique patients with demographics described in Table 1. Demographic information is displayed by outcome of manual assessment. Of the total cases assessed, 312 of them (21.4%) were identified as potential DILI cases; 165 of these potential DILI cases (52.9%) occurred among female patients, while 462 of 1144 cases negative for potential DILI (40.4%) occurred among female patients. Age distribution was similar for both positive and negative potential DILI cases with mean age 55 years and 52 years, respectively, and median age 52 years and 50 years, respectively. The majority of these cases from the study period were from Japan.

Table 1 Patient demographics of cases assessed manually

Algorithm Performance

The algorithm assessed 476 cases (32.7%) using selection criteria with laboratory values for ALT, AST, TB, and INR. Table 2 shows a comparison between manual case assessments and algorithm case assessments. On the basis of these comparisons, the algorithm was calculated to have a sensitivity of 97.8% and a specificity of 79.3%. Likelihood ratios were calculated as LR+ 4.73 and LR− 0.03.

Table 2 Manual case assessments vs. algorithm case assessments

Of the six case assessments categorized as false negatives, five were due to a conservative lowering of thresholds during manual case assessment, and one was due to a concern for serious liver injury based on adverse event terms reported (Table S1 in the supplementary material). For the 43 case assessments categorized as false positives, 26 were due to serum enzyme elevations that did not persist for 2 weeks, nine were due to patient symptoms that did not correlate with enzyme elevations (lack of temporal association, confounding from non-hepatic adverse event), and eight were due to laboratory values being unavailable at the time of manual review that then subsequently became available at the time of algorithm review (Table S1 in the supplementary material).

Since analyses were performed in monthlong intervals, sensitivities and specificities were also calculated for each month of data reviewed (Table 3). These data were also used to generate an ROC curve with an AUROC of 0.95 (Fig. 1). Given that manual review identified 21.4% of cases as potential DILI, the positive predictive value of the algorithm for this population was calculated as 56.3% and the negative predictive value was calculated as 99.2%. Algorithm accuracy was calculated as 89.7% overall percentage agreement with manual review.
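The likelihood ratios and predictive values follow from sensitivity, specificity, and prevalence by the standard formulas. The Python sketch below uses the rounded values reported in the text as inputs. Because those inputs are rounded, the results agree with the reported figures only to within rounding of the last digit. Function names are hypothetical.

```python
from typing import Tuple


def likelihood_ratios(sens: float, spec: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    return sens / (1.0 - spec), (1.0 - sens) / spec


def predictive_values(sens: float, spec: float, prev: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a given prevalence."""
    ppv = sens * prev / (sens * prev + (1.0 - spec) * (1.0 - prev))
    npv = spec * (1.0 - prev) / (spec * (1.0 - prev) + (1.0 - sens) * prev)
    return ppv, npv


# Rounded values reported in the text.
SENSITIVITY, SPECIFICITY, PREVALENCE = 0.978, 0.793, 0.214

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)
ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, PREVALENCE)
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, PPV {ppv:.1%}, NPV {npv:.1%}")
```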

Table 3 Case assessment by month

Fig. 1 Receiver operating characteristic curve for algorithm assessment. Receiver operating characteristic (ROC) curve (dashed line) constructed using monthly case assessments from Table 3 with monthly true positive rates and false positive rates plotted as ordered pairs (blue dots). Area under the ROC curve (AUROC) was calculated as 0.95 using the trapezoidal rule

Algorithm Efficiency

Manual review and analysis of cases that were identified as positive potential DILI cases required an average of 18.5 min per case, whereas cases that were identified as negative required an average of 7.5 min per case. This difference was attributable to the additional time spent calculating the extent of hepatic function test elevations, coupled with a narrative review to assess duration and associated symptoms.

The automated algorithm was able to provide assessments for all supplied cases within 1 s. Confirmation of the algorithm’s assessments required an additional 1.5 min on average, for positive potential DILI cases only. Given the distribution of positive and negative cases in each monthlong assessment period and given that the algorithm was applied to cases with relevant laboratory values, Fig. 2 illustrates the potential time savings if the algorithm had been applied to screen all cases prior to manual review.

Fig. 2 Estimated time savings with algorithm review prior to manual review. Blue bars represent time required in hours for completion of manual case review and analysis. Orange bars represent estimated time required in hours for completion of case review and analysis following prescreening with algorithm. Labeled values indicate time saved as percentage of total time expended for manual review and analysis

Overall, using the algorithm to screen all 1456 cases prior to manual review would be expected to save 101 h (42.2% of time expended). Monthly time savings ranged from 6.3 to 9.5 h, corresponding to 35.4–47.7% of hours expended.
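The overall figure can be checked against Eq. (1) with the per-case review times and case counts reported above. In this Python sketch the 101 h saving is taken directly from the text rather than recomputed, since the prescreened review time is not itemized per case type here. Function names are hypothetical.

```python
MIN_PER_POSITIVE = 18.5  # average manual review time per positive case, minutes
MIN_PER_NEGATIVE = 7.5   # average manual review time per negative case, minutes


def manual_review_hours(n_positive: int, n_negative: int) -> float:
    """Total manual review time in hours for a set of cases."""
    return (n_positive * MIN_PER_POSITIVE + n_negative * MIN_PER_NEGATIVE) / 60.0


def efficiency(manual_hours: float, prescreened_hours: float) -> float:
    """Eq. (1): time saved as a proportion of standard manual review time."""
    return (manual_hours - prescreened_hours) / manual_hours


total_manual = manual_review_hours(312, 1144)   # 312 positive, 1144 negative cases
reported_saving = 101.0                         # hours, as reported in the text
prescreened = total_manual - reported_saving
print(f"manual {total_manual:.1f} h, efficiency {efficiency(total_manual, prescreened):.1%}")
```

The manual total works out to about 239 h, so a saving of 101 h is consistent with the reported 42.2%.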

Discussion

This study evaluated the application of an automated, multifactorial algorithm in prescreening ICSRs with hepatic adverse events for identification of potential DILI cases in a real-time, real-world PV database. The algorithm was found to have a sensitivity of 97.8% and a specificity of 79.3% with an AUROC of 0.95. Moreover, given the high prevalence (21.4%) of potential DILI cases in the population studied, the algorithm demonstrated a positive predictive value of 56.3% and a negative predictive value of 99.2%. Application of the algorithm in prescreening datasets for potential DILI cases was estimated to save 42.2% of time expended from manual case review.

Despite the utility of the algorithm in facilitating identification of potential DILI cases, it must be restated that DILI itself remains difficult to predict. The liver functions at the intersection of numerous metabolic pathways and is equally subject to the effects of active metabolites as well as drugs, other agents, and the potential interactions between them [9, 23, 24]. Patient genetics, demographics, comorbidities, behavior, and environment all may play a role in precipitating DILI [9, 23, 25,26,27]. DILI may also occur with delayed onset, adding complexity to assessment of the temporal interplay between all these factors [28]. Thus, predicting DILI from a weighted analysis of risk factors remains difficult to validate consistently, though there has been much progress in the field. The focus of the current study was on the commonly reported pharmacovigilance information from which case assessments need to be made in deciding whether or not hepatic adjudication is required.

For pharmacovigilance purposes, it is essential that no potential DILI case is overlooked [1, 2, 9, 12]. To this end, the algorithm developed for this study was designed to optimize its sensitivity and negative predictive value. This is reflected in the algorithm’s LR− of 0.03 and LR+ of 4.73: by Bayes’ theorem, a “negative” potential DILI assessment would greatly decrease post-test probability, whereas a “positive” potential DILI assessment would only moderately increase it [20]. By contrast, other algorithms and methods for detecting potential DILI that optimize specificity do so by reducing sensitivity, which increases the risk of false-negative (missed) potential DILI cases [17, 20,21,22, 29]. The algorithm was tested for 5 months beyond the study period to confirm accuracy and efficiency before it was incorporated into daily use. During this testing period and thereafter, algorithm assessments were made available during manual review, allowing for direct verification as opposed to independent comparison.
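The effect of the likelihood ratios on post-test probability can be made explicit in odds form. As an illustration, the worked example below takes the study prevalence of 21.4% as the pre-test probability (an assumption for this example) and uses the rounded likelihood ratios reported above:

$$ \text{Odds}_{\text{pre}} = \frac{0.214}{1 - 0.214} \approx 0.272 $$

$$ \text{Odds}_{\text{post}}^{+} = 0.272 \times 4.73 \approx 1.29, \qquad P_{\text{post}}^{+} = \frac{1.29}{1 + 1.29} \approx 0.56 $$

$$ \text{Odds}_{\text{post}}^{-} = 0.272 \times 0.03 \approx 0.008, \qquad P_{\text{post}}^{-} = \frac{0.008}{1 + 0.008} \approx 0.008 $$

Under this assumption, a “positive” assessment raises the probability of a potential DILI case from about 21% to about 56%, whereas a “negative” assessment lowers it to under 1%.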

Time savings are a clear benefit of using automated methods and tools to perform pharmacovigilance activities. Though automation may never fully replace the ability to apply medical judgment in assessing case narratives and contextualizing laboratory findings, automation can usefully minimize the time spent performing repetitive comparisons and calculating laboratory values relative to set limits. Efforts to optimize this algorithm’s efficiency revealed areas for improvement in case assessment methodology, as well as in case intake and case processing. The algorithm’s dependence on proper entry of laboratory values identified opportunities to improve internal processes involved in ensuring that this happens.

Limitations of the algorithm included its inability to parse narrative information to correlate with laboratory findings. For example, a patient with ALT or AST elevation ≥ 3.0 × ULN reporting “no fever” would have been identified by the algorithm as positive for potential DILI, as the presence of the word “fever” in the narrative would have been interpreted as the patient having the symptom. One potential solution would be to incorporate natural language processing capabilities into the algorithm to interpret narrative context [30, 31]. Machine learning processes could provide additional input into making case assessments [32, 33].
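The “no fever” example can be reproduced with a plain keyword match. The Python sketch below is illustrative only: the study's keyword scan was implemented in PL/SQL and its details are not given here. The negation check is a deliberately crude stand-in for the natural language processing approaches cited, and the cue list, window size, and function names are all assumptions.

```python
import re
from typing import List

KEYWORDS = ["fatigue", "nausea", "vomiting", "abdominal pain upper",
            "fever", "rash", "eosinophilia"]
NEGATION_CUES = {"no", "not", "without", "denies", "denied"}  # hypothetical cue list


def naive_hits(narrative: str) -> List[str]:
    """Keywords present anywhere in the narrative, regardless of context."""
    text = narrative.lower()
    return [k for k in KEYWORDS if re.search(rf"\b{re.escape(k)}\b", text)]


def negation_aware_hits(narrative: str, window: int = 3) -> List[str]:
    """Keywords with at least one mention not preceded by a negation cue.

    A mention is ignored if any of the `window` words before it is a cue.
    """
    text = narrative.lower()
    hits = []
    for keyword in KEYWORDS:
        for match in re.finditer(rf"\b{re.escape(keyword)}\b", text):
            preceding = re.findall(r"[a-z]+", text[:match.start()])[-window:]
            if not NEGATION_CUES.intersection(preceding):
                hits.append(keyword)
                break
    return hits
```

On the narrative “Patient reports no fever.”, `naive_hits` returns `["fever"]` while `negation_aware_hits` returns an empty list. A fixed word window is fragile, however: it can suppress a genuine symptom that happens to follow a negated one, which is one reason context-aware language processing would be preferable.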

Additionally, the algorithm was not always able to determine if the timing of symptoms corresponded to the timing of laboratory elevations. The algorithm also was not always able to determine duration of enzyme elevations. Operationally, one limitation of the preceding analysis was that the algorithm was often applied on a date after the initial period of manual review. The algorithm used case identifiers in the extracted dataset to assess corresponding laboratory values in the global safety database. As the global safety database is maintained in real time, on occasion the algorithm evaluated laboratory values that were not available at the time of manual review, resulting in a number of false positive case assessments. Finally, the datasets derived from ICSRs reporting hepatic adverse events would be expected to have a higher prevalence of potential DILI cases compared to the general population receiving a potential hepatotoxic agent [2, 3, 9]. As the algorithm was designed for analysis of a specific population with hepatotoxic injury and potential hepatotoxic agent use, its application to a generalized population would be difficult to interpret.

Nevertheless, the algorithm had many strengths in addition to its high sensitivity, high negative predictive value, and potential for significant time savings. The algorithm has no dependency on software or MedDRA versioning, which makes it suitable to perform case assessment at any time past, present, or future [18]. The programming is also easily adaptable, allowing for changes in set thresholds as health authority guidance documents are updated.

As an example, the FDA’s guidance for DILI evaluation in clinical trials differs from the criteria applied in the post-marketing setting of this study, and the corresponding criteria could be easily programmed into the algorithm [12]. Though the algorithm’s language processing was noted as a limitation, it is nevertheless a useful functionality that, since the time of the study, has been adapted to identify serious liver injury event terms (e.g., hepatic failure, hepatitis fulminant, liver transplant), further ensuring that potential DILI cases are not missed.

Finally, the algorithm may be adapted to identify other pathologies with multifactorial laboratory value selection criteria, such as drug reaction with eosinophilia and systemic symptoms (DRESS) syndrome, tumor lysis syndrome, neuroleptic malignant syndrome, and drug-induced renal injury [34,35,36,37].

Conclusion

We successfully developed and implemented a screening algorithm to assist in identifying potential DILI cases in support of routine pharmacovigilance activities. The algorithm demonstrated high sensitivity and a high negative predictive value, along with significant efficiency and adaptability in a real-time PV database, which will ultimately result in cost savings. Notably, the algorithm’s key feature for pharmacovigilance purposes is a focus on sensitivity and negative predictive value, at the expense of specificity and positive predictive value, in our effort to maximize patient safety.