Introduction

The analysis of tumor-specific biomarkers informs targeted treatment decisions in non-small-cell lung cancer (NSCLC) and metastatic colorectal cancer (mCRC) [1,2,3]. Predictive biomarker test results should therefore be accurate, reproducible, and timely.

Several external quality assessment (EQA) schemes, organized at a national or international level, have assessed performance for common biomarkers in NSCLC and mCRC. They revealed varying error rates depending on the evaluated markers and variants, sample types, and scheme rounds [4,5,6,7,8,9,10,11,12,13].

Longitudinal analyses of the EQA schemes organized by the European Society of Pathology (ESP) revealed that participation in multiple EQA scheme rounds improved participants’ performance [12, 13]. Over time, error rates decreased for ALK and EGFR analysis but increased for ROS1. Error rates were also higher for immunohistochemistry (IHC) than for fluorescence in situ hybridization (FISH) on formalin-fixed, paraffin-embedded (FFPE) samples, and especially compared to digital case interpretation [12]. Remarkably, lower error rates have been described for cell lines compared to resections, for higher variant allele frequencies [13], and for laboratories that are accredited, test more samples, or perform research [14]. In mCRC, error rates increased significantly for mutation-positive samples and for methods that do not cover all required variants [11].

Medical laboratories are advised to participate in EQA schemes [1, 3], sometimes as part of a quality framework conforming to the International Organization for Standardization (ISO) standard 15189:2012 [15] or national equivalents such as CAP 15189 [16]. Laboratories should have a documented procedure to identify and manage non-conformities when pre-determined performance criteria are not met, both in EQA and in routine practice.

The providers of these EQA programs are preferably accredited according to ISO 17043:2010 [17], mimic patient samples as closely as possible, and check the entire examination process [15]. EQA providers can guide laboratories through the provision of feedback, reference material, or methodological advice [18, 19]. Some providers (such as the CAP and UK NEQAS) already request a root cause analysis from poor performers [7, 15], but no data have yet been published. Some errors are systematic (e.g., test method failures), while others are accidental (e.g., clerical or pipetting errors). The time point of error occurrence in the total test process (TTP) has been reported in clinical chemistry and forensics [20, 21], where errors were mostly pre-analytical (46–86%) or post-analytical (18–47%) in nature [20]. However, such data are still lacking for molecular oncology.

Recently, a step-by-step framework for effective EQA results management was proposed for laboratories and EQA providers [22, 23]. A subsequent evaluation of deviating EQA results in clinical chemistry according to this flowchart revealed that most errors (81%) were the laboratory’s responsibility (internal causes) and were mainly clerical errors (72%), i.e., a correct outcome entered incorrectly into the results form [22].

This study evaluated the feasibility of requesting root causes of deviating EQA results in the ESP schemes for NSCLC and mCRC between 2014 and 2018. The error causes were compared for the different markers, techniques, and sample types, as well as for different laboratory characteristics.

Material and methods

The ESP schemes were organized according to the requirements for EQA programs in molecular pathology [18] and ISO 17043 [17]. Laboratories could register for several subschemes for different techniques and markers. Sample selection and preparation, validation by the reference laboratories, and distribution to participants were described previously [11, 12]. Laboratories were given 14 calendar days to analyze all samples with their routine methodology and to return an electronic datasheet with the cases’ outcomes, the applied test methodology, and laboratory characteristics. Reported laboratory settings and accreditation statuses were further validated on the websites of the laboratories and national accreditation bodies, respectively. The correctness of the sample outcomes was assessed by a team of international experts according to predefined scoring criteria [11, 12]. Participants received feedback including a general scheme report, a participation certificate, and individual comments.

At the end of the EQA schemes, laboratories with at least one error or analysis failure (i.e., no outcome available due to a test failure) were invited via e-mail to complete a survey with case-specific questions for every incorrect or failed case. The total number of participants and cases analyzed is summarized in Table 1. The survey was drafted in Microsoft Excel using the Developer tools and tailored to the participants’ own results (Supplemental Data 1). This information included the case number and the type of deviation from the validated outcome for every subscheme (false-positive or false-negative results, a variant reported at an incorrect position or gene, or over- and underestimation of the tumor proportion score (TPS) for PD-L1). Questions included pre-developed dropdown lists and checkboxes for ease of completion.
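As a minimal sketch of how such a participant-tailored survey with harmonized answer options could be generated programmatically, the following Python example uses openpyxl (the actual survey was built in Excel itself); the case identifiers, column layout, and answer options are hypothetical illustrations, not the scheme’s actual taxonomy.

```python
# Sketch: a per-laboratory survey sheet pre-filled with that laboratory's
# deviating cases, with dropdown lists restricting answers to a fixed taxonomy.
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

def build_survey(deviating_cases):
    """deviating_cases: list of (case_id, deviation_type) tuples for one laboratory."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Root cause survey"
    ws.append(["Case", "Deviation", "TTP phase", "Error cause"])

    # Dropdown lists enforce a fixed vocabulary, so free text cannot creep in.
    phase_dv = DataValidation(
        type="list",
        formula1='"pre-analytical,analytical,post-analytical"',
        allow_blank=False,
    )
    cause_dv = DataValidation(
        type="list",
        formula1='"clerical,interpretation,methodological,reagents,equipment,material,personnel"',
        allow_blank=False,
    )
    ws.add_data_validation(phase_dv)
    ws.add_data_validation(cause_dv)

    # One row per deviating case, pre-filled from the participant's EQA results.
    for row, (case_id, deviation) in enumerate(deviating_cases, start=2):
        ws.cell(row=row, column=1, value=case_id)
        ws.cell(row=row, column=2, value=deviation)
        phase_dv.add(ws.cell(row=row, column=3))   # participant picks the TTP phase
        cause_dv.add(ws.cell(row=row, column=4))   # participant picks the cause
    return wb

# Example: a laboratory with two deviating cases (identifiers are made up).
wb = build_survey([("NSCLC-03", "false negative"), ("NSCLC-07", "TPS underestimated")])
wb.save("survey_lab001.xlsx")
```

Constraining answers to predefined lists is what later permits responses to be pooled and compared statistically across survey rounds.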

Table 1 Number of cases analyzed per subscheme offered in the ESP EQA schemes

Laboratories received additional information on the study set-up and a list of definitions of the applied terminology to harmonize responses for statistical analysis. The returned survey data were thereafter linked to the datasheet entries on laboratory setting and methodology from the EQA scheme, and to the participants’ performances. The deadline for response was set at 1 month. Laboratories received a first reminder after 14 days and a second reminder the day before the deadline.
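A rough sketch of this linkage step is shown below in Python with pandas; the file and column names are hypothetical placeholders for the survey responses, the EQA datasheet entries, and the per-case performance records.

```python
import pandas as pd

# Hypothetical sources: survey answers, datasheet entries, and case scores.
surveys = pd.read_excel("survey_responses.xlsx")   # lab_id, case_id, phase, cause
datasheet = pd.read_csv("eqa_datasheet.csv")       # lab_id, setting, accredited, method
performance = pd.read_csv("eqa_performance.csv")   # lab_id, case_id, score

# Link each survey response to the laboratory's characteristics and to the
# performance of the corresponding case in the EQA scheme.
linked = (
    surveys
    .merge(datasheet, on="lab_id", how="left", validate="many_to_one")
    .merge(performance, on=["lab_id", "case_id"], how="left", validate="many_to_one")
)
linked.to_csv("linked_survey_data.csv", index=False)
```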

All survey responses from the ESP NSCLC schemes between 2014 and 2018 and the mCRC schemes between 2015 and 2018 were included. Statistics were performed using SAS software (version 9.4 of the SAS System for Windows, SAS Institute Inc., Cary, NC, USA). Models estimated with generalized estimating equations (GEE) were applied to account for clustering of identical laboratories participating in different schemes (NSCLC vs. mCRC) and years. Binary outcome variables were analyzed by logistic regression models; ordinal and categorical outcome variables were analyzed by proportional odds models. Detailed statistics are shown in Supplemental Data 2.
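For readers who wish to reproduce this type of analysis, the sketch below shows a Python analogue of such a GEE-based logistic regression (the study itself used SAS); the dataset, variable names, and model terms are hypothetical placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical linked dataset: one row per laboratory participation.
df = pd.read_csv("linked_survey_data.csv")

# Logistic regression fitted with GEE; repeated participations of the same
# laboratory across schemes and years are clustered via the groups argument.
model = smf.gee(
    "received_survey ~ accredited + setting + annual_samples",
    groups="lab_id",
    data=df,
    family=sm.families.Binomial(),            # binary outcome -> logit link
    cov_struct=sm.cov_struct.Exchangeable(),  # working correlation within a lab
)
result = model.fit()
print(result.summary())

# Odds ratios with 95% confidence intervals, as reported in Supplemental Data 2.
print(pd.concat([np.exp(result.params), np.exp(result.conf_int())], axis=1))

# Ordinal outcomes (proportional odds models) have a GEE analogue in
# smf.ordinal_gee, with the same groups/cov_struct arguments.
```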

Results

Response to root cause surveys

In the period between December 2015 and February 2019, 791 individual surveys were sent to 315 unique laboratories from 43 countries. The probability that a laboratory received the survey at the end of the EQA scheme (because it made an EQA error) and responded to it is presented in Table 2 for the different laboratory characteristics.

Table 2 Probability of survey receipt and response for the different laboratory characteristics

Laboratories accredited according to ISO 15189 were less likely to receive the survey, as were laboratories testing a larger number of annual samples for ROS1, KRAS, or NRAS (but not for the other markers). In contrast, laboratories (n = 45) that outsourced part of their analysis were more likely to receive the survey. Exact p values and corresponding odds ratios (ORs) are shown in Supplemental Data 2A. Of the 45 respondents that outsourced part of the analysis, 15 outsourced the variant analysis itself, 6 outsourced both the DNA extraction and the variant analysis, and 24 sent the samples to another laboratory for pathology review. There was no difference in the chance of receiving the survey based on the laboratory’s setting (university or community hospital) or number of personnel (Table 2).

Of the 791 surveys that were sent, 325 (39.8%) responses were received from 185 unique laboratories (58.4%) from 34 countries (Table 1). On average, responses were received within 22.5 days (min. 1, max. 211, median 15 days). Of the 325 responses, 139 (42.8%) were received within the first 2 weeks (no reminder sent), 116 (35.7%) after the first reminder, and 70 (21.5%) after two reminders. Neither the response time nor the number of reminders sent was related to the laboratory characteristics (Supplemental Data 2A).

Accredited laboratories were more likely to return the completed survey than non-accredited laboratories. Other factors did not influence the likelihood of responding to the received survey (Table 2).

Time point in the total test process and cause of deviating EQA results

Of the 988 NSCLC and 179 mCRC cases with a deviating EQA result between 2015 and 2018, data were obtained for 424 (42.9%) NSCLC and 90 (50.3%) mCRC cases.

For the NSCLC EQA schemes (n = 424), errors occurred mostly in the post-analytical phase (48.1%) (Table 3). For the digital cases, the majority of problems occurred in the post-analytical phase, as these cases comprised only the interpretation of pre-made images; the exception was a few laboratories that indicated a problem during the pre-analytical or analytical phase, when the images were created. For analysis of the FFPE samples, mainly post-analytical errors were observed for FISH and IHC, except for ALK FISH with 44.0% (n = 25) analytical issues. During the IHC technical assessment, the staining quality of the applied protocol was evaluated, which is reflected in a high percentage of analytical issues as contributing factors. For variant analysis, causes were mostly post-analytical for EGFR testing (47.2%, n = 108) but analytical for KRAS (53.8%, n = 13) and BRAF (100.0%, n = 3) testing. In the mCRC EQA schemes, all cases were tested by variant analysis, and the results (n = 90) revealed mainly issues during the analytical phase itself (42.2%), although percentages varied depending on the marker analyzed.

Table 3 Time point of deviating EQA results in the different subschemes

Analysis of the underlying causes (Table 4) showed that both interpretation of the digital cases and IHC of the FFPE samples were prone to interpretation errors. For FISH analysis of FFPE cases, problems with the provided EQA material were reported most often. During the technical assessment, problems with the reagents were detected for ALK IHC, versus methodological problems for ROS1 IHC. For PD-L1, the reasons for suboptimal staining quality were dispersed. For variant analysis in NSCLC, methodological issues were the main sources of errors, while for variant analysis in mCRC, the underlying causes varied depending on the analyzed marker.

Table 4 Error causes behind deviating EQA results in the different subschemes

The time point in the TTP and the cause of the problems differed significantly between indications (NSCLC vs. mCRC), markers tested, and techniques used (Supplemental Data 2B).

Definitions for the different categories and a more detailed breakdown of the causes are given in Supplemental Data 3. Of all interpretation issues, 135 of 144 were reported in the NSCLC schemes. Of these, 51 (37.8%) occurred during interpretation of the IHC staining intensity, 40 (29.6%) during counting of the positive FISH signals, and, to a lesser extent (18.5%), during analysis of PCR curves in variant analysis. Methodological problems reported in both schemes (n = 105) occurred mostly because the laboratories were unaware that the variant tested in the scheme was not covered by their analysis method (35.2%), or because the method had insufficient sensitivity to detect the variant at its respective allele frequency (20.0%).

Error causes for the different laboratory characteristics

The probability of encountering a specific error cause in one of the phases of the TTP, related to the laboratory characteristics collected in the EQA datasheets, is given in Table 5.

Table 5 Error phase and cause related to laboratory characteristics and EQA scheme performance

Pathology laboratories were significantly less likely to make a mistake in the pre-analytical phase and to name the received sample material as the cause. On the other hand, they more frequently reported reagent problems. Accredited laboratories encountered reagent problems less frequently.

Laboratories with a larger staff (usually larger laboratories) had a reduced probability of encountering method-related problems. Testing more samples annually increased the chance of a personnel error. Respondents who had changed their testing method in the 12 months prior to the survey were significantly more likely to report a problem with that methodology than laboratories that had not changed their methodology in this period. There was no significant relationship between any of the other causes and the laboratory characteristics (Supplemental Data 2C).

Detection of errors during the EQA scheme

Post-analytical problems were more likely to be detected only after release of the EQA results, especially clerical and interpretation errors (Table 5). In contrast, pre-analytical and analytical issues, such as equipment/technical or methodological problems and issues with the EQA material, were more likely to be picked up in advance (Table 5).

Laboratories with an error in the pre-analytical phase were more likely to encounter an analysis failure in the scheme. Laboratories with analytical problems more often obtained lower performance scores, while those with post-analytical problems had a significantly higher score, owing to the occurrence of fewer technical failures. More specifically, personnel errors and equipment and reagent problems lowered the score in the EQA scheme, while laboratories reporting a problem with the material were more likely to obtain a technical failure. Exact p values and ORs are shown in Supplemental Data 2C.

The EQA participants undertook specific corrective actions, which were significantly linked to the time point in the TTP and the cause (Supplemental Fig. 1). Respondents with a personnel error more often had an analysis error in the subsequent EQA scheme, but no other error cause affected the performance criteria in the next EQA scheme [24].

Discussion

Several studies have evaluated the longitudinal improvement of biomarker testing in NSCLC and mCRC for different laboratories, samples, and methods [4,5,6,7,8,9,10,11,12,13]. Even though error rates have been published [4,5,6,7,8,9,10,11,12,13] and some providers request root cause analyses, no information has been available on the underlying causes of deviating EQA results in laboratories performing molecular oncology testing.

Response to root cause surveys

Our data on root causes of deviating EQA results demonstrated that laboratories that are accredited or test more samples annually (for ROS1, KRAS, and NRAS) were less likely to receive the survey. Keeping in mind that the surveys were sent only to participants with deviating results, these findings are not surprising: accredited laboratories testing more samples have been shown to perform better in the EQA schemes [14]. In contrast, laboratories that outsourced (part of) their analysis reported more EQA errors. ISO 15189 states that the laboratory shall have a documented procedure for selecting and evaluating referral laboratories and is responsible for monitoring their quality [15]. Further investigation is needed into which elements of the TTP are outsourced in routine practice, the structure of laboratory networks, and how high quality is ensured.

Accredited laboratories were also more likely to reply to the survey. Participation in quality improvement projects such as survey completion or workshop attendance [25] has previously been shown to increase EQA performance in mCRC and might contribute to the better performance of accredited participants. We acknowledge that not all countries responded, and error causes might shift when data from non-respondents are taken into account. Nevertheless, with data from 185 laboratories worldwide, encompassing 44.0% of the incorrect samples, this is a valuable first assessment of the causes underlying deviating EQA results. The uniform taxonomy and participant-tailored surveys allowed comparison of the results between the different survey rounds. Continued follow-up might be useful to evaluate whether the conclusions hold for a larger pool of respondents, as well as for other predictive markers currently not included in the schemes.

Time point in the total test process and cause of deviating EQA results

The causes of deviating EQA outcomes were related to the indication (NSCLC or mCRC) and the included subschemes. Notably, for the FFPE samples, more interpretation problems were reported for ROS1 than for ALK, even when tested by the same technique type (FISH or IHC), and even more so for PD-L1 IHC (Table 4). This is consistent with the previously reported higher error rates for ROS1 compared to ALK, explained by greater experience with ALK, as ROS1 testing has only been approved since 2016 [12]. In the survey period, fewer guidelines were thus available for ROS1 interpretation, and no Food and Drug Administration-approved companion diagnostic existed (as it did for ALK). For PD-L1, a similar assumption can be made, as testing has only recently been required, and its interpretation poses additional challenges due to the availability of different commercial antibodies with varying cut-offs for positivity for different therapies [26].

When sample problems were reported for FISH, the most prominent reasons were suboptimal sample quality (20.9%) and too few neoplastic cells (14.9%) (Supplemental Data 3). Estimation of the neoplastic cell content in EQA schemes has been reported as highly variable [27]. Nevertheless, the materials were carefully validated beforehand to contain sufficient neoplastic cells and to lack tumor heterogeneity, and other peers were able to analyze them successfully. Even though digital FISH cases only assess the post-analytical phase, for two cases the survey respondents mentioned a problem during creation of the images, in the pre-analytical or analytical phase, as the basis of the interpretation error (Table 3).

For variant analysis, the laboratories frequently reported that a specific variant was not covered by their analysis method (Supplemental Data 3), especially for mCRC (17.8%) compared to NSCLC (5.0%). This is a well-known problem: in 2013, the drug label for cetuximab and panitumumab was extended to include codons 12, 13, 59, 61, 117, and 146 of both the KRAS and NRAS genes, but not all laboratories have adapted their testing strategy [11]. Insufficient method sensitivity was also reported, as was misinterpretation of the obtained sequencing curves (e.g., results around the threshold), which is especially important in routine practice for variants at low frequencies such as EGFR c.2369C>T p.(Thr790Met) (LRG_304p1). The number of errors reported in wild-type cases was too low to draw solid conclusions.

The specific causes suggest that EQA providers could benefit from requesting root cause analyses after the schemes in order to provide more tailored education to participants. For instance, the provision of digital or paper-based cases to assess interpretation or variant classification could aid the interpretation of specific markers. Given the broad variety of methodologies used by the participants completing the survey, the performance of these methods might have further contributed to the error causes. Indeed, different performances have been reported depending on the applied PD-L1 IHC clones (personal observations), ALK IHC clones, or EGFR variant analysis method in the same ESP NSCLC EQA schemes, and depending on the RAS analysis methods in the ESP mCRC EQA schemes [11, 13, 28]. Challenging samples with rare variants might be included (albeit as educational cases) to assess whether all relevant mutations are covered and detectable at low allele frequencies. Schemes should thus be fit for purpose [19] and should cover the entire examination process as required by ISO 15189 [15]. As the samples in the EQA scheme were pre-cut and labeled, several pre-analytical steps fell outside the study scope. Research on routine cases is advised to assess problems during sample embedding, cutting, or labeling.

Error causes for the different laboratory characteristics

Previous longitudinal results indicated that experience (through accreditation, a research setting, or a higher number of annual samples) positively affected EQA scores [14]. Our findings revealed that personnel errors increased when more samples were tested, probably due to increased work pressure. Laboratory automation might be the way forward to reduce these errors. Also, laboratories with more staff had fewer method-based errors, probably reflecting a larger capacity of professionally trained personnel to perform a given method [29]. Accredited laboratories less frequently had reagent problems, possibly because they work according to standard operating procedures. As these reagent problems significantly lowered EQA performance, this might explain the previously observed better performance of accredited laboratories.

Our data also revealed that laboratories operating under a department of pathology less often reported sample-related issues (Table 3) but more frequently encountered reagent problems, as they were more often involved in IHC analysis than molecular laboratories. The positive influence of pathology review in decreasing specimen problems in this study stresses its importance for obtaining accurate results further downstream in the TTP.

We did not observe a difference in error rates by method type (e.g., NGS versus pyrosequencing), in agreement with previous studies [14]. However, a change in test method during the last year resulted in significantly more method-related error causes, highlighting the importance of test validation before implementation in clinical practice.

Detection of errors during the EQA scheme

Post-analytical clerical and interpretation problems were less likely to be detected before release of the results (Table 5), in contrast to equipment, methodological, and sample-related problems. This seems logical, given that post-analytical issues occur closer to the reporting of results and leave less time for detection by a quality control step. This might explain the higher error rates previously reported for ROS1 compared to ALK [12], as this marker indeed revealed a large fraction of clerical and interpretation causes.

Looking at the current scheme performance (Table 5), errors in the pre-analytical phase were more prominent for participants with lower performance scores and more technical failures. This again underlines the importance of pre-analytical quality control to prevent technical failures resulting from the selection of insufficient neoplastic cells [27].

None of the causes had a significant effect on future scheme performance except for personnel errors; in this case, laboratories most frequently responded by retraining their staff [24]. Also, for the majority of errors, an appropriate corrective action was undertaken (Supplemental Fig. 1).

Conclusions

To conclude, the causes of deviating EQA results were indication, marker, and technique dependent. The phase and underlying cause affected EQA performance differently, either through an increase in test failures or through false-positive/false-negative results. Our findings advocate the use of surveys by EQA providers to tailor scheme set-up, feedback, and offered sample types. Timely quality checks help to uncover deviating results and should additionally be implemented in the post-analytical phase, as these errors were often not identified in the laboratory. Accredited laboratories were more likely to respond and had fewer reagent problems, which could explain their previously reported better performance. We detected an important effect of pathology review in reducing technical failures and of protocol changes in increasing method-related problems.