Introduction

The Food and Drug Administration’s (FDA) Veterinary Laboratory Investigation and Response Network (Vet-LIRN) comprises more than 40 animal diagnostic laboratories within North America. The Vet-LIRN Program Office (VPO) collaborates with the FDA’s Center for Food Safety and Applied Nutrition and Institute for Food Safety and Health joint Proficiency Testing Program, located at the Moffett Proficiency Testing Laboratory (MPTL), to offer voluntary Proficiency Exercises (PEs) to network laboratories. Established in 2012, the Vet-LIRN Proficiency Exercise Program evaluates performance of laboratories, individual analysts, and their methods. Collaborative exercises such as Proficiency Tests (PTs) and Interlaboratory Comparison Exercises (ICEs) are used to evaluate performance. Collaborative exercises may provide training to participants, assist in validation of methods, and help identify needs in development of new methods. Both chemical and microbiological analytes are used in diagnostic matrices such as feces, various animal tissues, and, occasionally, animal foods. Analytes and matrices used in PEs are selected based on animal diagnostic needs, FDA surveillance priorities, and recent animal food or drug recalls. PE parameters, number of samples and replicates, analyte concentration, and acceptance criteria for results are established using International Organization for Standardization (ISO) and FDA guidelines in consultation with network laboratories, which may wish to test new or modified methods. Some PEs are repeated by using the same analyte and matrix across multiple rounds. This approach allows organizers to improve the schemes round-to-round for microbiological PEs by optimizing inoculum concentrations, number of sample replicates, bacterial strain selections, and conditions for sample handling.

The MPTL became an ISO/IEC 17043 accredited PT provider in 2017 (AP-2123). The MPTL has established a quality system with documented policies and procedures that govern each step of sample preparation [1, 2]. In collaboration with Vet-LIRN, the MPTL can offer PEs using unique matrices needed by diagnostic laboratories. In some cases, PEs have used animal tissues that contained potentially harmful chemicals or residues of interest to FDA. The use of such “real life,” naturally contaminated samples for PEs provides diagnostic laboratories unique and crucial opportunities to evaluate their routine testing procedures and results [3].

Indeed, participation in PEs can help verify laboratory performance, identify areas for improvement, and improve quality of laboratory results [2,3,4,5,6,7,8,9]; however, the cost to participate in multi-laboratory PEs may be high, limiting the opportunity to participate. In 2008, Sacchini and Freeman identified a shortage of PT programs for veterinary laboratories [5]. In 2014, Lee et al. also noted the lack of a regular PT program through which the veterinary laboratory community could monitor quality assurance when testing serum for some analytes, such as immunoglobulin antibodies [10]. Vet-LIRN’s PE program aims to fill this gap in PE providers for the veterinary diagnostic laboratory community. Additionally, Vet-LIRN’s infrastructure grant funding (PAR-17-141) actively supports the laboratories’ costs to participate in PEs. Note that the United States Department of Agriculture National Veterinary Services Laboratory (USDA NVSL) offers a PT program to veterinary diagnostic laboratories for diagnostics associated with USDA program diseases [11].

PE participation offers benefits to participating laboratories in multiple ways. By participating in a PE program, laboratories can exhibit their ability to successfully test for the analyte of interest, determine diagnostic sample testing capabilities, verify their confidence in final testing results, monitor laboratory performance, and assess results for continuous improvement. To support continuous improvement and learning, Vet-LIRN requests an internal performance review and a root cause analysis to identify corrective actions when laboratories do not achieve expected results during a PE [9].

Because network laboratories conduct testing for FDA, PEs help the agency ensure that test results are accurate and reliable [12]. Participation in PEs is also required by many accrediting bodies and may be required by customers [13]. Many Vet-LIRN laboratories are accredited either by the American Association of Veterinary Laboratory Diagnosticians (AAVLD) or to ISO/IEC 17025 standards [14] and must participate in PEs annually to maintain their accreditation. Currently, eight Vet-LIRN network laboratories have either completed or are working toward ISO/IEC 17025 accreditation, and 32 network laboratories are AAVLD accredited.

In this communication we provide a summary of all PEs offered from 2012 to 2018 and emphasize two microbiology and two chemistry exercises as examples.

Material and methods

Preparation of samples

Eight microbiology PEs (Table 1) and eight chemistry PEs (Table 2) have been offered since 2012. Starting in 2016, three ICEs were offered for chemical analytes, and one ICE was offered for microbiological analytes. For microbiology, PT or ICE matrix composition was either canine feces or raw canine food. The primary focus of the microbiology PEs is major foodborne pathogens including Salmonella, Listeria monocytogenes, and Campylobacter. Chemistry PEs offer a wide range of analytes and matrices, including tissue, serum, whole blood, and milk. One major focus of this program is to provide PEs using tissue from animals previously exposed to the chemical of interest, thereby providing diagnostic laboratories with “real life” samples. The sample preparations for the microbiology and chemistry PEs highlighted in this paper are summarized in the sections below.

Table 1 Summary of microbiology Proficiency Exercises administered by Vet-LIRN and MPC 2012–2018
Table 2 Summary of chemistry Proficiency Exercises administered by Vet-LIRN and MPC 2012–2018

Salmonella PT sample preparation

MPTL prepared 2015 Salmonella PT samples by resuscitating cryopreserved (−80 °C) bacterial cultures provided by Washington State University. A bead of culture was inoculated into 10 mL of tryptic soy broth (TSB) and incubated at 37 °C for 24 h ± 2 h (based on Bacteriological Analytical Manual (BAM) Chapter 5: Salmonella) [15]. Cultures were plated on Tryptic Soy Agar (TSA) with 5 % sheep blood and incubated at 37 °C for 24 h ± 2 h. The plates were checked for purity, and the cultures were identified biochemically using the automated VITEK® 2 identification system. Once the cultures were confirmed pure, three working stock slants per isolate were made by streaking each culture onto the surface of TSA slants. The slants were incubated for 24 h ± 2 h at 37 °C and stored in the refrigerator at 4 °C until needed. A 10 µL loopful of culture was inoculated into 10 mL of TSB and incubated at 37 °C for 24 h ± 2 h. A second transfer was made using a 10 µL loopful of the broth culture into 10 mL of TSB and incubated at 37 °C for 24 h ± 2 h. Cultures were enumerated using Aerobic Plate Count 3M™ Petrifilm® and counted after 48 h ± 2 h of incubation at 35 °C. Bulk samples were prepared by mixing 500 g of thawed Salmonella-negative dog feces, from multiple dogs, with 500 mL of Butterfield’s phosphate dilution buffer. The appropriate amount of Salmonella culture was added to the dilution buffer before it was added to the raw feces to achieve a level of 1 CFU/g to 10 CFU/g, depending on the desired spiking level. All samples were stored at 0 °C–4 °C until shipment.
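The spiking arithmetic implied above (enumerate the culture, dilute it, then add a volume that delivers the target CFU/g) can be sketched as follows. This is a minimal illustration only: the function name, its parameters, and the numbers in the usage note are hypothetical and are not taken from the MPTL procedure.

```python
def spike_volume_ml(target_cfu_per_g, matrix_mass_g, culture_cfu_per_ml,
                    dilution_factor=1.0):
    """Volume (mL) of diluted culture needed to reach a target spike level.

    All arguments are illustrative assumptions:
      target_cfu_per_g   desired level in the matrix, e.g. 1 to 10 CFU/g
      matrix_mass_g      mass of matrix being spiked
      culture_cfu_per_ml enumerated concentration of the broth culture
      dilution_factor    total dilution applied before spiking (1e6 = 10^-6)
    """
    if culture_cfu_per_ml <= 0 or dilution_factor <= 0:
        raise ValueError("concentration and dilution factor must be positive")
    total_cfu_needed = target_cfu_per_g * matrix_mass_g
    working_cfu_per_ml = culture_cfu_per_ml / dilution_factor
    return total_cfu_needed / working_cfu_per_ml
```

For instance, with a hypothetical culture enumerated at 1e9 CFU/mL and diluted one-million-fold, `spike_volume_ml(10, 500, 1e9, 1e6)` gives the volume needed to spike 500 g of matrix at 10 CFU/g.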

Listeria PT sample preparation

MPTL prepared 2018 Listeria PT samples by resuscitating cryopreserved (−80 °C) bacterial cultures collected from an FDA study examining the presence of bacteria in animal foods [16]. A bead of culture was inoculated into 10 mL of Listeria Enrichment Broth (LEB) and incubated for 24 h ± 2 h at 30 °C under aerobic conditions (based on BAM Chapter 10: Detection of Listeria monocytogenes in Foods and Environmental Samples) [17]. The cultures were plated on TSA with 5 % sheep blood and incubated for 24 h ± 2 h at 35 °C. Cultures were identified biochemically using the automated VITEK® 2 identification system. Once the cultures were confirmed pure, working stock slants were made by streaking each culture on TSA slants and incubated at 30 °C for 24 h ± 2 h. The working slants were stored refrigerated between 0 °C and 4 °C until use. A loopful of growth from each working slant was transferred to LEB and incubated for 24 h ± 2 h at 30 °C. The broth cultures were enumerated on Aerobic Plate Count 3M™ Petrifilm® to determine CFU/mL prior to spiking. The Petrifilm® plates were incubated for 48 h ± 2 h at 30 °C. Samples were prepared by aseptically placing one frozen meat patty weighing approximately 25 g into a 1.5 oz sterile plastic jar. The patties were thawed overnight between 0 °C and 4 °C in the refrigerator. The Listeria culture was diluted in Butterfield’s Phosphate Buffer, and the volume required to achieve the desired spiking level was added individually to each jar. Once spiked, the samples were held between 0 °C and 4 °C until packaged for shipping.

Melamine PT sample preparation

In 2014, control and melamine-contaminated fish fillets were provided by the Center for Veterinary Medicine’s (CVM) Office of Research Aquaculture Team in accordance with the principles stated in the Guide for the Care and Use of Laboratory Animals [18]. The contaminated catfish were prepared by feeding fish with melamine (10 mg/kg or 20 mg/kg BW) and/or with cyanuric acid (10 mg/kg, 20 mg/kg, or 40 mg/kg BW). The catfish were euthanized 1 or 3 days after feeding, and the fillets were cut into slices and then homogenized together with dry ice in a Hobart blender. The resulting powder was subdivided and stored frozen (≤ −25 °C) until shipping. The concentrations of melamine and cyanuric acid in these fish muscle samples were determined using an FDA method [19]. Based on the melamine and cyanuric acid concentrations present in the fish samples, one control and five contaminated fillets were selected to prepare PE samples. Table 3 shows concentrations determined by the MPTL.

Table 3 Melamine and cyanuric acid concentrations in fish fillet used during PT in June 2014

Anticoagulant rodenticides ICE sample preparation

In 2017, the Veterinary Diagnostic Laboratory at Iowa State University, College of Veterinary Medicine in Ames, Iowa, collected canine and equine liver samples under its approved IACUC protocol. The University of Kentucky screened the liver samples for anticoagulant rodenticides of interest to confirm concentrations. Samples were divided into five batches based on concentration. Each batch was further prepared by adding three 4 mL portions of spiking solution to 600 g of pre-homogenized liver sample. The liver sample was homogenized for 30 s at low speed after each 4 mL addition and for a further 2 min after the last addition. The homogenized samples were subdivided, tested for homogeneity, and stored at ≤ −25 °C until shipping.

Homogeneity and stability

Homogeneity and stability of PT samples are critical factors to address during sample preparation [20,21,22] and are required by ISO standards [14, 23]. The MPTL completes homogeneity and stability testing during the preparation of samples. Using randomly chosen test samples, analyzed in duplicate, homogeneity and stability testing is completed according to conditions outlined in ISO 13528 [23]. For quantitative PEs, the homogeneity check conditions include testing a minimum of 10 samples in duplicate and using the data to calculate homogeneity sample mean, within-sample standard deviation, and between-sample standard deviation. The number of samples for the homogeneity check may be reduced if data are available for similar samples prepared previously by the same procedures. Stability for microbiological samples, i.e., ability to obtain cultures, is established for up to nine days. Chemistry sample stability is usually confirmed for a two-week timeframe, although the samples may be stable for much longer periods.
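The three homogeneity quantities named above can be computed from duplicate results as in the following minimal sketch. It assumes the usual duplicate-based estimators associated with ISO 13528 (within-sample SD from the duplicate differences, between-sample SD from the variance of the sample averages less half the within-sample variance). The function name and data layout are illustrative and do not describe the MPTL software.

```python
import math
from statistics import mean, stdev

def homogeneity_summary(duplicates):
    """Summarize a homogeneity check from duplicate measurements.

    duplicates: list of (result_1, result_2) pairs, one pair per test sample
    (hypothetical layout). Returns (mean, within-sample SD, between-sample SD).
    """
    g = len(duplicates)
    if g < 2:
        raise ValueError("at least two samples are required")
    sample_means = [(a + b) / 2 for a, b in duplicates]
    grand_mean = mean(sample_means)
    # Standard deviation of the sample averages
    s_x = stdev(sample_means)
    # Within-sample SD from the differences between duplicates
    s_w = math.sqrt(sum((a - b) ** 2 for a, b in duplicates) / (2 * g))
    # Between-sample SD; truncated at zero when the estimate would be negative
    s_s = math.sqrt(max(0.0, s_x ** 2 - s_w ** 2 / 2))
    return grand_mean, s_w, s_s
```

A list of at least 10 pairs corresponds to the minimum number of samples stated above for quantitative PEs.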

Participation

The Vet-LIRN PE program is open to active network laboratories and, in more recent years, to laboratories that also participate in the Food Emergency Response Network (FERN) and other networks. Previously, each laboratory could receive only one sample set, for a single analyst. Vet-LIRN has since modified the program to allow multiple analysts within a laboratory to receive separate sets of test samples and report their results. From 2012 to 2018, over 120 individuals participated in the PE program. On average, there were 35 analysts in a microbiology PE and 21 analysts in a chemistry PE.

Results reporting

Laboratories report results through a secure reporting portal. Results are downloaded and analyzed by organizers. Analysts provide date sample received and condition of sample. Method information is captured along with instrumentation and limit of detection or quantification when requested. Analysts may also provide information on how frequently the method is used.

PT evaluation: assigned value by consensus for qualitative data

Due to the lack of certified reference materials and fully validated reference methods for matrices of interest, performance of a PT participant is assessed using assigned values determined by consensus agreement based on results reported by all participants in accordance with ISO 13528 and ISO/IEC 17043 [23, 24]. In qualitative PTs, consensus agreement is defined as ≥ 80% agreement among results from all analysts. If consensus agreement is not met for a sample, the results for that sample are not scored. This is often the case for low concentration challenge samples designed to assess a method’s limitations under extreme conditions rather than analyst capabilities. The overall performance of each analyst is evaluated using combined performance scores [13]. Analyst performance is considered satisfactory if an analyst identified ≥ 75% of results (out of all scored PT samples) correctly. Descriptive statistics including sensitivity and specificity rates [23] are often included in reports to summarize overall results as well. Sensitivity rate (rSE) is calculated according to the following formula from ISO 22117:

$$r_{\text{SE}}=\frac{n_{+}}{E\left(n_{+\,\text{tot}}\right)}$$

where n+ is the number of positive results found and E(n+ tot) is the total number of expected positive samples. Specificity rate (rSP) is calculated according to the following formula from ISO 22117:

$$r_{\text{SP}}=\frac{n_{-}}{E\left(n_{-\,\text{tot}}\right)}$$

where n- is the number of negative results found and E(n- tot) is the total number of expected negative samples.
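The qualitative scoring rules in this section can be expressed compactly in code. The sketch below is illustrative only: the function names are hypothetical, and the consensus check reflects one reading of "≥ 80% agreement" (the share of the most frequently reported result among all analysts).

```python
def consensus_reached(results, threshold=0.80):
    """True if the most common result accounts for >= threshold of all reports.

    results: list of booleans for one sample (True = detected), one per analyst.
    """
    if not results:
        return False
    positives = sum(results)
    majority = max(positives, len(results) - positives)
    return majority / len(results) >= threshold

def sensitivity_rate(n_pos_found, n_pos_expected):
    """r_SE: positive results found over expected positive samples."""
    return n_pos_found / n_pos_expected

def specificity_rate(n_neg_found, n_neg_expected):
    """r_SP: negative results found over expected negative samples."""
    return n_neg_found / n_neg_expected

def is_satisfactory(n_correct, n_scored, pass_fraction=0.75):
    """Analyst passes when >= 75% of scored samples are identified correctly."""
    return n_scored > 0 and n_correct / n_scored >= pass_fraction
```

Samples for which `consensus_reached` returns False would be excluded before counting `n_scored`, mirroring the rule that results without consensus are not scored.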

PT evaluation: Z-scores for quantitative data

For quantitative data, the assigned value/PT mean (xpt) for the measurand and the standard deviation of the PT (σpt) are computed using Algorithm A from consensus values of combined replicate results reported by the participants. MPTL completes statistical analysis of the data for each PT using ProLab Plus software developed to assess the quality and accuracy of results. The z-score value shows how far, in standard deviations, a reported data point is from the mean or average of a data set. This is known as standardizing; thus, participants receive standard z-scores. The formula for z-score calculation is as follows (ISO 13528:2015) [23]: zi = (xi − xpt)/σpt, where xi is the reported value, xpt is the PT mean/assigned value, and σpt is the standard deviation for the PT, also referred to as the target standard deviation [23]. Normally distributed data show 95% of values within 2σ of the mean and 99.7% of values within 3σ [25]. According to ISO 13528 guidelines, results with an absolute z-score (|z|) greater than 2 are considered questionable because only 5 % of correct measurements are expected to be that different from the assigned value [23]. Results with |z| equal to or greater than 3 are considered unsatisfactory because only 0.3 % of correct measurements are expected to be that different from the assigned value [13].
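For readers unfamiliar with Algorithm A, the following is a minimal sketch of the robust mean and standard deviation procedure as it is commonly described for ISO 13528: start from the median and scaled median absolute deviation, then repeatedly winsorize at 1.5 robust SDs and re-estimate. The convergence tolerance and iteration cap are assumptions of this sketch, and it does not reproduce the ProLab Plus implementation.

```python
from statistics import mean, median, stdev

def algorithm_a(values, tol=1e-6, max_iter=100):
    """Robust mean (x*) and robust SD (s*) by iterative winsorization.

    values: participant results for one sample (at least two values).
    tol and max_iter are illustrative stopping settings.
    """
    x = list(values)
    if len(x) < 2:
        raise ValueError("at least two results are required")
    x_star = median(x)
    s_star = 1.483 * median([abs(v - x_star) for v in x])
    for _ in range(max_iter):
        delta = 1.5 * s_star
        low, high = x_star - delta, x_star + delta
        # Pull values outside [low, high] back to the nearest limit
        winsorized = [min(max(v, low), high) for v in x]
        new_x = mean(winsorized)
        new_s = 1.134 * stdev(winsorized)
        converged = abs(new_x - x_star) <= tol and abs(new_s - s_star) <= tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star
```

The resulting pair would play the roles of xpt and σpt in the z-score formula above; a single extreme result shifts the robust mean far less than it shifts the arithmetic mean.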

The interpretation of z-scores for quantitative results within PT reports are as follows [23]:

  • |z-score| ≤ 2 is acceptable and is indicative of satisfactory performance

  • 2 < |z-score| < 3 is flagged in yellow; analysts/laboratories are issued a “warning signal”

  • |z-score| ≥ 3 is flagged in red; analysts/laboratories are issued an “action signal”

The standard practice is to statistically score data when ≥ 80 % of participants reported quantitative data for that sample. Traditionally, if an analyst reports “less than” a certain value when ≥ 80 % of the participants submitted quantitative data, that analyst receives a non-passing z-score (|z|) of 3.0.
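The z-score calculation and the three interpretation bands listed above can be sketched as follows. The function names and the returned labels are illustrative choices, not terms from the PT reports.

```python
def z_score(x_i, x_pt, sigma_pt):
    """z = (x_i - x_pt) / sigma_pt for one reported value."""
    if sigma_pt <= 0:
        raise ValueError("sigma_pt must be positive")
    return (x_i - x_pt) / sigma_pt

def classify(z):
    """Map a z-score to the interpretation bands used in PT reports."""
    magnitude = abs(z)
    if magnitude <= 2:
        return "satisfactory"
    if magnitude < 3:
        return "warning"   # flagged in yellow
    return "action"        # flagged in red
```

Under the convention described above, an analyst who reports a "less than" value for a statistically scored sample would simply be assigned |z| = 3.0, which `classify` places in the action band.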

ICE evaluation

In addition to PTs, the Vet-LIRN PE program offers collaborative exercises known as ICEs. These exercises are primarily designed to assist in evaluating performance of newly developed and recently modified methods or to explore potential uses of existing methods for additional matrices (new matrix extension). ICEs are designed to provide Vet-LIRN laboratories a safe and structured way to evaluate a method that their laboratory is using or planning to use. Although the ICEs are not PTs, the basic procedures used to prepare for the collaborative exercise (preparation of instructions, preparation of samples, shipping of samples, submitting results, stability and homogeneity) all follow ISO/IEC 17043 guidelines. The difference between an ICE and a PT is that the ICE results are summarized using descriptive statistics and compared to multiple estimation values such as target sample spike concentration, consensus of reported results by all participants, MPTL’s results based on application of partially validated methods, and expert opinion. The ICE approach evolved from the need to evaluate performance of a small number of participants (< 15) using methods for exotic/rare matrices and chemicals for which no certified reference materials or reference methods are available. In an ICE, analysts are not graded but may be provided with graphical representations of their data in comparison to each other and to multiple estimation or assigned values. It is expected that there may be multimodal distributions, especially if results were submitted from methods still under evaluation. ICE evaluations may provide information on the specificity rate, sensitivity rate, and accuracy rate. Each laboratory is expected to use the data to evaluate its method, explore alternate method performance, and train analysts in a method.

The final report for an ICE should not be used to assess the laboratory’s performance, but it can serve as an example of the laboratory’s desire to continually improve. The organizers occasionally request information from selected laboratories about results that appear to be outliers to determine if those results should be included in certain computations.

Results

Microbiology

Microbiology PEs for Listeria and Salmonella are offered over multiple rounds. In each round the preparation of the PE samples improved along with the laboratory performance (Table 8). Vet-LIRN PEs use matrices that are not only a challenge for the analyst, but also challenging for the PE provider. The MPTL continuously strives to improve sample preparation and the overall administration of PEs over time. The trend of increasing proportions of correct results from 2012 to 2018 is an indicator of improved PE study design and execution as well as enhanced laboratory performance.

Salmonella PT

One of the main reasons Vet-LIRN offered the Salmonella PT was to ensure that specific laboratories participating in a study to evaluate the prevalence of Salmonella in dog and cat feces in the United States demonstrated their ability to use a harmonized method to accurately diagnose Salmonella presence in a fecal sample [26]. Round 1 for Salmonella was offered in February 2012, and twenty-six laboratories participated in this first PT. The bacterial strain used in this PE did not survive well in the fecal matrix. Future rounds replaced the strain with a strain specifically isolated from dog feces. Round 1 was able to demonstrate that the harmonized method performed better than other methods, with 8 out of 11 participating laboratories correctly identifying medium and high spiked samples. Salmonella PE round 2 used a strain of Salmonella Typhimurium isolated from dog fecal samples provided by Washington State University. By using a strain originally isolated from dog fecal samples, VPO and MPTL hoped to reduce matrix effect and increase culture stability. Again, 26 laboratories participated, and this time 20 laboratories identified all samples correctly. Five laboratories missed one sample and one laboratory missed two samples. Laboratories participating in the study identified all spiked samples correctly, but two laboratories showed cross-contamination in negative samples, which they detected as positive. Round 3 of the Salmonella PE increased the difficulty of the PE with the addition of Salmonella Heidelberg, an atypical H2S-negative strain. The goal was to create a more realistic testing scenario, because a variety of strains may be isolated from dog fecal samples. Twenty-five laboratories participated and twenty-two of them correctly identified all samples. Three laboratories missed two of the eight samples. Round 4 of the Salmonella PE was a repeat of Round 3 with an increased sample size. Twenty-five laboratories participated and nineteen of them correctly identified all samples. No false positives were reported. False positive and false negative information is reported in Table 4. Two laboratories could not identify the atypical Salmonella strain. Eight out of nine study laboratories reported correct results for all samples.

Table 4 Summary of false positive and negative rates for microbiology Proficiency Exercises administered by Vet-LIRN and MPC 2012–2018

Listeria PT

The goal of the Listeria PT, offered in multiple rounds, was to assess Vet-LIRN laboratories’ ability to detect Listeria spp. in raw pet food products. In 2014, FDA reported that Listeria monocytogenes was present in raw pet food products and thus is a potential health risk for both humans and animals [16]. In 2016, six different raw pet food products were recalled for potential contamination with Listeria [27]. Recently, Listeria monocytogenes in cats was confirmed to be caused by consumption of raw pet food [28]. Due to the documentation of Listeria in raw products and the increased number of recalls from one in 2014 to nine in 2018 [27], it is vital for Vet-LIRN laboratories to be able to detect Listeria spp. in raw pet food products. The first Listeria PE was offered in July 2014. Twenty laboratories participated in Round 1 with results from 26 analysts. Raw pet food test samples were spiked at very low levels with strains of Listeria including L. monocytogenes, L. innocua, and L. welshimeri. All strains were previously isolated from raw pet food products [16]. All analysts accurately reported no Listeria in three out of four blank samples. One analyst reported a false positive in one of the blank samples (Table 4). Seventeen of the 26 analysts reported correct results in eight of the 12 samples. Four were not evaluated for accuracy because there was no consensus among analysts; all concentrations were low (0.16 CFU/g). Round 1 highlighted the difficulties of conducting a proficiency test designed to test very low spike concentrations of Listeria spp. in a challenging matrix. VPO and MPTL also identified several issues with packing. Subsequent Listeria PE rounds were conducted with higher spike levels and clearer reporting instructions, resulting in improved detection rates among laboratories. Thirty-seven analysts from 26 laboratories participated in Round 2. Raw pet food products were again spiked with various strains of Listeria spp., but this time higher spike concentrations were used. One very low challenge sample was also sent and excluded from scoring because the expectation was fractional recovery, in which 25%–75% of participants would report a result of detected; the sample thus did not meet consensus criteria to be scored in the final report. Laboratories were instructed to report detected or not detected for Listeria spp. and were only required to speciate if their method was able to do so. In Round 1, there was confusion over reporting requirements and some laboratories were not able to speciate. Thirty-five out of 37 analysts reported satisfactory results. Round 3 was offered in January 2018 with 42 analysts from 27 laboratories participating. Again, several Listeria spp. were used with one very low challenge sample, which was not scored. Forty out of 42 analysts had satisfactory results. There were no false positives in Round 3. Results in Table 5 summarize sensitivity and specificity rates for the cultural and PCR detection of Listeria species in raw canine food over the three rounds of the Listeria PT. These results illustrate how overall participant performance can improve round-to-round and highlight the importance of improving sample preparation, composition, and instructions round-to-round for repeated PEs.

Table 5 Sensitivity and specificity for the detection of Listeria species in raw canine food by cultural and PCR methods over three PT rounds

Chemistry

Chemistry PEs are typically offered as a single round; however, with the introduction of interlaboratory comparison exercises, some matrix and analytes were repeated. Chemistry PEs focus mainly on diagnostic samples; thus, they provide participants unique test matrices that are not offered by most other PT providers.

Melamine PT

Food for human and animal consumption has been adulterated with melamine and cyanuric acid for economically motivated reasons because these compounds can increase apparent protein content [19, 29]. The purpose of this PT was to evaluate Vet-LIRN laboratories’ ability to detect and quantify melamine and cyanuric acid in fish intended for human consumption at concentrations close to the level of concern (2.5 mg/kg) [30]. One laboratory was not scored because its method was not sensitive enough to detect the level of concern. Overall, five laboratories showed their ability to determine melamine and cyanuric acid at levels close to the level of concern. One laboratory reported a false positive for an untreated sample. Laboratories were capable not only of screening for, but also of quantifying, melamine and cyanuric acid at lower levels, which is important for diagnostic purposes. The exercise revealed that network laboratories can analyze large numbers of samples in a relatively short period of time, which is essential during a potential adulteration event.

Anti-coagulant rodenticides ICE

The first ICE offered using a method developed by a network laboratory examined the ability of laboratories to detect anti-coagulant rodenticides (ARs) in animal liver. ARs are used to control rodent populations, but can be ingested by non-target species, either accidentally or due to malicious baiting. In 2014, Vet-LIRN funded a project at the University of Kentucky to develop a network method to quantify ARs in animal liver [31]. The method was tested successfully under blinded method conditions and then provided to the network as a Vet-LIRN recommended method. An ICE, offered in May 2017, evaluated the performance of this and other methods used by Vet-LIRN diagnostic laboratories to quantify eight ARs in canine and equine liver. Twelve liver samples containing various spiked concentrations of eight ARs were sent to 14 analysts in 13 different laboratories. Three of the 13 laboratories used the recommended method with no modification or minor modification. These three laboratories’ performance scores were 94% or above for all ARs. Four laboratories used the recommended method with major modifications, and their laboratory performance scores varied greatly. All other laboratories used internal methods, and their laboratory performance scores also varied greatly. All laboratories did accurately report low concentrations in low spike samples and high concentrations in high spike samples.

The ICE demonstrates that the recommended method works well and that laboratories using other methods have room to improve them or to change to the Vet-LIRN recommended method.

False positive and false negative rates for multiple exercises

The false positive and false negative rates were calculated for each PT or ICE and are shown in Tables 4 and 6. The false positive rate is the probability of the method providing a positive result when the sample does not contain the analyte. The false negative rate is the probability of the method providing a negative result when the sample does contain the analyte. Edson reviewed pathogen detection in food microbiology laboratories and showed that in over nine years of PEs, laboratories detected Listeria monocytogenes with a 7.2% false-negative rate and Salmonella spp. with a 5.9% false-negative rate [32]. Atypical strains of bacteria lead to higher false-negative rates [32]. Salmonella inoculated at low concentrations (1–10 CFU/g) resulted in increased false-negative responses [33]. Both Edson and Augustin note that there was no improvement in pathogen detection over time [32, 33]. In the Vet-LIRN PE program, the false negative rate for Salmonella is 5.3 % for 3 PTs spanning several years. The false negative rate went up across rounds, and this may be due to the introduction of atypical strains as well as lower inoculation levels. The first scored round of the Salmonella PE included one typical strain at three different levels. The low concentration was the cause of the false-negative responses. In the second and third scored rounds of the Salmonella PT, Salmonella Heidelberg, an atypical H2S-negative strain, was introduced. The second round contained two Heidelberg samples and the third round contained five Heidelberg samples. The third round also used much lower inoculation levels, going from 10,000 CFU/g in round 2 to 10 CFU/g, 5 CFU/g, and 1 CFU/g in the third round. The false negative rate increased during the third round.
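A rate that spans several rounds, such as the multi-round false negative rate quoted above, can be obtained by pooling counts rather than averaging per-round rates. The source does not state which approach was used, so the following is only one plausible sketch, with a hypothetical function name and hypothetical counts in the usage note.

```python
def pooled_rate(rounds):
    """Pooled error rate across rounds.

    rounds: list of (error_count, expected_count) tuples, one per scored round.
    For a false negative rate, expected_count is the number of results expected
    to be positive; for a false positive rate, the number expected to be negative.
    """
    errors = sum(e for e, _ in rounds)
    expected = sum(n for _, n in rounds)
    if expected == 0:
        raise ValueError("no expected results to pool")
    return errors / expected
```

For example, `pooled_rate([(1, 20), (2, 30), (3, 50)])` pools six errors over 100 expected results; rounds with more samples therefore carry more weight than they would in a simple average of per-round rates.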

Table 6 Summary of false positive and negative rates for chemistry Proficiency Exercises administered by Vet-LIRN and MPC 2012–2018

Frequency of use

Griffin identified that laboratories that regularly test large numbers of specimens perform better than laboratories testing smaller numbers of specimens [34]. Veterinary diagnostic laboratories test large numbers of microbiological specimens every year. Fewer veterinary diagnostic laboratories offer toxicology testing services. Vet-LIRN started tracking how PE performance results may be affected by frequency of method use. During Round 3 of the Listeria PT, analysts identified the frequency of method use for both culture and PCR results (Table 7). Analysts could report if the method was used regularly (weekly), intermittently (every 3–6 months), or infrequently (only as needed). For culture methods, two analysts using the method weekly reported one false-negative each. Of the three analysts using the method infrequently, two had a single false-negative and one had two false-negatives. A larger number of analysts reported use of the method was infrequent, but their performance score was not negatively impacted.

Table 7 Frequency of cultural and PCR method use for the detection of Listeria species in raw canine food in round 3 of the PT

Accuracy

Sensitivity, specificity, and accuracy rates (rSE, rSP, rAC) were calculated for each repeated microbiological PE (Table 8). The largest increase in rSE and rAC was from round 1 to round 2 for the Salmonella, Listeria, and Campylobacter PEs (Table 8). Overall, rSP was less variable than rSE and rAC and ranged from 94 to 100 % across all PE rounds for all organisms (Table 8). Figure 1 shows mean rAC for all the microbiological PEs, and in each case, round 1 accuracy rates are significantly different from those of subsequent rounds. These results are likely due to a combination of factors including better PE design and sample preparation, clearer instructions, and improved analyst performance.

Table 8 Summary of sensitivity (rSE), specificity (rSP), and accuracy (rAC) rates of microbiology proficiency tests administered by Vet-LIRN and MPC 2012–2018
Fig. 1 Mean accuracy rates for the detection of (a) Salmonella in canine feces, (b) Listeria in raw canine food, and (c) Campylobacter in canine feces. Performance is summarized as mean accuracy rate; differences among rounds were detected by Tukey's test for ANOVA (error bars represent standard error)

Discussion

Vet-LIRN offered 16 PTs and 4 ICEs to network laboratories over 6 years, which considerably expanded the number of PEs using veterinary matrices available to veterinary laboratories. The Vet-LIRN PE Program may well fill the gap, noted in 2008 by Sacchini, by providing PEs for veterinary laboratories at no charge with matrices and analytes that focus on animal diagnostic needs, FDA surveillance priorities, and recent animal food or drug recalls [5]. PTs and ICEs allow laboratories to assess and improve performance of standardized methods and their own methods [35,36,37,38]. Laboratories need a system in place that identifies and reduces errors [34]. On average, 25 laboratories participated in microbiology PEs, and 16 laboratories participated in chemistry PEs.

Novak argues that PTs do not improve laboratory performance over time [39]; however, turnover among participants is not addressed. There is no information on laboratory personnel turnover, because we did not allow individual participants until later years of the PE program. Participants can evaluate results after a PE and determine what kind of improvements, if any, should be made. One would expect an analyst to apply those findings to the next round. However, if a laboratory has high staff turnover, a new participant may run each round of the PE and would not have learned from the previous round. The Vet-LIRN PE program plans to address this in the future. In more recent PTs and ICEs, we offer the opportunity for multiple participants at each laboratory, and each participant is tracked over time. This will be especially useful for microbiology PEs, which are normally offered in multiple rounds and have more participants than the chemistry PEs. By participating in PTs and ICEs, laboratories show their staff that they are committed to implementing quality standards, improving overall performance, and offering learning opportunities.

The guidance in ISO 13528 for statistical review of PT results states that faults in administration of the PT may be apparent after multiple rounds of a PT scheme and that poor results could be due to unclear instructions [23]. The Vet-LIRN PE program offered microbiology PEs in multiple rounds and learned from each round. After the first round of the Salmonella PE, the PE providers changed the inoculation isolate to deal with the poor growth of the strain used in the fecal matrix. The first Listeria PE showed organizers that unclear instructions resulted in lack of consensus among laboratory results. A PE is a learning opportunity not only for the participating laboratories, but also for the PE providers. Over time, Vet-LIRN and the MPTL improved schemes for PEs, developed better instruction documents, and streamlined communications to enhance PTs and ICEs.

To appropriately trend results across rounds, Vet-LIRN may need to consider repeating PEs with consistent schemes and assessment criteria to evaluate laboratory and analyst performance effectively [40]. To date, if a PE was repeated, Vet-LIRN and the MPTL worked to improve the scheme, especially with microbiology-based PEs. Overall, there is evidence to show that continued participation in PEs improves laboratory performance, but PE evaluations have limitations. Laboratories may become familiar with PE schemes and encourage only their best analysts to participate. Poorly performing laboratories may not participate in multiple rounds. Analysts may give each PE sample extra time and care. Even with these limitations, the goal is for PEs to offer diagnostic laboratories the ability to improve quality systems and learn from mistakes.

In recent years, the Vet-LIRN PE Program has offered more ICEs, which allow laboratories to assess newly validated methods. These exercises help laboratories identify the strengths and weaknesses of their testing services and provide them with support to continuously improve performance.

Overall, laboratories show strong interest in participating in the Vet-LIRN PE Program. Vet-LIRN will continue to offer PTs and ICEs to network laboratories and to gather insight into what laboratories would like to test to improve their quality systems.